How do I fix the encoding of a string?
I am writing an SEO site parser. It takes the title, h1, description, keywords and other parameters from certain pages of a site. The parser works correctly on 50 different sites.
At one point I came across an unusual site built on Bitrix, and something strange happens on it. The other parameters (h1, description, keywords) on this site are parsed normally, but the title comes back as garbage like: ������ ���� �������� � ���� ����� ���: ����� ����, �������� � ������ ����� ��� ���� � �������� -�������� ������.
There should be Cyrillic characters here. The description and keywords on the site are also in Cyrillic, but they are parsed correctly. I can't understand how one piece of the rendered page can be in a different encoding from the rest, or why browsers display it correctly.
For the parser I use the dom_parser library.
The retrieved title is stored in the $title variable. I need to determine the encoding of this variable and convert it correctly. I googled it, but none of the scripts I found for changing the encoding worked.
What can I do about this disaster?
Example of the response headers:
HTTP/1.1 200 OK
Server: nginx
Date: Mon, 29 Aug 2016 05:58:48 GMT
Content-Type: text/html; charset=windows-1251
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding
X-Powered-By: PHP/5.5.30
P3P: policyref="/bitrix/p3p.xml", CP="NON DSP COR CUR ADM DEV PSA PSD OUR UNR BUS UNI COM NAV INT DEM STA"
X-Powered-CMS: Bitrix Site Manager (be4ce0ee34669c98d89788a28b50c007)
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: PHPSESSID=48d1bc70ecc0e47c747bb097e21fcddd; path=/; HttpOnly
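The Content-Type header above declares charset=windows-1251, which would explain the symptom: Windows-1251 bytes for Cyrillic letters are not valid UTF-8, so a decoder that assumes UTF-8 replaces each of them with U+FFFD (the � character). Below is a minimal Python sketch of reading the charset from the header and decoding the raw bytes accordingly. The helper names and the sample title are hypothetical, and it assumes the parser gives access to the undecoded bytes; in PHP the analogous conversion would be done with iconv or mb_convert_encoding.

```python
from email.message import Message


def charset_from_content_type(header_value, default="utf-8"):
    """Return the charset parameter of a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value and returns None if absent.
    return msg.get_content_charset() or default


def decode_body(raw_bytes, content_type):
    """Decode raw response bytes using the charset the server declared."""
    charset = charset_from_content_type(content_type)
    # errors="replace" keeps going on stray bytes instead of raising.
    return raw_bytes.decode(charset, errors="replace")


# Hypothetical sample: a Cyrillic title as the server would send it.
raw_title = "Пример заголовка".encode("windows-1251")
header = "text/html; charset=windows-1251"

wrong = raw_title.decode("utf-8", errors="replace")  # the garbled form
right = decode_body(raw_title, header)               # readable Cyrillic
```

Once the bytes have already been decoded with the wrong charset and turned into �, the original letters cannot be recovered from that string, so the conversion has to be applied to the raw bytes.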
I would need to look at the site to be sure, but the title is probably specified in some non-standard way there, perhaps as a sequence of numeric character references such as &#1234;.
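If the title does turn out to be written as numeric character references, a browser resolves them regardless of the page charset, while a parser that returns the raw markup text would not. A minimal Python sketch using the standard library's html.unescape; the sample string is hypothetical and not taken from the site in question.

```python
import html

# Hypothetical title written entirely as numeric character references.
encoded_title = "&#1055;&#1088;&#1080;&#1074;&#1077;&#1090;"

# html.unescape resolves both numeric (&#1055;) and named (&amp;) references.
decoded_title = html.unescape(encoded_title)
print(decoded_title)
```

In PHP the corresponding function would be html_entity_decode, called with UTF-8 as the target encoding.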