PHP
Ivan, 2017-08-13 20:34:18

How to fix the encoding of a page received in response to a cURL request?

Please help me defeat this encoding problem.
The problem is this: the response to a cURL request is a page whose meta tag declares the windows-1251 encoding, and because of that the page is displayed as gibberish (mojibake).
I solved this with:

$isWinCharset = mb_check_encoding($postResult, "windows-1251");
if ($isWinCharset) {
    $postResult = iconv("windows-1251", "UTF-8", $postResult);
}

Now if meta declares windows-1251, the site is displayed correctly.
If meta declares utf-8, the site is also displayed correctly.
I rejoiced.
But then I found a couple of sites that broke after I added the check above.

UTF-8 is specified in meta, yet the site comes out as gibberish. An example of such a site: e-qa.ru/autoprodazha
There are not many such sites, but they exist and are very annoying; most sites with UTF-8 in meta work correctly. Apparently the actual encoding of the file on e-qa.ru/autoprodazha differs from the one specified in meta, and that causes the conflict.
Help me understand and eliminate the gibberish on all sites. I have tried a bunch of methods, and they are all mutually exclusive :(

3 answer(s)
stictt, 2019-04-04
@stictt

You select a bunch of data and sum a field without grouping. And check the parameters: if garbage or nothing at all is passed in, what happens to the query? You shouldn't do it this way. This is the second web question today that makes me facepalm.

Alexander Lykasov, 2019-04-04
@lykasov-aleksandr

Your SQL queries, for some reason, find no data in the tables and return false instead of the expected result. And the code does not handle this case at all.

PrAw, 2017-08-14
@9StarRu

Actually, the remote site already tells you everything, so why not take what it says into account?
1. Look at the HTTP response headers; there we see:
Content-Type: text/html; charset=UTF-8
2. Look at the content of the page itself: the meta tag mentioned in the question declares the encoding there.
3. There is one more method to suggest the encoding:
Solution: look at what the site tells us and pass that to iconv as the source encoding, but don't forget a default value just in case.
Solution: if the number of sites is limited, store the preferred encoding for each of them somewhere.
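For a small, known set of sites, the stored-encoding idea could be as simple as a lookup table keyed by host name. This is only a sketch: `SITE_ENCODINGS`, `encoding_for` and the host name in it are made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical per-site overrides: host name -> encoding the site really uses.
SITE_ENCODINGS = {
    "example.com": "windows-1251",
}

def encoding_for(url, default="utf-8"):
    """Return the stored encoding for the URL's host, or the default."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]  # treat www.example.com and example.com alike
    return SITE_ENCODINGS.get(host, default)
```

The result can then be used as the source encoding for the conversion instead of whatever the page claims about itself.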
A Python snippet that implements automatic decoding based on the response header:

# r is assumed to be a requests.Response object
encoding = 'utf-8'  # default encoding
content_type = r.headers.get('Content-Type') or ''  # the header may be missing
tmp = content_type.split('=')  # split on "="; the part on the right is the encoding
if len(tmp) > 1:  # if the header names an encoding, there will be 2 elements
    encoding = tmp[-1].strip()  # then take the last one
page = r.content.decode(encoding)
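The header is only the first of the hints listed above. A rough sketch of checking the header first, then a meta tag in the page, then a default might look like this. The function name `declared_charset`, the regular expressions and the 2048-byte scan window are my own choices, not something the site or a library prescribes.

```python
import re

def declared_charset(content_type, body, default="utf-8"):
    """Return the charset a response declares, preferring the HTTP header."""
    # 1. HTTP header, e.g. "text/html; charset=UTF-8"
    m = re.search(r'charset=["\']?([\w.:-]+)', content_type or "", re.I)
    if m:
        return m.group(1).lower()
    # 2. <meta charset="..."> or <meta http-equiv=... content="...; charset=...">
    #    Only the beginning of the page is scanned.
    m = re.search(rb'<meta[^>]+charset=["\']?([\w.:-]+)', body[:2048], re.I)
    if m:
        return m.group(1).decode("ascii").lower()
    # 3. Nothing declared: fall back to the default
    return default
```

Here `body` is the raw, still undecoded response (bytes), for example `r.content` from the snippet above.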

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $engine_url );
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
....
// add this:
curl_setopt($ch, CURLOPT_HEADER, 1);

$tmpResult = curl_exec($ch);

$header_size = curl_getinfo($ch,CURLINFO_HEADER_SIZE);
curl_close($ch);

$tmpHeaders = substr($tmpResult, 0, $header_size);
$postResult = substr($tmpResult, $header_size);

$headers = array();
foreach(explode("\n",$tmpHeaders) as $header)
{
  $tmp = explode(":",trim($header),2);
  if (count($tmp)>1)
  {
    $headers[strtolower($tmp[0])] = trim(strtolower($tmp[1]));
  }
}

$encoding = "utf-8"; // default
if (isset($headers['content-type']))
{
  // take the value after "charset=", without quotes or trailing parameters
  if (preg_match('/charset=["\']?([^"\';\s]+)/', $headers['content-type'], $m)) $encoding = $m[1];
}
if ($encoding != "utf-8") $postResult = iconv($encoding, "UTF-8", $postResult);

That's all. We get an extended response that includes the headers. We cut the headers off, split them into an array, and keep the rest as the response body.
Then we parse the HTTP headers, take Content-Type, and extract the encoding from it.
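One caveat for sites like the one in the question, where the declared encoding and the real one disagree. A likely reason the check from the question misfires is that windows-1251 is a single-byte encoding in which nearly every byte is a valid character, so a validity check against it also passes for UTF-8 bytes. UTF-8 is much stricter, so it is safer to test for UTF-8 first and convert only when that test fails. A minimal sketch in Python, assuming windows-1251 as the fallback; `to_text` is a made-up helper name:

```python
def to_text(raw, declared=None, fallback="windows-1251"):
    """Decode raw bytes, trusting UTF-8 validity over the declared charset."""
    try:
        # Strict UTF-8 decoding rejects almost any non-ASCII windows-1251 text.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    for enc in (declared, fallback):
        if not enc:
            continue
        try:
            return raw.decode(enc)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown charset name: try the next candidate
    # Last resort: decode anyway and mark the bytes that do not fit
    return raw.decode(fallback, errors="replace")
```

In PHP the same order would mean checking mb_check_encoding($postResult, "UTF-8") first and calling iconv only when that check fails.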
