Parsing
khodos_dmitry, 2019-01-07 13:55:37

Why is part of the page parsed normally, while another part comes out garbled?

I'm trying to parse a site. Part of the data is extracted from it normally, but the text from some blocks comes out like this:
[screenshot: 5c332fba01601316386491.png]
I tried converting the encoding:
$spravka = iconv("windows-1251", "utf-8", $spravka);
After that, nothing is left at all.


1 answer(s)
Andrej Gessel, 2019-01-07
@khodos_dmitry

I think I got to the bottom of the problem. Some of the text on the page is written as raw UTF-8, and some is written as numeric HTML character references, i.e. the Unicode code point of each character.
Example (the same word in both forms):
UTF-8: Сбер
HTML: &#1057;&#1073;&#1077;&#1088; (renders as Сбер, "Sber")

$doc->loadHTML(mb_convert_encoding($body, 'HTML-ENTITIES', 'UTF-8'));
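
For anyone on a newer PHP: as of PHP 8.2 the 'HTML-ENTITIES' pseudo-encoding in mb_convert_encoding is deprecated. Below is a minimal sketch of the same idea using mb_encode_numericentity instead. The sample markup in $body and the variable names $ascii, $raw and $ref are made up for illustration, and the mbstring and dom extensions are assumed to be enabled. It is not necessarily the only or best way to do this.

```php
<?php
// Hypothetical sample page: one block in raw UTF-8,
// one block written as numeric character references.
$body = '<html><body>'
      . '<p>Сбер</p>'
      . '<p>&#1057;&#1073;&#1077;&#1088;</p>'
      . '</body></html>';

// Turn every non-ASCII character into a numeric reference, so the parser
// receives pure ASCII and does not have to guess the input encoding.
// Existing references such as &#1057; are already ASCII and stay untouched.
$ascii = mb_encode_numericentity($body, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($ascii);
libxml_clear_errors();

// DOM returns text as UTF-8, so both blocks should now hold the same string.
$paragraphs = $doc->getElementsByTagName('p');
$raw = $paragraphs->item(0)->textContent;
$ref = $paragraphs->item(1)->textContent;
```

If the text is handled as a plain string rather than through DOMDocument, html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8') should turn the numeric references back into UTF-8 characters.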
