PHP Simple HTML DOM and Cyrillic incompatible?
Good evening, colleagues. Help!
Recently, on one of my sites (hosted on the ill-fated nic.ru) that uses the Simple HTML DOM class, Cyrillic text stopped being parsed, although everything worked before. I understand that something may have changed in the server settings, but that is not the point right now.
There is a piece of HTML that is fed to Simple HTML DOM, and if it contains Cyrillic, parsing stops at the very first tag in which Cyrillic is found.
A trivial example:
$gt_text_volume = "<p>latin</p><p>кириллица</p><h3>latin 3</h3>"; // the HTML fragment itself
// ... (Simple HTML DOM has been included)
$html = str_get_html($gt_text_volume); // feed it to the parser
$tags = $html->find('*'); // find all tags
foreach ($tags as $key => $tag) { // iterate over them
    echo "\r\n" . $tag->innertext; // try to print the contents
}
We get the following result:
latin
кириллица
That is, the parser reached the second tag, which contains Cyrillic, and the iteration stopped there. With Latin-only input:
$gt_text_volume = "<p>latin</p><p>latin 2</p><h3>latin 3</h3>";
the result is correct. The following is output:
latin
latin 2
latin 3
Answer: at the top of your PHP file, place the following, so that the page is served as UTF-8 and any PHP errors are displayed:
<?php
// Set the document type and its encoding:
header('Content-Type: text/html; charset=utf-8');
// Enable error display:
error_reporting(E_ALL);
ini_set('display_errors', '1');
ini_set('display_startup_errors', '1');
// Your code follows
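If the errors point to an encoding mismatch, one possible next step is to normalize the input to UTF-8 before handing it to the parser. The snippet below is only a minimal sketch under assumptions the thread does not confirm: that the mbstring extension is available, and that any non-UTF-8 input is in Windows-1251. The helper name `ensure_utf8` is hypothetical and is not part of Simple HTML DOM.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): returns the input as UTF-8.
// Assumption: input that is not valid UTF-8 is encoded in Windows-1251.
function ensure_utf8(string $text, string $fallbackEncoding = 'Windows-1251'): string
{
    // Valid UTF-8 (including plain ASCII) is returned unchanged.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Otherwise convert from the assumed legacy encoding.
    return mb_convert_encoding($text, 'UTF-8', $fallbackEncoding);
}

$gt_text_volume = "<p>latin</p><p>кириллица</p><h3>latin 3</h3>";
$clean = ensure_utf8($gt_text_volume);
// $html = str_get_html($clean); // then parse as in the question
```

Whether encoding is actually the cause here is not established in the thread, so treat this as a diagnostic step, not a confirmed fix.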