PHP
Alexander Tsymbal, 2020-10-07 19:54:02

Are PHP Simple HTML DOM and Cyrillic incompatible?

Good evening, colleagues. Help!

Recently, on one of the sites (hosted on the ill-fated nic.ru) that uses the Simple HTML DOM class, Cyrillic text stopped being parsed, although everything worked fine before. I understand that something may have changed in the server settings, but that is beside the point right now.

There is a piece of HTML that is fed to Simple HTML DOM, and if it contains Cyrillic, parsing stops at the very first tag in which Cyrillic occurs.

A trivial example:

$gt_text_volume = "<p>latin</p><p>кириллица</p><h3>latin 3</h3>"; // the piece of HTML itself
// ... (Simple HTML DOM has been included)
$html = str_get_html($gt_text_volume); // feed it to the parser
$tags = $html->find('*'); // find all tags
foreach ($tags as $key => $tag) { // iterate over them
  echo "\r\n" . $tag->innertext; // try to print the contents
}
We get the following result:
latin
кириллица
That is, the parser reached the second tag, the one containing Cyrillic, and the iteration stopped there.

If I replace the original string with the following (i.e. remove all Cyrillic from it):
$gt_text_volume = "<p>latin</p><p>latin 2</p><h3>latin 3</h3>";
then the result is correct. The following is output:
latin 
latin 2 
latin 3


The file is saved in UTF-8, and the site works in UTF-8.

Has anyone run into this? I have googled what feels like the whole Internet and found questions describing the same problem, but no answers. HELP! I beg you tearfully!

UPD: Solution found. For the parser to work correctly, make sure that mbstring.func_overload = 0.
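
A plausible explanation, not confirmed in this thread: when mbstring.func_overload is non-zero, functions such as strlen() are replaced by their multibyte counterparts and start counting characters instead of bytes, which can confuse a parser that walks the string byte by byte. The sketch below only reports whether overloading is active on the current server; the variable names are illustrative. As far as I know, the directive is set in php.ini rather than at runtime, and it was removed entirely in PHP 8.0.

```php
<?php
// Diagnostic sketch: report whether mbstring function overloading is active.
// ini_get() returns false when the directive is not registered, for example
// when mbstring is not loaded or on PHP 8+, where the directive was removed.
$overload = ini_get('mbstring.func_overload');
$overloadActive = ($overload !== false) && ((int) $overload !== 0);

// Without overloading, strlen() counts bytes: 'кириллица' is 9 characters
// but 18 bytes in UTF-8, because each Cyrillic letter takes 2 bytes.
$sample = 'кириллица';
$byteLength = strlen($sample);

echo 'mbstring.func_overload = ', var_export($overload, true), PHP_EOL;
echo 'overloading active: ', $overloadActive ? 'yes' : 'no', PHP_EOL;
echo 'strlen("', $sample, '") = ', $byteLength, PHP_EOL;
```

If the script reports that overloading is active, the setting has to be changed in the server configuration (or by the hosting provider) before Simple HTML DOM is called.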

1 answer(s)
Nadim Zakirov, 2020-10-07
@zkrvndm

At the top of your PHP file, place:

<?php

// Set the document type and its encoding:
header('Content-Type: text/html; charset=utf-8');

// Enable error display:
ini_set('error_reporting', E_ALL);
ini_set('display_errors', 1);
ini_set('display_startup_errors', 1);

// Your code follows

And try again. If there are errors, post the error text here.
