Is there a way to remove extra closing tags when parsing?
I'm parsing a website and seeing a lot of extra closing </div> tags,
which break my layout.
I tried this:
$content = preg_replace("/<\/?div[^>]*\>/i", "", $content);
It doesn't work. Has anyone run into this?
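The regex above strips every <div> and </div>, not just the extra ones. One way to drop only the unmatched closing tags is to count nesting depth while scanning. The sketch below is a hypothetical helper, not from the thread. It assumes div tags do not appear inside comments, scripts or attribute values, and it does not close divs that were left open.

```php
<?php
/**
 * Hypothetical helper: removes only closing </div> tags that have no
 * matching opening <div>, tracking nesting depth with a counter.
 * Assumes div tags are not hidden inside comments, scripts or attribute values.
 */
function stripUnmatchedDivClosers(string $html): string
{
    $depth = 0;
    $result = preg_replace_callback(
        '/<(\/?)div\b[^>]*>/i',
        function (array $m) use (&$depth): string {
            if ($m[1] === '') {   // opening <div ...>: one level deeper
                $depth++;
                return $m[0];
            }
            if ($depth > 0) {     // closing tag that matches an open div
                $depth--;
                return $m[0];
            }
            return '';            // stray </div> with nothing open: drop it
        },
        $html
    );
    return $result ?? $html;      // preg_replace_callback returns null on error
}

echo stripUnmatchedDivClosers('<div><p>a</p></div></div>b</div>');
```

Because this only looks at div tags with a regex, it is a quick fix for this specific symptom. A real HTML parser is more robust for arbitrary markup.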
What you need is an HTML filter (sanitizer).
Properly configured, HTML Purifier will handle this.
Alternatively, you can parse the page with DOMDocument and take the text content of the body, without any tags:
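$url = 'http://yandex.ru';
$result = file_get_contents($url);
if ($result === false) {
    die("Could not fetch $url");
}
$dom = new \DOMDocument();
// Collect libxml errors internally instead of emitting warnings for malformed markup
$prevErrors = libxml_use_internal_errors(true);
/* By default loadHTML assumes ISO-8859-1, so convert the UTF-8 input explicitly
   (note: the 'HTML-ENTITIES' target is deprecated as of PHP 8.2) */
$dom->loadHTML(mb_convert_encoding($result, 'HTML-ENTITIES', 'UTF-8'));
libxml_clear_errors();
libxml_use_internal_errors($prevErrors);
$body = $dom->getElementsByTagName('body')->item(0);
$bodyContent = $body !== null ? $body->textContent : '';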
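If you want to keep the markup rather than just the text, the same DOMDocument parser can rebalance it. libxml's HTML parser ignores end tags that have no matching open element and closes elements left open, so serialising the body's children gives you cleaned-up HTML. The sketch below is a hypothetical helper (`balanceHtml` is not from the answer). It assumes UTF-8 input and uses a prepended meta tag, which lands in the generated head, to tell libxml the encoding.

```php
<?php
/**
 * Hypothetical helper (not from the original answer): re-parses an HTML
 * fragment with libxml's forgiving HTML parser and serialises it back,
 * which drops stray closing tags and closes tags left open.
 */
function balanceHtml(string $fragment): string
{
    $dom = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // silence "Unexpected end tag" warnings
    // The meta tag tells libxml the input is UTF-8 (otherwise it assumes ISO-8859-1).
    $dom->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $fragment);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return ''; // nothing ended up in <body>
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child); // serialise each node inside <body>
    }
    return $out;
}

echo balanceHtml('<div><p>Hello</p></div></div></div><p>World');
```

Unlike a regex, this works for any tag, not only div, at the cost of letting libxml decide how to repair badly nested markup.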