PHP
Araik Oganyan, 2017-01-19 12:17:20

How to parse invalid HTML?

Could you tell me how to parse invalid HTML?
Until now I have always used Simple HTML DOM, and both the results and the speed were fine, but it does not work with invalid HTML: it goes into recursion.

3 answer(s)
DrunkMaster, 2017-01-19
@DrunkMaster

With regular expressions, obviously.

alexey_komyakov, 2017-01-19
@aleksey_komyakov

Definitely run it through Tidy first. It does an excellent job of fixing all kinds of invalid markup.
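A minimal sketch of the "Tidy first, then parse" approach, assuming the tidy and dom extensions are installed. It uses PHP's built-in DOM extension as the parser instead of Simple HTML DOM (the repaired string could just as well be handed to Simple HTML DOM). The helper name `extractLinks` and the `//a[@href]` query are hypothetical, chosen only for illustration.

```php
<?php
// Sketch: repair invalid HTML with Tidy, then parse the repaired markup
// with DOMDocument + DOMXPath. Requires ext-tidy and ext-dom.

/**
 * Hypothetical helper: returns the href of every <a> element.
 */
function extractLinks(string $html): array
{
    // Step 1: let Tidy close unclosed tags, fix bad nesting, etc.
    $repaired = tidy_repair_string($html, ['output-html' => true], 'utf8');
    if ($repaired === false || $repaired === '') {
        return [];
    }

    // Step 2: load the repaired markup into a DOM tree, collecting any
    // leftover libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($repaired);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Step 3: query the tree with XPath.
    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query('//a[@href]') as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}

// Unclosed <p> and <b> inside a <div>.
$broken = '<div><a href="/one">one</a><p>unclosed <b>bold <a href="/two">two</a></div>';
print_r(extractLinks($broken));
```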

Ilya, 2017-01-19
@glebovgin

Tidy, and there is no need to look for alternatives.
Here is one of my use cases, for example:

$options = array(
    "indent" => false,
    "output-xml" => true,
    "clean" => true,
    "drop-proprietary-attributes" => true,
    "drop-font-tags" => true,
    "drop-empty-paras" => true,
    "hide-comments" => true,
    "join-classes" => true,
    "join-styles" => true,
    "show-body-only" => false,
);

$tidy = new tidy();
// parseString() returns a bool; the parsed document is kept inside $tidy
$tidy->parseString($page, $options, 'utf8'); // $page contains the invalid HTML
$tidy->cleanRepair();
echo $tidy; // valid, repaired HTML

As for the list of options, I suggest experimenting with it yourself.
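One way to experiment is to run the same input through several option sets and compare the results side by side. This is a sketch assuming ext-tidy is installed; the helper `tidyWith`, the sample `$page` and the variant labels are hypothetical. It only uses options from the answer above (`show-body-only`, `hide-comments`) and also returns Tidy's `errorBuffer`, which lists the warnings Tidy produced while repairing.

```php
<?php
// Sketch: compare Tidy output for different option sets. Requires ext-tidy.

/**
 * Hypothetical helper: repair $html with the given options and return the
 * repaired markup plus Tidy's warnings.
 */
function tidyWith(string $html, array $options): array
{
    $tidy = new tidy();
    $tidy->parseString($html, $options, 'utf8');
    $tidy->cleanRepair();
    return [
        'output'   => tidy_get_output($tidy),
        // Warnings emitted while parsing/repairing (null when there are none).
        'warnings' => (string) $tidy->errorBuffer,
    ];
}

// Invalid sample: unclosed <font>, a comment, an empty paragraph.
$page = '<p>Hello <font color="red">world<!-- note --></p><p></p>';

$variants = [
    'defaults'    => [],
    'body only'   => ['show-body-only' => true],
    'no comments' => ['show-body-only' => true, 'hide-comments' => true],
];

foreach ($variants as $label => $options) {
    $result = tidyWith($page, $options);
    echo "=== $label ===\n", $result['output'], "\n";
    echo "--- warnings ---\n", $result['warnings'], "\n";
}
```

Adding one option at a time to a variant makes it easy to see exactly what each option changes before combining them into a final `$options` array.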
