PHP
Iter, 2016-03-03 13:16:16

How to parse HTML with unlimited nesting depth?

Hello!
I ran into a problem parsing an HTML structure. Here is the code I have managed to write so far:

$a = '<span style="font-size:18px"><span style="color:#FF0000">Apollo 11 was the spaceflight that <span style="font-family:courier new,courier,monospace">landed the first humans</span></span><span style="font-family:courier new,courier,monospace">, Americans </span>Neil Armstrong and Buzz Aldrin, on the Moon on July 20, 1969, at 20:18 UTC. Armstrong became the first to step onto the lunar surface 6 hours later on July 21 at 02:56 UTC.</span>';
$dom = new DOMDocument;
$dom->loadHTML($a);
echo '<pre>';
foreach($dom->documentElement->childNodes as $item){
  var_dump('0 -> '.$item->nodeName);
  if($item->nodeName != '#text'){
    foreach($item->childNodes as $item2){
      var_dump('1 -> '. $item2->nodeName);
      if($item2->nodeName != '#text'){
        foreach($item2->childNodes as $item3){
          var_dump('2 -> '. $item3->nodeName);
          if($item3->nodeName != '#text'){
            foreach($item3->childNodes as $item4){
              var_dump('3 -> '. $item4->nodeName);
              if($item4->nodeName != '#text'){
                foreach($item4->childNodes as $item5){
                  var_dump('4 -> '. $item5->nodeName);
                }
              }
            }
          }
        }
      }
    }
  }
}

codepad.org/tg1jrGNc
The problem is that tags can be nested arbitrarily deep, so writing a chain of nested foreach loops is a very bad idea. How can I solve this properly?

2 answer(s)
jacksparrow, 2016-03-03
@jacksparrow

Use a ready-made library, for example Simple HTML DOM.

Turar Abu, 2016-03-03
@kemply

Use a recursive function:

$a = '<span style="font-size:18px"><span style="color:#FF0000">Apollo 11 was the spaceflight that <span style="font-family:courier new,courier,monospace">landed the first humans</span></span><span style="font-family:courier new,courier,monospace">, Americans </span>Neil Armstrong and Buzz Aldrin, on the Moon on July 20, 1969, at 20:18 UTC. Armstrong became the first to step onto the lunar surface 6 hours later on July 21 at 02:56 UTC.</span>';
$dom = new DOMDocument;
$dom->loadHTML($a);
echo '<pre>';

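// Print the node's depth and name, then descend into its children.
function recurse(DOMNode $node, $depth) {
  var_dump("$depth -> " . $node->nodeName);
  // Text nodes have no children to visit.
  if ($node->nodeName != '#text') {
    foreach ($node->childNodes as $child) {
      recurse($child, $depth + 1);
    }
  }
}

// Start at depth 0 for each child of the <html> root element.
foreach ($dom->documentElement->childNodes as $child) {
  recurse($child, 0);
}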
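
If the nesting could be deep enough that recursion itself becomes a concern, the same walk can probably be done with an explicit stack. The following is only a sketch, not part of the original answer: the helper name listNodes and the shortened sample markup are made up for illustration, and it collects the lines into an array instead of calling var_dump.

```php
<?php
// Hypothetical helper (not from the original answer): walks the tree with an
// explicit stack and returns one "depth -> nodeName" line per node.
function listNodes(DOMNode $root): array
{
    $lines = [];
    $stack = [[$root, 0]];
    while ($stack) {
        [$node, $depth] = array_pop($stack);
        $lines[] = "$depth -> " . $node->nodeName;
        if ($node->hasChildNodes()) {
            // Push children in reverse so they are popped in document order.
            for ($i = $node->childNodes->length - 1; $i >= 0; $i--) {
                $stack[] = [$node->childNodes->item($i), $depth + 1];
            }
        }
    }
    return $lines;
}

// Shortened sample markup, assumed for illustration only.
$html = '<span><span>Apollo 11 <span>landed</span></span>, Americans</span>';
$dom = new DOMDocument;
$dom->loadHTML($html);

// loadHTML wraps the fragment in <html><body>, so start from <body>.
$body = $dom->getElementsByTagName('body')->item(0);
$lines = listNodes($body);
echo implode("\n", $lines), "\n";
```

Returning an array rather than printing directly makes it easier to post-process the result, for example to filter out #text entries or to find the maximum depth.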
