PHP
sohav, 2017-10-09 11:29:56

How do I extract the content between tags with a regex, given this DOM structure?

Given this HTML:

<div class="article">
  <p class="title">Test!</p>
  <div>Content content</div>
  <p>test test</p>
  <div class="test">test</div>
  <p>test</p>
</div>
<?= rand(0,100); ?>
<div class="article">
  <p class="title">Test1!</p>
  <div>Content content1</div>
  <p>test test1</p>
  <div class="test">test1</div>
  <p>test1</p>
</div>

Please suggest a regular expression that selects the content inside each div.article tag.
I tried /<div class="article">(.*?)<\/div>/igs, but the match ends at the first </div> it encounters. Setting the U flag does not give the result I need either :-(
I want to do this with regular expressions, not with an XML parser.
Here is a link to the sandbox: https://regex101.com/r/6HDlxK/1
Thank you!

3 answer(s)
ns5d, 2017-10-09
@sohav

1. Find all div.article blocks: regex101.com/r/z6RRM7/1
2. Strip the tags from each match: replace "<[^>]*>" with ""
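
The two steps above could be sketched in PHP roughly as follows. The pattern here is not the one behind the regex101 link (its contents are not shown in this thread); it is an assumed pattern that uses PCRE recursion, (?1), to step over nested <div> blocks. It also assumes the opening tag is written exactly as <div class="article"> and that every <div> is properly closed. The inline PHP from the question is replaced by a sample rendered value.

```php
<?php
// Sample markup from the question; "42" stands in for the rendered rand() output.
$html = <<<'HTML'
<div class="article">
  <p class="title">Test!</p>
  <div>Content content</div>
  <p>test test</p>
  <div class="test">test</div>
  <p>test</p>
</div>
42
<div class="article">
  <p class="title">Test1!</p>
  <div>Content content1</div>
  <p>test test1</p>
  <div class="test">test1</div>
  <p>test1</p>
</div>
HTML;

// Step 1: capture the inner HTML of each div.article.
// Group 1 is a sequence of either
//   - any character that does not start a <div ...> or </div> tag, or
//   - a complete nested <div ...> ... </div>, matched by recursing into group 1.
$pattern = '~<div class="article">((?:(?!</?div\b).|<div\b[^>]*>(?1)</div>)*)</div>~is';
$count = preg_match_all($pattern, $html, $matches);

// Step 2: strip the remaining tags from each captured block.
$texts = [];
foreach ($matches[1] as $inner) {
    $texts[] = trim(preg_replace('/<[^>]*>/', '', $inner));
}

print_r($texts);
```

Because nested divs are consumed as a unit, the match for each article ends at its own closing </div> rather than at the first one inside it.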

vyrkmod, 2017-10-09
@vyrkmod

Most likely the problem is not the regular expression itself, but the use of preg_match() instead of preg_match_all(). And yes, I can't resist posting this link.
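
A minimal illustration of the difference, using a deliberately simple pattern on the title paragraphs (the pattern and sample string are illustrative, not taken from the question's sandbox):

```php
<?php
$html = '<p class="title">Test!</p><p class="title">Test1!</p>';
$pattern = '~<p class="title">(.*?)</p>~s';

// preg_match() stops after the first match and returns 1 (or 0 if nothing matched).
$found = preg_match($pattern, $html, $first);

// preg_match_all() keeps scanning and returns the total number of matches.
$total = preg_match_all($pattern, $html, $all);

var_dump($found, $first[1]); // first title only
var_dump($total, $all[1]);   // every title
```

So if only the first article comes back, it is worth checking which of the two functions is being called before changing the pattern.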

parserpro, 2017-12-04
@parserpro

In my opinion, this is a job for a SAX parser, not a regular expression.
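
For comparison, here is a rough sketch of the SAX-style approach with PHP's bundled XML parser extension (xml_parser_create() and its handlers). It assumes the markup is well-formed XML once wrapped in a single root element, which real-world HTML often is not; the <root> wrapper and the $depth counter are illustrative choices, not something from the question.

```php
<?php
// Sample markup wrapped in a hypothetical <root> so that it is a single XML document.
$xml = '<root>'
     . '<div class="article"><p class="title">Test!</p><div>Content content</div>'
     . '<p>test test</p><div class="test">test</div><p>test</p></div>'
     . '42'
     . '<div class="article"><p class="title">Test1!</p><div>Content content1</div>'
     . '<p>test test1</p><div class="test">test1</div><p>test1</p></div>'
     . '</root>';

$articles = []; // collected text, one entry per div.article
$depth = 0;     // div nesting depth inside the current article; 0 means "outside"

$parser = xml_parser_create('UTF-8');
// Keep element and attribute names as written instead of upper-casing them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$articles, &$depth) {
        if ($name !== 'div') {
            return;
        }
        if ($depth > 0) {
            $depth++;            // nested div inside an article
        } elseif (($attrs['class'] ?? '') === 'article') {
            $depth = 1;          // entering a new article
            $articles[] = '';
        }
    },
    function ($p, $name) use (&$depth) {
        if ($name === 'div' && $depth > 0) {
            $depth--;            // reaches 0 when the article itself closes
        }
    }
);

xml_set_character_data_handler(
    $parser,
    function ($p, $data) use (&$articles, &$depth) {
        if ($depth > 0) {
            // Text may arrive in several chunks, so append rather than assign.
            $articles[count($articles) - 1] .= $data;
        }
    }
);

$ok = xml_parse($parser, $xml, true);

print_r($articles);
```

The depth counter does the work that recursion does in a regex: the article ends only when its own div closes, regardless of how many divs are nested inside it.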
