PHP
romanm94, 2013-12-01 01:11:28

How to extract the contents of a tag in PHP using regular expressions?

I ran into the following problem: while parsing a site with HTML DOM Parser, I got stuck extracting the text from tags.
There is an array $el containing the following lines:

<a href="test">TEST1</a>
<span id="info">INFO</span>
<a href="test2">TEST2</a>

I just need to output the contents of the tags. I did it like this:
$txt = $el->innertext;
preg_match ( '/<a[^>]+?[^>]+>(.*?)<\/a>/i' , $txt , $matches); 
$info['TEST1:'] = str_replace("TEST1:","",$matches[1]);
preg_match ( '/<a[^>]+?[^>]+>(.*?)<\/a>/i', $txt , $matches); // I don't know how to output the contents of the second <a></a> tag
$info['TEST2:'] = str_replace("TEST2:","",$matches[1]);
preg_match ( '/<span[^>]+?[^>]+>(.*?)<\/span>/i' , $txt , $matches);
$info['INFO:'] = str_replace("INFO:","",$matches[1]);

I managed to extract TEST1 and INFO, but not TEST2. How can I output the contents of the second <a></a> tag?
There is also an array that contains:
<span class="date">01 декабря 2013 — 02:20</span> // for example

How can I separate the date (before the dash) from the time (after it) using regular expressions?


3 answer(s)
egor_nullptr, 2013-12-01
@romanm94

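// Wrap the fragment in a root element so it parses as well-formed XML.
$doc = new DOMDocument('1.0', 'utf-8');
$doc->loadXML('<body>' . $txt . '</body>');
$xp = new DOMXPath($doc);

// Visit every <a> element, not just the first one.
foreach ($xp->query('//a') as $anode) {
    echo $anode->nodeValue;
}

// Split the date from the time on the dash (hyphen, en dash, or em dash).
foreach ($xp->query('//span[@class="date"]') as $date_node) {
    list($date, $time) = preg_split('/\s+[-\x{2013}\x{2014}]\s+/u', $date_node->nodeValue, 2);
}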
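
If regular expressions are still preferred, as in the question, note that preg_match returns only the first match, which is why TEST2 never showed up; preg_match_all collects every match. A minimal sketch, assuming $txt holds the three lines from the question and that the date and time are separated by a hyphen, en dash, or em dash ($txt and $dateHtml here are sample values copied from the question for illustration):

```php
<?php
// Sample input copied from the question (assumed value of $el->innertext).
$txt = '<a href="test">TEST1</a>' . "\n"
     . '<span id="info">INFO</span>' . "\n"
     . '<a href="test2">TEST2</a>';

// preg_match_all collects the body of every <a>...</a>; preg_match stops at the first.
preg_match_all('/<a\b[^>]*>(.*?)<\/a>/is', $txt, $matches);
$links = $matches[1]; // one entry per <a> tag, in document order

// Sample date markup copied from the question.
$dateHtml = '<span class="date">01 декабря 2013 — 02:20</span>';

// Group 1 is the text before the dash, group 2 is the HH:MM time after it.
$date = $time = null;
if (preg_match('/<span\b[^>]*>\s*(.+?)\s+[-\x{2013}\x{2014}]\s+(\d{1,2}:\d{2})\s*<\/span>/u', $dateHtml, $m)) {
    $date = $m[1];
    $time = $m[2];
}
```

This only holds for flat markup like the sample; nested or malformed tags are better handled by the DOM approach.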

Alexey, 2013-12-01
@ScorpLeX

Use strip_tags().
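
A minimal sketch of this suggestion, assuming each array element holds a single tag with plain text inside, as in the question ($lines is a hypothetical sample array, not the asker's actual data):

```php
<?php
// Hypothetical sample array mirroring the lines shown in the question.
$lines = [
    '<a href="test">TEST1</a>',
    '<span id="info">INFO</span>',
    '<a href="test2">TEST2</a>',
];

// strip_tags() removes all markup and keeps only the text content.
$texts = array_map('strip_tags', $lines);

print_r($texts);
```

This does not distinguish <a> from <span>, so it fits only when the tag name does not matter.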

demimurych, 2013-12-01
@demimurych

Why not use a ready-made solution?
For example, simplehtmldom.sourceforge.net/ lets you get everything you need with jQuery-like queries.
