Recommend a simple regular expression
Hello!
The situation is as follows:
1) There is a $get_page variable containing the HTML source of a site page.
2) In this source, links with the following structure are repeated regularly (only the contents of the links change):
<h3 class="t_i_h3">
<a title="Продаю BMW 735i в Ростове-на-Дону" href="/rostov-na-donu/avtomobili_s_probegom/prodayu_bmw_735i_89296613" name="89296613"> Продаю BMW 735i</a>
</h3>
The <h3 class="t_i_h3"> element always has this class; the contents of title, href, and name change, and so does the text of the link itself, obviously.
Hmm, I found what looks like a suitable example (on regexp.ru) and wanted to test it myself. What am I doing wrong?
Test script:
$get_page = file_get_contents('http://www.avito.ru/rostov-na-donu/avtomobili_s_probegom');
preg_match_all("|<h3\s+class=\«t_i_h3\»>(.+?)|isU", $get_page,$result);
echo '<br/><strong>Result:</strong> <pre>'.var_export($result[1],true).'</pre>';
Gives an empty array...
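One likely cause, judging only from the pattern as posted: it contains typographic quotes («»), while the page source uses straight double quotes, so the literal text never matches. This assumes the typographic quotes really are in the script and were not introduced by the forum's formatting. A minimal sketch on a made-up fragment (the sample markup and the variable names $typographic, $straight, $n1, $n2 are only for illustration):

```php
<?php
// Hypothetical sample markup with the same shape as the page in the question.
$get_page = '<h3 class="t_i_h3">
<a title="t" href="/x" name="1"> text</a>
</h3>';

// Pattern as posted: the class value is wrapped in « » instead of " ".
$typographic = "|<h3\s+class=\«t_i_h3\»>(.+?)|isU";
// Same idea with straight quotes and an explicit closing tag.
// The U flag is dropped here because U combined with +? makes the group greedy.
$straight = "|<h3\s+class=\"t_i_h3\">(.+?)</h3>|is";

$n1 = preg_match_all($typographic, $get_page, $r1);
$n2 = preg_match_all($straight, $get_page, $r2);

echo "typographic quotes: $n1 match(es)\n";
echo "straight quotes:    $n2 match(es)\n";
```

If the first count is zero and the second is not, the quotes are the culprit.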
I suggest forgetting about regular expressions in this case and using a more suitable and convenient tool:
PHP Simple HTML DOM Parser
<?php
require('simple_html_dom.php');
// Create DOM from string
$html = str_get_html('<html><body><h3 class="t_i_h3">
<a title="Продаю BMW 735i в Ростове-на-Дону" href="/rostov-na-donu/avtomobili_s_probegom/prodayu_bmw_735i_89296613" name="89296613"> Продаю BMW 735i</a>
</h3></body></html>');
// Find every link inside <h3 class="t_i_h3"> and print its title attribute
foreach ($html->find('h3.t_i_h3 a') as $element) {
    echo $element->title, "\n";
}
|<h3\s+class=\"t_i_h3\">(.+?)</h3>|is
Something like this. In general, though, you would be better off using XPath and sparing yourself the pain. Parsing the DOM with regular expressions is not always convenient.
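For reference, the XPath route needs nothing beyond PHP's bundled DOM extension. A minimal sketch, assuming the markup from the question; the $links array and the encoding prefix are my own choices for illustration, not something from the thread:

```php
<?php
// Sample markup copied from the question; on the real page this would be $get_page.
$get_page = '<html><body><h3 class="t_i_h3">
<a title="Продаю BMW 735i в Ростове-на-Дону" href="/rostov-na-donu/avtomobili_s_probegom/prodayu_bmw_735i_89296613" name="89296613"> Продаю BMW 735i</a>
</h3></body></html>';

// Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
// Assumption: the page is UTF-8. The prefix is a common hint for the HTML parser.
$dom->loadHTML('<?xml encoding="UTF-8">' . $get_page);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$links = [];
// Select every <a> that is a direct child of <h3 class="t_i_h3">.
foreach ($xpath->query('//h3[@class="t_i_h3"]/a') as $a) {
    $links[] = [
        'title' => $a->getAttribute('title'),
        'href'  => $a->getAttribute('href'),
        'name'  => $a->getAttribute('name'),
        'text'  => trim($a->textContent),
    ];
}
print_r($links);
```

The exact-match test on @class works here because the h3 carries a single class; with several classes you would need a contains()-style check instead.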
'/<h3 class="t_i_h3"><a title="([0-9a-zA-Z-_\/]+)" href="([0-9a-zA-Z-_\/]+)" name="([0-9a-zA-Z-_\/]+)">([0-9a-zA-Z-_\/]+)<\/a><\/h3>/'
This is far from ideal, but it will do for this case. I don't know how to make it match not only a-zA-Z but also Cyrillic (а-яА-Я).
It's long and ugly, but it works. :)
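To let those character classes accept Cyrillic as well, one option is the u modifier, which makes PCRE treat the pattern and the subject as UTF-8 so that а-я is a range of characters rather than bytes. A rough sketch, assuming the script file and the page are both UTF-8; the $chars helper, the added whitespace in the class, and the \s* between tags are my additions so that the sample from the question matches:

```php
<?php
// Sample markup copied from the question.
$get_page = '<h3 class="t_i_h3">
<a title="Продаю BMW 735i в Ростове-на-Дону" href="/rostov-na-donu/avtomobili_s_probegom/prodayu_bmw_735i_89296613" name="89296613"> Продаю BMW 735i</a>
</h3>';

// Latin letters, digits, Cyrillic letters (ё/Ё lie outside а-я), "_", "/", whitespace, "-".
$chars = '[0-9a-zA-Zа-яА-ЯёЁ_\/\s-]+';

$pattern = '/<h3 class="t_i_h3">\s*'
         . '<a title="(' . $chars . ')" href="(' . $chars . ')" name="(' . $chars . ')">'
         . '(' . $chars . ')<\/a>\s*<\/h3>/u';

// Each $result entry: [0] whole match, [1] title, [2] href, [3] name, [4] link text.
preg_match_all($pattern, $get_page, $result, PREG_SET_ORDER);
print_r($result);
```

With the u modifier, \p{L} is also available if any letter from any alphabet should be accepted.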
preg_match_all("/<h3 class=\"t_i_h3\">(.*)<\/h3>/isU",$get_page,$result);
regexr.com?32s0a
This outputs the title, href, name, and link text as separate captures +)
<?php
// Build a test page: 10 link blocks, each followed by random filler text.
for ($get_page = "", $i = 0; $i < 10; $i++) {
    $get_page .= "
<h3 class=\"t_i_h3\">
<a title=\"".md5(mt_rand(1,1000))."\" href=\"".md5(mt_rand(1,1000))."\" name=\"".md5(mt_rand(1,1000))."\"> ".md5(mt_rand(1,1000))."</a>
</h3>
" . md5(mt_rand(1,1000));
}
// Captures per match: 1 = title, 2 = href, 3 = name, 4 = link text.
preg_match_all("~".
    "\s*<h3.*?t_i_h3.?>".
    "\s*<a\s*title\=\"(.*?)\"\s*href\=\"(.*?)\"\s*name\=\"(.*?)\"\s*>(.*?)</a>".
    "\s*</h3>".
    "\s*".
    "~msi", $get_page, $result, PREG_SET_ORDER);
print_r($result);
?>