Recommend a simple regular expression

Sergey Nozdrin, 2012-11-19 18:25:02

PHP

Hello!

The situation is as follows:
1) there is a $get_page variable containing the source code of the site page.
2) in this code, links with the following structure are regularly repeated (only the text of the links changes):

<h3 class="t_i_h3">
<a title="Продаю BMW 735i в Ростове-на-Дону" href="/rostov-na-donu/avtomobili_s_probegom/prodayu_bmw_735i_89296613" name="89296613"> Продаю BMW 735i</a>
</h3>

Question.
What regular expression should be passed to the PHP function preg_match_all("", $get_page, $result) so that the $result array contains all links with this structure from the page?
Thanks for the help!

P.S. Just in case: the <h3> always has class="t_i_h3"; the contents of title, href, and name change, as does the link text itself, obviously.

7 answer(s)

Sergey Nozdrin, 2012-11-19
@light204

Hmm, I found a suitable example (on regexp.ru) and want to check myself. What am I doing wrong?
Test script:
$get_page = file_get_contents('http://www.avito.ru/rostov-na-donu/avtomobili_s_probegom');
preg_match_all("|<h3\s+class=\"t_i_h3\">(.+?)|isU", $get_page, $result);
echo '<br/><strong>Result:</strong> <pre>' . var_export($result[1], true) . '</pre>';
It returns an empty array...
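
One thing worth checking in this pattern, separately from whatever the live page returned: the U modifier (PCRE_UNGREEDY) inverts greediness, so under U the group (.+?) becomes greedy, and with nothing after it the match runs to the end of the input. Below is a minimal sketch of a variant without U that stops at the closing tag. It assumes the markup looks exactly like the sample in the question; the two-item $get_page is made up for illustration.

```php
<?php
// Hypothetical sample input modelled on the markup from the question.
$get_page = <<<HTML
<h3 class="t_i_h3">
<a title="First ad" href="/city/cars/first_1" name="1"> First</a>
</h3>
<p>filler</p>
<h3 class="t_i_h3">
<a title="Second ad" href="/city/cars/second_2" name="2"> Second</a>
</h3>
HTML;

// Original flags: under U, "+?" is greedy, so one match swallows the rest of the page.
preg_match_all('|<h3\s+class="t_i_h3">(.+?)|isU', $get_page, $greedy);

// Lazy group without U, terminated by the closing tag: one match per <h3>.
preg_match_all('|<h3\s+class="t_i_h3">(.+?)</h3>|is', $get_page, $result);

// $inner holds the trimmed inner HTML of each matching <h3>.
$inner = array_map('trim', $result[1]);
print_r($inner);
```

If the array is still empty on the real page, it may be worth dumping $get_page first to confirm that the request succeeded and that the markup really matches the sample.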

avalak, 2012-11-19
@avalak

I suggest forgetting about regular expressions in this case and using a more suitable and convenient tool:
PHP Simple HTML DOM Parser

<?php

require('simple_html_dom.php');

// Create DOM from string
$html = str_get_html('<html><body><h3 class="t_i_h3">
<a title="Продаю BMW 735i в Ростове-на-Дону" href="/rostov-na-donu/avtomobili_s_probegom/prodayu_bmw_735i_89296613" name="89296613"> Продаю BMW 735i</a>
</h3></body></html>');

// Find all links
foreach($html->find('h3.t_i_h3 a') as $element)
  echo $element->title;

Sergey Protko, 2012-11-19
@Fesor

|<h3\s+class=\"t_i_h3\">(.+?)|isU
Something like this. In general, you would be better off using XPath and sparing yourself the trouble. Parsing the DOM with regular expressions is not always convenient.
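
For reference, the XPath route could look roughly like the sketch below. It uses PHP's bundled DOMDocument and DOMXPath classes; the sample markup and the $links array layout are assumptions made for illustration, not something taken from the real page.

```php
<?php
// Hypothetical sample input modelled on the markup from the question.
$get_page = '<html><body>
<h3 class="t_i_h3">
<a title="First ad" href="/city/cars/first_1" name="1"> First</a>
</h3>
<h3 class="other"><a title="Skip" href="/skip">Skip</a></h3>
</body></html>';

$doc = new DOMDocument();
// Real pages are rarely valid HTML; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($get_page);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$links = [];
// Select every <a> that is a direct child of <h3 class="t_i_h3">.
foreach ($xpath->query('//h3[@class="t_i_h3"]/a') as $a) {
    $links[] = [
        'title' => $a->getAttribute('title'),
        'href'  => $a->getAttribute('href'),
        'text'  => trim($a->textContent),
    ];
}
print_r($links);
```

The sample is ASCII on purpose: for a page with Cyrillic text, loadHTML may need an explicit encoding hint, otherwise the attribute values can come out garbled.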

Artur Bordenyuk, 2012-11-19
@HighQuality

'/<h3 class="t_i_h3"><a title="([0-9a-zA-Z-_\/]+)" href="([0-9a-zA-Z-_\/]+)" name="([0-9a-zA-Z-_\/]+)">([0-9a-zA-Z-_\/]+)<\/a><\/h3>/'
This is far from ideal, but it will do for this case. I don't know how to make it match not only a-zA-Z but also а-яА-Я (Cyrillic).
It may be long and ugly, but it works. :)
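
On the Cyrillic point, one possible approach is the u modifier together with Unicode property classes such as \p{L} (any letter) and \p{N} (any digit). The sketch below is a rewrite rather than the pattern above: it swaps in [^"]* for the attribute values, allows whitespace between the tags, and assumes the source file and the page are both UTF-8.

```php
<?php
// Sample block from the question (UTF-8 assumed).
$get_page = '<h3 class="t_i_h3">
<a title="Продаю BMW 735i в Ростове-на-Дону" href="/rostov-na-donu/avtomobili_s_probegom/prodayu_bmw_735i_89296613" name="89296613"> Продаю BMW 735i</a>
</h3>';

// Groups: 1 = title, 2 = href, 3 = name, 4 = link text.
// [^"]* accepts any attribute value; \p{L} and \p{N} cover letters and digits of any script.
$pattern = '~<h3\s+class="t_i_h3">\s*'
         . '<a\s+title="([^"]*)"\s+href="([^"]*)"\s+name="([^"]*)"\s*>'
         . '\s*([\p{L}\p{N}\s_/-]+?)\s*</a>\s*</h3>~u';

preg_match_all($pattern, $get_page, $result, PREG_SET_ORDER);
print_r($result);
```

With PREG_SET_ORDER each element of $result is one link, so $result[0][2] is the href of the first link and $result[0][4] is its text.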

Jonh Doe, 2012-11-19
@CodeByZen

preg_match_all("/<h3 class=\"t_i_h3\">(.*)<\/h3>/isU", $get_page, $result);

max_rip, 2012-11-19
@max_rip

regexr.com?32s0a
It outputs three values: title, href, and the text content. +)

softm, 2012-11-23
@softm

<?php

// Build a test page: 10 link blocks with random attribute values,
// each followed by random filler text.
for ($get_page = "", $i = 0; $i < 10; $i++) {
    $get_page .= "
     <h3 class=\"t_i_h3\">
     <a title=\"" . md5(mt_rand(1, 1000)) . "\" href=\"" . md5(mt_rand(1, 1000)) . "\" name=\"" . md5(mt_rand(1, 1000)) . "\"> " . md5(mt_rand(1, 1000)) . "</a>
     </h3>

     " . md5(mt_rand(1, 1000));
}

// Capture title, href, name, and the link text of every block.
preg_match_all("~" .
    "\s*<h3.*?t_i_h3.?>" .
    "\s*<a\s*title\=\"(.*?)\"\s*href\=\"(.*?)\"\s*name\=\"(.*?)\"\s*>(.*?)</a>" .
    "\s*</h3>" .
    "\s*" .
    "~msi", $get_page, $result, PREG_SET_ORDER);

print_r($result);

?>