PHP
Teraxis, 2016-07-09 14:55:17

Why does the parser see innertext on one page but not on another page of the same type?

I am running a parser on two similar pages, court.gov.ua/sud1820 and court.gov.ua/sud0828. The goal is to extract the courts' contact details. The markup and encoding of both pages are the same. When processing sud1820, I get the expected output:


41200
smt Yampil
blvd. Yuvileyny, bud. 8/2

But when processing the sud0828 page, the parser does not see any innertext. The code:
require_once __DIR__ . '/parser/simple_html_dom.php';

$data = file_get_contents('http://court.gov.ua/sud0828');
$data = mb_convert_encoding($data, 'utf-8', 'windows-1251');
$data = str_get_html_2($data);

if ($data->innertext != '') {
    $table = $data->find('table.menur1');
    if ($table) {
        for ($i = 0; $i < count($table); $i++) {
            // The second cell of the second row holds the full address.
            $CourtFullAddress = strip_tags($table[$i]->find('tr', 1)->children(1));
            list($CourtPostCode, $CourtCity, $street, $build, $section, $section2, $section3) = explode(",", $CourtFullAddress);
            // 'буд.' is the Ukrainian abbreviation for "building".
            $CourtStreet = $street.', буд. '.$build.', '.$section.', '.$section2.', '.$section3;
            print $CourtPostCode.'<br/>';
            print $CourtCity.'<br/>';
            print $CourtStreet.'<br/><br/>';
        }
    }
}

I also tried fetching the page with cURL (curl_init) and got the same result.
What could be the problem?


1 answer(s)
Maxim Alekhin, 2016-07-09
@Teraxis

New answer:
Both pages worked fine for me with this:

$data = file_get_contents("http://ymp.su.court.gov.ua/sud1820");
preg_match('/<table cellpadding=0 cellspacing=0 class=menur1>.+?<\/table>/s', $data, $matches);
print_r($matches);
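
To get from the matched table to the three fields the question prints, something along these lines might work. This is only a sketch under assumptions: the helper names (extractContactTable, cellText, splitAddress) and the sample markup are made up for illustration, the address is assumed to sit in the second cell of the second row as in the question's code, and the real pages would still need to be fetched and converted from windows-1251 first.

```php
<?php
// Sketch only: $sample is invented markup that merely imitates the
// <table ... class=menur1> block matched by the regex in the answer.

/** Return the first menur1 table as a string, or null if it is absent. */
function extractContactTable(string $html): ?string
{
    $pattern = '/<table cellpadding=0 cellspacing=0 class=menur1>.+?<\/table>/s';
    return preg_match($pattern, $html, $matches) === 1 ? $matches[0] : null;
}

/** Return the tag-stripped text of cell ($row, $col), zero-based, or null. */
function cellText(string $tableHtml, int $row, int $col): ?string
{
    preg_match_all('/<tr[^>]*>(.*?)<\/tr>/s', $tableHtml, $rows);
    if (!isset($rows[1][$row])) {
        return null;
    }
    preg_match_all('/<td[^>]*>(.*?)<\/td>/s', $rows[1][$row], $cells);
    if (!isset($cells[1][$col])) {
        return null;
    }
    return trim(strip_tags($cells[1][$col]));
}

/** Split "postcode, city, rest of address"; missing parts become ''. */
function splitAddress(string $address): array
{
    // Limit 3 keeps any further commas inside the street part.
    $parts = array_map('trim', explode(',', $address, 3));
    [$postCode, $city, $street] = array_pad($parts, 3, '');
    return ['postcode' => $postCode, 'city' => $city, 'street' => $street];
}

$sample = '<html><body>'
    . '<table cellpadding=0 cellspacing=0 class=menur1>'
    . '<tr><td>Court</td><td>Sample court</td></tr>'
    . '<tr><td>Address</td><td><b>41200, smt Yampil, blvd. Yuvileyny, bud. 8/2</b></td></tr>'
    . '</table></body></html>';

$table = extractContactTable($sample);
$address = $table !== null ? cellText($table, 1, 1) : null;

if ($address !== null) {
    $fields = splitAddress($address);
    echo $fields['postcode'], PHP_EOL;
    echo $fields['city'], PHP_EOL;
    echo $fields['street'], PHP_EOL;
}
```

Padding the exploded parts with array_pad avoids the undefined-offset notices that list() raises when an address has fewer commas than expected. Note that the regex depends on the exact attribute order and quoting in the table tag, so it breaks if the site changes its markup.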
