PHP
Anuar Mendubaev, 2017-03-04 16:53:31

Parse all links, traverse them, parse links, traverse them and get a block with text. Where is the mistake?

Good day, gentlemen!
I have run into a problem parsing product descriptions.
The site is optnow.ru/catalog. First I need to collect all the category links, then visit each category and collect all the product links (pagination is not a problem, because the full product list is available at ?page=0), and finally visit each product and extract the block ('.description div').
I am using Simple HTML DOM:

include 'simple_html_dom.php';
  $site = 'http://optnow.ru/catalog';
  $data = file_get_html($site);
  $catalogLink = array();
  if(!empty($data)) {
    foreach($data->find('div.cat-name a') as $catalog) {
      $catalogLink['url'] = $catalog->href;
      $urls[] = $catalogLink;
    }
    foreach($urls as $url => $k) {
      foreach($k as $n) {
        $catalogLink = 'http://optnow.ru/' . $n . '?page=0';
        $productData = file_get_html($catalogLink);
        $productLink['url'] = $productData->find('.link-pv-name')->href;
        $productUrls[] = $productLink;
      }
    }

Sergey and Vasily told me that my $href is not an object, so I did this:
foreach($productUrls as $productUrl => $hrefs) {
      foreach($hrefs as $href) {
        $link = new simple_html_dom();
        $hrefLink = 'http://optnow.ru/' . $href;
        echo $hrefLink;
        $linkData = $link->load($hrefLink);
        $productDesc = $linkData->find('.description div p');
        print_r($linkData);
        echo '<pre>';
        print_r($productDesc);
        echo '</pre>';
      }
    }
  }

As a result, this is the output I get:
http://optnow.ru/simple_html_dom Object ( [root] => simple_html_dom_node Object ( [nodetype] => 5 [tag] => root [attr] => Array ( ) [children] => Array ( ) [nodes] => Array ( [0] =>  etc.

What is my mistake, and what should I do? I have been struggling with this for three days. There is also another version of the parser with slightly fewer foreach loops.


2 answer(s)
Vasily Pupkin, 2017-03-05
@Frunky

You were inattentive. The product links on each category page also need to be iterated over, because find() without an index returns an array of elements:

<?php
  include 'simple_html_dom.php';
  $site = 'http://optnow.ru/catalog';
  $data = file_get_html($site);
  $catalogLink = array();
  if(!empty($data)) {
    foreach($data->find('div.cat-name a') as $catalog) {
      $catalogLink['url'] = $catalog->href;
      $urls[] = $catalogLink;
    }
    foreach($urls as $url => $k) {
      foreach($k as $n) {
        $catalogLink = 'http://optnow.ru/' . $n . '?page=0';
        $productData = file_get_html($catalogLink);
        // look from here
        foreach($productData->find('.link-pv-name') as $link) {
            $productLink['url'] = $link->href;
            $productUrls[] = $productLink;
        }
      }
    }
  }
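
The last step from the question, reading the description on each product page, can be sketched in the same way. This is a minimal sketch rather than a tested scraper: `extractDescription()` and `fetchDescription()` are hypothetical helper names, and the selector `.description div p` is taken from the question. The point it illustrates is that `load()` parses the string it is given as HTML, so passing it a URL yields an almost empty document, while `file_get_html()` downloads the page first.

```php
<?php
include 'simple_html_dom.php';

/**
 * Collects the text of every paragraph inside the description block.
 * $html is an already parsed simple_html_dom document.
 */
function extractDescription($html)
{
    $paragraphs = array();
    // find() without an index returns an array of matching elements.
    foreach ($html->find('.description div p') as $p) {
        $paragraphs[] = trim($p->plaintext);
    }
    return $paragraphs;
}

/**
 * Downloads one product page and returns its description paragraphs.
 * Hypothetical helper, not part of the library.
 */
function fetchDescription($productUrl)
{
    // file_get_html() fetches the URL and parses the response;
    // load() would parse the URL string itself as if it were HTML.
    $html = file_get_html($productUrl);
    if ($html === false) {
        return array(); // download failed or the page was empty
    }
    $paragraphs = extractDescription($html);
    $html->clear(); // free the parser's memory before the next page
    return $paragraphs;
}
```

With `$productUrls` filled as in the code above, each entry could then be passed as `fetchDescription('http://optnow.ru/' . $productLink['url'])`, assuming the product hrefs are relative in the same way as the category hrefs.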

Ruslan, 2017-03-04
@mitrm

If you use phpQuery and Guzzle (docs.guzzlephp.org/en/latest):

$client = new \GuzzleHttp\Client();
$res = $client->request('GET', 'http://optnow.ru/catalog');
// get the response body (the HTML of the page) as a string
$body = (string) $res->getBody();
$document = \PhpQuery\phpQuery::newDocumentHTML($body);
$a = $document->find('div.cat-name')->find('a');
$data = [];
foreach ($a as $a_item) { // loop over the links
    $data[] = [
        'href' => \PhpQuery\pq($a_item)->attr('href'),
        'text' => trim(\PhpQuery\pq($a_item)->text()),
    ];
}
print_r($data);

Result:
Array
(
    [0] => Array
        (
            [href] => categories/istochniki-pitaniya
            [text] => Источники питания
        )

    [1] => Array
        (
            [href] => categories/avtoaksessuary
            [text] => Автомобильные аксессуары
        )

    [2] => Array
        (
            [href] => categories/selfie
            [text] => Selfie (селфи, оборудование для автопортретов)
        )

    [3] => Array
        (
            [href] => categories/audio-aksessuary
            [text] => Аудио аксессуары
        )
....
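
The hrefs in this result are relative, so they have to be joined with the site address before they can be requested. A small sketch of that step follows; `categoryUrl()` is a hypothetical helper, the base URL and the `?page=0` parameter come from the question, and it assumes the href is relative and carries no query string of its own.

```php
<?php
/**
 * Builds an absolute category URL from a relative href.
 * Hypothetical helper; assumes $href has no query string.
 */
function categoryUrl($href, $base = 'http://optnow.ru')
{
    // Trim slashes on both sides so the join never doubles or drops one.
    return rtrim($base, '/') . '/' . ltrim($href, '/') . '?page=0';
}

echo categoryUrl('categories/selfie'), "\n";
// prints http://optnow.ru/categories/selfie?page=0
```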
