Parsing all category links, following them, parsing the product links, following those and extracting a text block. Where is the mistake?
Good day, gentlemen!
I ran into a problem while parsing product descriptions.
The site is optnow.ru/catalog. First I need to parse all category links, then go through the categories and parse all product links (pagination is not a problem, because the entire product list is available at ?page=0), and finally go through every product and parse the block ('.description div').
I am using Simple HTML DOM:
include 'simple_html_dom.php';
$site = 'http://optnow.ru/catalog';
$data = file_get_html($site);
$catalogLink = array();
if(!empty($data)) {
    foreach($data->find('div.cat-name a') as $catalog) {
        $catalogLink['url'] = $catalog->href;
        $urls[] = $catalogLink;
    }
    foreach($urls as $url => $k) {
        foreach($k as $n) {
            $catalogLink = 'http://optnow.ru/' . $n . '?page=0';
            $productData = file_get_html($catalogLink);
            $productLink['url'] = $productData->find('.link-pv-name')->href;
            $productUrls[] = $productLink;
        }
    }
    foreach($productUrls as $productUrl => $hrefs) {
        foreach($hrefs as $href) {
            $link = new simple_html_dom();
            $hrefLink = 'http://optnow.ru/' . $href;
            echo $hrefLink;
            $linkData = $link->load($hrefLink);
            $productDesc = $linkData->find('.description div p');
            print_r($linkData);
            echo '<pre>';
            print_r($productDesc);
            echo '</pre>';
        }
    }
}
Output:
http://optnow.ru/simple_html_dom Object ( [root] => simple_html_dom_node Object ( [nodetype] => 5 [tag] => root [attr] => Array ( ) [children] => Array ( ) [nodes] => Array ( [0] => etc.
You were not paying attention. find('.link-pv-name') without an index returns an array of elements, so the product links on each category page have to be looped over as well:
<?php
include 'simple_html_dom.php';
$site = 'http://optnow.ru/catalog';
$data = file_get_html($site);
$catalogLink = array();
$urls = array();
$productUrls = array();
if(!empty($data)) {
    // Collect the category links from the catalog page
    foreach($data->find('div.cat-name a') as $catalog) {
        $catalogLink['url'] = $catalog->href;
        $urls[] = $catalogLink;
    }
    foreach($urls as $url => $k) {
        foreach($k as $n) {
            $catalogLink = 'http://optnow.ru/' . $n . '?page=0';
            $productData = file_get_html($catalogLink);
            // The change starts here: iterate over every matched product link
            foreach($productData->find('.link-pv-name') as $link) {
                $productLink['url'] = $link->href;
                $productUrls[] = $productLink;
            }
        }
    }
}
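The code above stops once the product URLs are collected, so the last stage (reading the description block) is still open. In the question, `$link->load($hrefLink)` passes the URL string itself to the parser, because `load()` in Simple HTML DOM parses a string of markup; `file_get_html($hrefLink)`, as already used for the catalog page, is the call that downloads the page first. To keep the example runnable without the network or the library, the sketch below swaps in PHP's built-in DOMDocument and DOMXPath and works on an inline sample page. The sample markup and the helper name `extractDescription()` are assumptions made for illustration, not taken from the site.

```php
<?php
// Hypothetical sample markup standing in for a downloaded product page.
$html = <<<HTML
<html><body>
<div class="description"><div>
<p>First paragraph.</p>
<p>Second paragraph.</p>
</div></div>
</body></html>
HTML;

/**
 * Return the text of every <p> matched by ".description div p".
 * extractDescription() is an illustrative helper, not a library function.
 */
function extractDescription(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // XPath counterpart of the CSS selector ".description div p".
    $query = '//*[contains(concat(" ", normalize-space(@class), " "), " description ")]//div//p';

    $paragraphs = [];
    foreach ($xpath->query($query) as $p) {
        $paragraphs[] = trim($p->textContent);
    }
    return $paragraphs;
}

print_r(extractDescription($html));
```

With Simple HTML DOM the same step would presumably be a `file_get_html()` call on each product URL followed by a `foreach` over `find('.description div p')`, mirroring the loop over `.link-pv-name` above.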
Alternatively, using phpQuery and Guzzle (docs.guzzlephp.org/en/latest):
$client = new \GuzzleHttp\Client();
$res = $client->request('GET', 'http://optnow.ru/catalog');
// get the response body (the HTML of the page) as a string
$body = (string) $res->getBody();
$document = \PhpQuery\phpQuery::newDocumentHTML($body);
$a = $document->find('div.cat-name')->find('a');
$i = 0;
$data = [];
foreach ($a as $a_item) { // loop over the links
    $data[$i]['href'] = \PhpQuery\pq($a_item)->attr('href');
    $data[$i]['text'] = trim(\PhpQuery\pq($a_item)->text());
    $i++;
}
print_r($data);
Array
(
[0] => Array
(
[href] => categories/istochniki-pitaniya
[text] => Источники питания
)
[1] => Array
(
[href] => categories/avtoaksessuary
[text] => Автомобильные аксессуары
)
[2] => Array
(
[href] => categories/selfie
[text] => Selfie (селфи, оборудование для автопортретов)
)
[3] => Array
(
[href] => categories/audio-aksessuary
[text] => Аудио аксессуары
)
....
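The href values in this output are relative, so they have to be turned into full URLs before the next request. A minimal sketch of that step follows, assuming (as the question's code does) that the links are relative to http://optnow.ru/ and that ?page=0 returns the full product list. The function name `buildCategoryUrl()` and its parameters are hypothetical.

```php
<?php
/**
 * Build an absolute URL from a base address and a (possibly relative) href,
 * optionally appending query parameters.
 * buildCategoryUrl() is an illustrative name, not part of any library.
 */
function buildCategoryUrl(string $base, string $href, array $query = []): string
{
    if (preg_match('#^https?://#i', $href)) {
        // Already absolute: leave the link untouched.
        $url = $href;
    } else {
        // Join with exactly one slash between base and path.
        $url = rtrim($base, '/') . '/' . ltrim($href, '/');
    }
    if ($query) {
        // Use "&" if the URL already carries a query string.
        $separator = strpos($url, '?') === false ? '?' : '&';
        $url .= $separator . http_build_query($query);
    }
    return $url;
}

echo buildCategoryUrl('http://optnow.ru', 'categories/selfie', ['page' => 0]), PHP_EOL;
// prints http://optnow.ru/categories/selfie?page=0
```

Normalizing the slashes in one place avoids accidental double slashes or missing separators when the site mixes "categories/..." and "/categories/..." style links.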