PHP
Pavel Gogolinsky, 2014-03-11 00:08:14

How to organize page parsing using PHP Simple HTML DOM?

I need to extract the href of every link in a menu. Here is an example:


I am using PHP Simple HTML DOM. This is the query I wrote:

    $simpleHTML = new SimpleHTMLDOM;
    $all = $simpleHTML->file_get_html('http://site.com');
    foreach ($all->find('.catalog a') as $link) {
        echo $link->href;
    }

As a result, I get only one href: the last link. Does anyone know why? Notably, this is the only link whose <li> has no data attributes. Could they be the cause? How do I fix this?

3 answer(s)
Pavel Gogolinsky, 2014-03-11
@gogolinsky

Yes, the links are located in the <li class="portfolio"> sub-items.

Yuri Morozov, 2014-03-11
@metamorph

Either your version of the parser is buggy, or the wrapper around it is.
I took the parser from simplehtmldom.sourceforge.net and ran this:

<?php
require_once('./simple_html_dom.php');
header('Content-type: text/plain');

$content = '<ul class="catalog">
    <li class="portfolio" data-type="in" data-inc="PGEgaHJlZj0iL2NhdGFsb2cvZHZlcmktbWV6aGtvbW5hdG55eWUvZHZlcmktc2hwb25pcm92YW5ueXllL2VsaXQiPtCh0LXRgNC40Y8g0K3Qu9C40YI8L2E+"><a href="/catalog/dveri-mezhkomnatnyye/dveri-shponirovannyye/elit">Серия Элит</a></li>
    <li class="portfolio" data-type="in" data-inc="PGEgaHJlZj0iL2NhdGFsb2cvZHZlcmktbWV6aGtvbW5hdG55eWUvZHZlcmktc2hwb25pcm92YW5ueXllL2tvbWZvcnQiPtCh0LXRgNC40Y8g0JrQvtC80YTQvtGA0YI8L2E+"><a href="/catalog/dveri-mezhkomnatnyye/dveri-shponirovannyye/komfort">Серия Комфорт</a></li>
    <li class="portfolio" data-type="in" data-inc="PGEgaHJlZj0iL2NhdGFsb2cvZHZlcmktbWV6aGtvbW5hdG55eWUvZHZlcmktc2hwb25pcm92YW5ueXllL3N0YW5kYXJ0Ij7QodC10YDQuNGPINCh0YLQsNC90LTQsNGA0YI8L2E+"><a href="/catalog/dveri-mezhkomnatnyye/dveri-shponirovannyye/standart">Серия Стандарт</a></li>
    <li class="portfolio"><a href="/catalog/dveri-mezhkomnatnyye/dveri-shponirovannyye/euro">Серия Евро</a></li>
</ul>';

$all = str_get_html($content);
foreach ($all->find('.catalog a') as $link) {
  echo $link->href . "\n";
}

Result:
/catalog/dveri-mezhkomnatnyye/dveri-shponirovannyye/elit
/catalog/dveri-mezhkomnatnyye/dveri-shponirovannyye/komfort
/catalog/dveri-mezhkomnatnyye/dveri-shponirovannyye/standart
/catalog/dveri-mezhkomnatnyye/dveri-shponirovannyye/euro

Pavel Gogolinsky, 2014-03-11
@gogolinsky

I tried the original library. Yes, it works fine with the HTML string, but when I parse the live page I get the same problem: some tags are not recognized.
Here is the page: www.dveri.com/. I need to extract all links from the left menu that sit inside <li> elements with the .portfolio class.
The code:

<?php
require('./simple_html_dom.php');

// Fetch the live page and print the href of every link inside .portfolio items.
$content = file_get_html('http://dveri.com');
foreach ($content->find('.portfolio a') as $link) {
  echo $link->href . '<br>';
}

Result
/catalog/dveri-mezhkomnatnyye/dveri-shponirovannyye/euro
/catalog/dveri-mezhkomnatnyye/dveri-ekoshpon/vetro
/catalog/dveri-mezhkomnatnyye/dveri-iz-massiva
/catalog/dveri-mezhkomnatnyye/dveri-emal
/catalog/dveri-mezhkomnatnyye/dveri-pvh
/catalog/dveri-mezhkomnatnyye/dveri-stekljannye
/catalog/dveri-mezhkomnatnyye/dveri-stroitelnyye
/catalog/protivopozharnyye-dveri-lyuki
/catalog/razdvizhnye-dveri
/catalog/arki-mezhkomnatnye
/catalog/furnitura-dlya-dverey/fiksatory
/catalog/furnitura-dlya-dverey/stroitelnya
/catalog/furnitura-dlya-dverey/dovodshiki
/catalog/stenovyye-potolochnyye-paneli

But these are not all of the links.
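
One detail in the sample markup above may be relevant: the data-inc values look like Base64, and the first one appears to decode to the same <a href="..."> markup that sits inside the <li>. If the live page ships those <li> elements without the <a> and a script injects it in the browser (an assumption, not something confirmed in this thread), a server-side parser would only see the links that are already in the raw HTML. A minimal sketch of that idea follows. It uses PHP's bundled DOM extension instead of Simple HTML DOM so that it is self-contained; the function names loadXPath and extractMenuLinks and the shortened sample markup are hypothetical.

```php
<?php
// Hypothetical helpers, not part of Simple HTML DOM.

// Parse an HTML string (or fragment) leniently and return an XPath handle.
function loadXPath(string $html): DOMXPath
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real pages are rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return new DOMXPath($doc);
}

// Collect hrefs from elements with the given class: links present in the
// markup plus links found in a Base64-encoded data-inc attribute, if any.
function extractMenuLinks(string $html, string $class = 'portfolio'): array
{
    $xpath = loadXPath($html);
    // XPath equivalent of the CSS class selector ".portfolio".
    $items = $xpath->query(sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    ));

    $hrefs = [];
    foreach ($items as $item) {
        // Links already present in the raw HTML.
        foreach ($xpath->query('.//a[@href]', $item) as $a) {
            $hrefs[] = $a->getAttribute('href');
        }
        // Assumption: data-inc holds Base64-encoded link markup.
        $decoded = base64_decode($item->getAttribute('data-inc'), true);
        if ($decoded !== false && $decoded !== '') {
            foreach (loadXPath($decoded)->query('//a[@href]') as $a) {
                $hrefs[] = $a->getAttribute('href');
            }
        }
    }
    // Drop duplicates when a link is present both ways.
    return array_values(array_unique($hrefs));
}

// Shortened sample: the first item has its link only in data-inc.
$hidden = base64_encode('<a href="/catalog/elit">Elit</a>');
$sample = '<ul class="catalog">
    <li class="portfolio" data-type="in" data-inc="' . $hidden . '"></li>
    <li class="portfolio"><a href="/catalog/euro">Euro</a></li>
</ul>';

print_r(extractMenuLinks($sample));
```

To try this against the live page, pass the output of file_get_contents('http://dveri.com') to extractMenuLinks(). If the list then matches the menu shown in the browser, the missing links were most likely never in the raw HTML as <a> tags, and the parser itself is not at fault.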
