PHP
Igor, 2014-01-25 14:00:34

PHP Simple HTML DOM Parser. Why can't I get the item?

Good day!
This is my first time writing a parser.
I need to parse this oil page, or more precisely the information in its table.
After some googling I decided to use PHP Simple HTML DOM Parser.
It partly works, but I can't figure out how to get the elements marked in the screenshot:
[screenshot: 53d73848de20795c17ecadca7a7118e5.gif]
My code:

<?php
include 'simple_html_dom.php';

$link = 'http://lubematch.shell.com/ru/ru/equipment/100_2_8i_avant_001755';

// Download the page and parse it into a DOM
$data = file_get_html($link);

$result = array();

// Collect the text of every cell, one list per column class
foreach ($data->find('td.application') as $a) {
    $result['application'][] = $a->plaintext;
}

foreach ($data->find('td.recommendation') as $a) {
    $result['recommendation'][] = $a->plaintext;
}

foreach ($data->find('td.capacity') as $a) {
    $result['capacity'][] = $a->plaintext;
}

echo "<pre>";
print_r($result);
echo "</pre>";
?>

This is the output I get:
Array
(
    [application] => Array
        (
            [0] => Двигатель (Б (бензиновый))
            [1] => Механическая трансмиссия
            [2] => Автоматическая трансмиссия
            [3] => Дифференциал
            [4] => Охлаждающая жидкость
            [5] => Модели с автотрансмиссией, дифференциал
            [6] => Тормозная жидкость
            [7] => Колесные подшипники
            [8] => Усилитель рулевого управления
        )

    [recommendation] => Array
        (
            [0] =>              Helix Ultra 0W-40                                     
            [1] =>              Refer To Owners Handbook                                     
            [2] =>              Spirax S2 ATF AX                                     
            [3] =>              От коробки передач                                     
            [4] =>              Refer To Owners Handbook                                     
            [5] =>              Spirax S5 ATE 75W-90                                     
            [6] =>              Refer To Technical                                     
            [7] =>              Gadus S3 V220C 2                                     
            [8] =>                                        (b)           
        )

    [capacity] => Array
        (
            [0] =>              5.0                        
            [1] =>                           (a)           
            [2] =>                           (a)           
            [3] =>                                      
            [4] =>              11.0                        
            [5] =>              1.0                        
            [6] =>                                      
            [7] =>                                      
            [8] =>                                      
        )

)

Thanks in advance for your help.


2 answer(s)
Alexey Sundukov, 2014-01-25
@GansikUA

Use XPath, Luke.

<?php

// [1- Download the file
// Create the stream context
$opts = array(
  'http' => array(
    'method'  => 'GET',
    'timeout' => 10,
  ),
);

$context = stream_context_create($opts);

// Fetch the page using the HTTP options set above
$page_content = file_get_contents('http://lubematch.shell.com/ru/ru/equipment/100_2_8i_avant_001755', false, $context);
// -1]

// [2- Parse the data
// [3- Build the DOM
// in effect, suppress the output of validation errors
libxml_use_internal_errors(true);
$page_dom = new \DOMDocument();

$page_dom->strictErrorChecking = false;
$page_dom->preserveWhiteSpace  = false;
$page_dom->validateOnParse     = true;

// [4- loadHTML does not handle UTF-8 by default, so use the hack from http://php.net/manual/en/domdocument.loadhtml.php#95251
$page_dom->loadHTML('<?xml encoding="UTF-8">' . $page_content);

// Remove the processing instruction that the hack added
foreach ($page_dom->childNodes as $node) {
  if ($node->nodeType == XML_PI_NODE) {
    $page_dom->removeChild($node);
  }
}
$page_dom->encoding = 'UTF-8';
// -4]

$page_xpath = new \DOMXPath($page_dom);
// -3]

// Extract "Standard"
$param_1 = $page_xpath->query('//table[@id="recommendation"]//tr[2]/th')->item(0)->nodeValue;
// Extract "Spirax S4 ATF HDX"
$param_2 = $page_xpath->query('//table[@id="recommendation"]//tr[5]/td[1]')->item(0)->nodeValue;
// -2]

var_dump($param_1, $param_2);
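
A query such as the ones above returns an empty node list when the row or cell is not where you expect it, and calling ->item(0)->nodeValue on an empty list fails. Below is a minimal sketch of a guard for that case. The helper name first_text and the inline sample markup are assumptions made for illustration; they are not part of the real page or of any library.

```php
<?php
// Hypothetical sample markup; the structure of the real page may differ.
$html = '<table id="recommendation">'
      . '<tr><th>Standard</th></tr>'
      . '<tr><td>Spirax S4 ATF HDX</td></tr>'
      . '</table>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

/**
 * Hypothetical helper: return the trimmed text of the first node
 * matching $query, or null when nothing matches.
 */
function first_text(DOMXPath $xpath, string $query): ?string
{
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->nodeValue);
}

// A row that exists in the sample markup
$standard = first_text($xpath, '//table[@id="recommendation"]//tr[1]/th');
// A row that does not exist in the sample markup
$missing = first_text($xpath, '//table[@id="recommendation"]//tr[9]/td[1]');

var_dump($standard, $missing);
```

With a guard like this, a change in the page layout shows up as a null value rather than a fatal error.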

Igor Deyashkin, 2014-01-25
@Lobotomist

If you look at the page's source code, it becomes clear why the marked text is not selected.
For example, you are looking for td elements with the class recommendation, but not every td in the third column has that class: <td>Spirax S4 ATF HDX</td> has no class at all. You also take nothing from the cells that hold the headers, such as <th class="tiername tiername">Standard</th>, so where would those values come from? =)
If I were you, I would parse on a different principle. What structure do you want to end up with?
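
One possible "different principle" is to walk the table row by row and read the cells by position instead of by class. The sketch below assumes a simplified table built from the fragments quoted above. The sample markup, the English cell values, and the three-column layout are assumptions, so the real page may need different indexes.

```php
<?php
// Hypothetical sample markup modelled on the fragments quoted above.
$html = <<<HTML
<table id="recommendation">
  <tr><th class="tiername tiername">Standard</th></tr>
  <tr>
    <td class="application">Engine</td>
    <td class="recommendation">Helix Ultra 0W-40</td>
    <td class="capacity">5.0</td>
  </tr>
  <tr>
    <td class="application">Automatic transmission</td>
    <td>Spirax S4 ATF HDX</td>
    <td class="capacity">(a)</td>
  </tr>
</table>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$rows = [];
$tier = null; // most recent header (th) text seen while walking the rows

foreach ($xpath->query('//table[@id="recommendation"]//tr') as $tr) {
    // A row with a th is treated as a tier header, not a data row
    $th = $xpath->query('./th', $tr);
    if ($th->length > 0) {
        $tier = trim($th->item(0)->textContent);
        continue;
    }

    // Read the cells by position, ignoring their class attributes
    $cells = [];
    foreach ($xpath->query('./td', $tr) as $td) {
        $cells[] = trim($td->textContent);
    }

    // Assumption: data rows have at least three cells
    if (count($cells) >= 3) {
        $rows[] = [
            'tier'           => $tier,
            'application'    => $cells[0],
            'recommendation' => $cells[1],
            'capacity'       => $cells[2],
        ];
    }
}

print_r($rows);
```

Reading by position picks up the cell without a class, and each record keeps the header it appeared under. If the real table uses rowspan or has a different number of columns, the positional indexes would have to be adjusted.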
