PHP Simple HTML DOM Parser. Why can't I get the item?
Good day!
This is my first time writing a parser.
I need to parse this page of oils, or rather the information in its table.
I googled around and decided to use PHP Simple HTML DOM Parser.
It partially works. What I can't figure out is how to get the elements shown in the screenshot:
My code:
<?php
include 'simple_html_dom.php';
$link = 'http://lubematch.shell.com/ru/ru/equipment/100_2_8i_avant_001755';
$data = file_get_html($link);
$result = array();
foreach ($data->find('td.application') as $a) {
    $result['application'][] = $a->plaintext;
}
foreach ($data->find('td.recommendation') as $a) {
    $result['recommendation'][] = $a->plaintext;
}
foreach ($data->find('td.capacity') as $a) {
    $result['capacity'][] = $a->plaintext;
}
echo "<pre>";
print_r($result);
echo "</pre>";
?>
Array
(
[application] => Array
(
[0] => Двигатель (Б (бензиновый))
[1] => Механическая трансмиссия
[2] => Автоматическая трансмиссия
[3] => Дифференциал
[4] => Охлаждающая жидкость
[5] => Модели с автотрансмиссией, дифференциал
[6] => Тормозная жидкость
[7] => Колесные подшипники
[8] => Усилитель рулевого управления
)
[recommendation] => Array
(
[0] => Helix Ultra 0W-40
[1] => Refer To Owners Handbook
[2] => Spirax S2 ATF AX
[3] => От коробки передач
[4] => Refer To Owners Handbook
[5] => Spirax S5 ATE 75W-90
[6] => Refer To Technical
[7] => Gadus S3 V220C 2
[8] => (b)
)
[capacity] => Array
(
[0] => 5.0
[1] => (a)
[2] => (a)
[3] =>
[4] => 11.0
[5] => 1.0
[6] =>
[7] =>
[8] =>
)
)
Use XPath, Luke.
<?php
// [1- Download the page
// Create a stream context
$opts = array(
    'http' => array(
        'method' => 'GET',
        'timeout' => 10,
    ),
);
$context = stream_context_create($opts);
// Fetch the page using the HTTP options set above
$page_content = file_get_contents('http://lubematch.shell.com/ru/ru/equipment/100_2_8i_avant_001755', false, $context);
// -1]
// [2- Parse the data
// [3- Build the DOM
// in effect, suppress the output of validation errors
libxml_use_internal_errors(true);
// Create the document once, so the settings below are not thrown away
$page_dom = new \DOMDocument();
$page_dom->strictErrorChecking = false;
$page_dom->preserveWhiteSpace = false;
$page_dom->validateOnParse = true;
// [4- loadHTML does not handle UTF-8 by default, so use the hack from http://php.net/manual/en/domdocument.loadhtml.php#95251
$page_dom->loadHTML('<?xml encoding="UTF-8">' . $page_content);
foreach ($page_dom->childNodes as $node) {
    if ($node->nodeType == XML_PI_NODE) {
        $page_dom->removeChild($node);
        break; // only one processing instruction was added, so stop here
    }
}
$page_dom->encoding = 'UTF-8';
// -4]
$page_xpath = new \DOMXPath($page_dom);
// -3]
// Extract "Standard"
$param_1 = $page_xpath->query('//table[@id="recommendation"]//tr[2]/th')->item(0)->nodeValue;
// Extract "Spirax S4 ATF HDX"
$param_2 = $page_xpath->query('//table[@id="recommendation"]//tr[5]/td[1]')->item(0)->nodeValue;
// -2]
var_dump($param_1, $param_2);
If you look at the page source, it becomes clear why the marked text is missing from the selection.
For example, you are looking for td elements with the class recommendation, but not all td elements in the third column have this class. For example, <td>Spirax S4 ATF HDX</td> has no class at all. You also take no data whatsoever from the header cells, such as <th class="tiername tiername">Standard</th>, so where would you get them from? =)
If I were you, I would parse on a different principle. What structure do you want to end up with?
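One possible "other principle" is to walk the table row by row and take the cells by position instead of by class. Below is a minimal sketch of that idea, not a finished parser. The helper name extractTableRows and the sample markup are made up for illustration; only the table id "recommendation" and the cell texts come from this thread. The real page may be laid out differently, so treat the row structure as an assumption.

```php
<?php
// Hypothetical helper: returns one array of trimmed cell texts per table row.
function extractTableRows(string $html, string $tableId): array
{
    libxml_use_internal_errors(true); // keep warnings about sloppy HTML quiet
    $dom = new DOMDocument();
    // Prefixing an XML declaration is a common workaround to make loadHTML read UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $rows = [];
    foreach ($xpath->query('//table[@id="' . $tableId . '"]//tr') as $tr) {
        $cells = [];
        // Take every header and data cell of this row, whatever its class.
        foreach ($xpath->query('./th | ./td', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Made-up sample markup (an assumption): one header row, one row with
// classed cells, and one row whose cells carry no class at all.
$sample = <<<HTML
<table id="recommendation">
  <tr><th class="tiername">Standard</th></tr>
  <tr>
    <td class="application">Automatic transmission</td>
    <td class="recommendation">Spirax S2 ATF AX</td>
    <td class="capacity">(a)</td>
  </tr>
  <tr>
    <td></td>
    <td>Spirax S4 ATF HDX</td>
    <td></td>
  </tr>
</table>
HTML;

$rows = extractTableRows($sample, 'recommendation');
print_r($rows);
```

With this approach the class-less cell ends up in its row's array like any other cell, and the header row is kept as well. Each row stays together, so a product name is never separated from its application and capacity.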