How do I parse a div from a page?

andreystrelkov, 2016-06-16 00:29:49

PHP

Good evening. I'm trying to parse the main text of a news article from a lenta.ru page, but it doesn't work. What am I doing wrong?

$html = file_get_contents('https://lenta.ru/news/2016/06/02/trol/');
if (preg_match('#<span class="b-text">(\d+?)</span>#', $html, $matches)) {
  $price = $matches[1]; 
}
echo $price;

3 answers

Egor Nevedov, 2016-06-16
@Sanitar88

Maybe $matches[0]?
And if you're parsing the main text, \d+? won't work at all: it matches digits, not text.
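To illustrate the point about \d+?, here is a small self-contained check against a made-up snippet (the markup and text are hypothetical, not taken from lenta.ru). \d+? accepts only digits, so it never matches ordinary text; a lazy .+? with the s modifier does. It also shows the difference between the two indexes: $matches[0] holds the whole match, tags included, while $matches[1] holds only the captured group.

```php
<?php
// Hypothetical sample markup; the real page structure may differ.
$html = '<span class="b-text">Some news text</span>';

// Pattern from the question: \d+? accepts digits only, so plain text fails.
$digitsOnly = preg_match('#<span class="b-text">(\d+?)</span>#', $html, $m1);

// Lazy "any character" pattern; the "s" modifier lets "." match newlines too.
$anyText = preg_match('#<span class="b-text">(.+?)</span>#s', $html, $m2);

var_dump($digitsOnly); // int(0)
var_dump($anyText);    // int(1)
echo $m2[0], "\n";     // whole match, including the <span> tags
echo $m2[1], "\n";     // captured group only: Some news text
```

This only fixes the pattern itself; the next answer explains why a real parser is usually the better tool.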

Immortal_pony, 2016-06-16
@Immortal_pony

Firstly, Lenta.ru has RSS feeds, and it's better not to depend on the page layout:
When the new site launched, the following RSS feeds became available:
/rss/news - news
/rss/top7 - the latest and most important news
/rss/last24 - top news of the last 24 hours
/rss/articles - all articles
/rss/columns - columns
/rss/news/russia - news of the "Russia" rubric; after the slash you can put the English name of any rubric as it appears in its URL (for example, /rss/news/world is all news of the "World" rubric)
/rss/articles/russia - all articles of the "Russia" rubric; other rubrics work the same way
/rss/photo - all galleries
/rss/photo/russia - all galleries of the "Russia" rubric; other rubrics work the same way
Secondly, using regular expressions to parse HTML or XML is also a bad idea; in general, avoid them whenever a proper parser is available. You can parse XML with SimpleXML (PHP) or Nokogiri (Ruby), and there are several libraries specifically for RSS.
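A minimal sketch of reading one of these feeds with SimpleXML, assuming the feed follows the standard RSS 2.0 layout (<rss><channel><item>). The function name parseRssItems and the sample feed are hypothetical; in real use you would pass in the text downloaded from a feed URL such as https://lenta.ru/rss/news.

```php
<?php
/**
 * Extracts title, link and description from each <item> of an RSS 2.0 document.
 * Returns an empty array if the input is not well-formed XML.
 */
function parseRssItems(string $xml): array
{
    $feed = simplexml_load_string($xml);
    if ($feed === false) {
        return [];
    }
    $items = [];
    foreach ($feed->channel->item as $item) {
        // Casting a SimpleXMLElement to string yields its text content.
        $items[] = [
            'title' => trim((string) $item->title),
            'link' => trim((string) $item->link),
            'description' => trim((string) $item->description),
        ];
    }
    return $items;
}

// Real use would be something like:
// $xml = file_get_contents('https://lenta.ru/rss/news');
$sample = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First headline</title>
      <link>https://example.com/news/1</link>
      <description>Short summary</description>
    </item>
  </channel>
</rss>
XML;

foreach (parseRssItems($sample) as $item) {
    echo $item['title'], ' - ', $item['link'], "\n";
}
```

The feed already separates headline, link and summary, so no knowledge of the HTML layout is needed.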
Example:

libxml_use_internal_errors(true); // Don't let HTML parse errors produce PHP warnings

$html = file_get_contents("https://lenta.ru/news/2016/06/02/trol/");
$page = new DOMDocument();
$page->loadHTML("<?xml version='1.0' encoding='UTF-8'?>" . $html); // Explicitly declare the encoding of the fetched data

$article = "";
$domXpath = new DOMXPath($page);
$newDom = new DOMDocument();
$newDom->formatOutput = true;

// Copy every matching article-body div into a separate document
$filtered = $domXpath->query("//div[@itemprop='articleBody']");
$i = 0;
while ($item = $filtered->item($i++)) {
    $node = $newDom->importNode($item, true);
    $newDom->appendChild($node);
}

$article = $newDom->saveHTML(); // HTML markup of the article body
libxml_clear_errors(); // Clear the libxml error buffer
echo $article;
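If you want plain text rather than HTML, one variation on the example above is to read textContent from the paragraphs inside the article body. This is a sketch under two assumptions: the page still marks the body with itemprop="articleBody", and the text sits in <p> elements. The function name extractArticleText is hypothetical.

```php
<?php
/**
 * Returns the text of each <p> inside <div itemprop="articleBody">,
 * one paragraph per line.
 */
function extractArticleText(string $html): string
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Same trick as above: the XML declaration tells libxml the input is UTF-8.
    $doc->loadHTML("<?xml version='1.0' encoding='UTF-8'?>" . $html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous); // Restore the caller's setting.

    $xpath = new DOMXPath($doc);
    $paragraphs = [];
    foreach ($xpath->query("//div[@itemprop='articleBody']//p") as $p) {
        $paragraphs[] = trim($p->textContent);
    }
    return implode("\n", $paragraphs);
}
```

For the page from the question you would call it as echo extractArticleText(file_get_contents('https://lenta.ru/news/2016/06/02/trol/'));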

LBC, 2016-06-16
@VSKut

I'd suggest using something like PHP Simple HTML DOM Parser: simplehtmldom.sourceforge.net