How to scrape data using Symfony DomCrawler?
Hello! I'm trying to get the hang of parsing in Laravel with Symfony DomCrawler and would appreciate some help. Not everything in the manual is clear to me. While googling I came across an article on a site that no longer exists; I managed to get partial access to it through a saved copy in Yandex.
Code example:
// Requires at the top of the file: use Symfony\Component\DomCrawler\Crawler;

/**
 * Get content from html.
 *
 * @param object $parser parser settings (CSS selectors in $parser->settings)
 * @param string $link   link to html page
 *
 * @return array with parsing data
 * @throws \Exception
 */
public function getContent($parser, $link)
{
    // Get html remote text; file_get_contents() returns false on failure.
    $html = file_get_contents($link);
    if ($html === false) {
        throw new \Exception("Unable to load {$link}");
    }
    // Create new instance for parser.
    $crawler = new Crawler(null, $link);
    $crawler->addHtmlContent($html, 'UTF-8');
    // Get title text.
    $title = $crawler->filter($parser->settings->title)->text();
    // If a teaser selector is configured, get the teaser text; otherwise keep it empty.
    $teaser = '';
    if (!empty(trim($parser->settings->teaser ?? ''))) {
        $teaser = $crawler->filter($parser->settings->teaser)->text();
    }
    // Get images from page.
    $images = $crawler->filter($parser->settings->image)->each(function (Crawler $node, $i) {
        return $node->image()->getUri();
    });
    // Get body html, one entry per matched node.
    $bodies = $crawler->filter($parser->settings->body)->each(function (Crawler $node, $i) {
        return $node->html();
    });
    $content = [
        'link' => $link,
        'title' => $title,
        'images' => $images,
        'teaser' => strip_tags($teaser),
        // The original referenced an undefined $body; the collected nodes are in $bodies.
        'body' => $bodies
    ];
    return $content;
}
Why do you even need to know what is in the properties of this $parser object?
Just write your own selectors and that's it. They are ordinary CSS selectors (:contains is supported as well).
You tore the method out of the documentation but forgot about its context. It is just an example. Rewrite it in your own way and the problem will go away by itself.
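To illustrate the advice above, here is a minimal sketch of the same idea with the selectors written directly in the code instead of being read from a $parser object. The function name extractArticle, the selectors ('h1', 'article p', 'article img') and the word passed to :contains are hypothetical and only stand in for whatever matches your target page. It assumes symfony/dom-crawler and symfony/css-selector are installed via Composer.

```php
<?php
// Assumption: run from a project where
// `composer require symfony/dom-crawler symfony/css-selector` has been executed.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

/**
 * Extract a few fields from an HTML string using hard-coded CSS selectors.
 * All selectors below are placeholders; replace them with your own.
 */
function extractArticle(string $html, string $uri): array
{
    // The URI is used as the base for resolving relative image paths.
    $crawler = new Crawler(null, $uri);
    $crawler->addHtmlContent($html, 'UTF-8');

    // text() throws on an empty node list, so check count() first.
    $titleNode = $crawler->filter('h1');
    $title = $titleNode->count() > 0 ? trim($titleNode->text()) : '';

    // each() returns an array with one entry per matched node.
    $images = $crawler->filter('article img')->each(function (Crawler $node) {
        return $node->image()->getUri();
    });

    $paragraphs = $crawler->filter('article p')->each(function (Crawler $node) {
        return trim($node->text());
    });

    // :contains() keeps only nodes whose text includes the given string.
    $mentions = $crawler->filter('article p:contains("Symfony")')->count();

    return [
        'link'       => $uri,
        'title'      => $title,
        'images'     => $images,
        'paragraphs' => $paragraphs,
        'mentions'   => $mentions,
    ];
}
```

You could call it as extractArticle(file_get_contents($link), $link). Once you are happy with the selectors, moving them into a settings object like $parser->settings is only a matter of replacing the string literals.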