PHP
asosonko4, 2015-12-03 16:27:43

Why doesn't Simple HTML DOM find individual objects?

Good afternoon. I ran into the following problem while trying to parse this page: superdeals.aliexpress.com/en
I need to get the div with class="pro-msg", which sits inside an li with class list-items.
However, it turns out that not all of the page's data ends up in what I receive. The code looks like this:

require_once 'simple_html_dom.php';

$base = 'http://superdeals.aliexpress.com/en?spm=2114.11010108.21.1.v65LIL';

$curl = curl_init();
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_URL, $base);
curl_setopt($curl, CURLOPT_REFERER, $base);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$str = curl_exec($curl);
curl_close($curl);

$html = new simple_html_dom();
$html->load($str);

$res = $html->find('div.pro-msg', 0)->outertext;
echo $res;

That is, Simple HTML DOM does not see certain parts of the page at all: running the code gives an empty page, yet if I ask for a different div, everything works. If I fetch the whole page with file_get_html, then likewise not the entire site is displayed. Please tell me how to get around this problem.
Thank you very much in advance!

2 answer(s)
Eugene, 2015-12-03
@asosonko4

phantomjs
---
Okay, I'll help you, otherwise you'll be parsing ajax with regular expressions all your life :)
1. Create an empty folder named ali
2. Download Composer into it: https://getcomposer.org/composer.phar
3. Create a composer.json file with this content:

{
  "require": {
    "jonnyw/php-phantomjs": "3.*",
    "symfony/dom-crawler": "3.*",
    "symfony/css-selector": "3.*"
  },
  "config": {
    "bin-dir": "bin"
  },
  "scripts": {
    "post-install-cmd": [
      "PhantomInstaller\\Installer::installPhantomJS"
    ],
    "post-update-cmd": [
      "PhantomInstaller\\Installer::installPhantomJS"
    ]
  }
}

4. Install the dependencies (presumably with php composer.phar install)
5. Create an index.php file
<?php

require __DIR__ . '/vendor/autoload.php';

$client = \JonnyW\PhantomJs\Client::getInstance();
$request = $client->getMessageFactory()->createRequest('http://superdeals.aliexpress.com/en?spm=2114.11010108.21.1.v65LIL', 'GET');
$response = $client->getMessageFactory()->createResponse();
$client->send($request, $response);
$html = $response->getContent();

$crawler = new \Symfony\Component\DomCrawler\Crawler($html);
$div = $crawler->filter('div.pro-msg');
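// filter() always returns a Crawler object, so check that it actually matched something
if ($div->count() > 0) {
    echo $div->first()->text();
}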

6. Run it with php index.php and check the result:
/usr/local/bin/php /Users/evgenij/projects/untitled/index.php

        Today Only
        
          Boy's Coat
          >  Synthetic leather> Motor jacket style> Available in black and red
          share:

    vk
        pinterest
        facebook
        Twitter
        Google+
        Email
    Sign in and share the website for a chance to get Points, which you can then convert to coupons.

          US $9.74
          
            US $32.48 / piece | 70% Off
          
          
          
          
            0486Left					
          Buy Now
          
        
      
Process finished with exit code 0
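
The text() output above keeps all the indentation and line breaks of the original markup. A minimal sketch of collapsing it into one readable line, using only built-in PHP functions; the helper name normalize_whitespace is hypothetical and is not part of DomCrawler:

```php
<?php

// Hypothetical helper: collapse every run of whitespace (spaces, tabs,
// newlines) into a single space and trim both ends.
function normalize_whitespace(string $text): string
{
    // The /u modifier makes the pattern and subject be treated as UTF-8.
    $collapsed = preg_replace('/\s+/u', ' ', $text);

    // preg_replace() returns null on failure (for example invalid UTF-8),
    // so fall back to the untouched text in that case.
    return trim($collapsed ?? $text);
}

// Sample input shaped like the output above.
$raw = "\n        Today Only\n        \n          Boy's Coat\n          US \$9.74\n";
echo normalize_whitespace($raw), PHP_EOL;
```

With the sample string this should print "Today Only Boy's Coat US $9.74". In index.php it could be applied as normalize_whitespace($div->first()->text()).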

RomkaChev, 2015-12-03
@RomkaChev

view-source: superdeals.aliexpress.com/en?spm=2114.11010108.21.... - line 1049.
The element you need is inserted into the page using JS. That is why it is not present as a DOM element in the source code.
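
One way to check this yourself is to look for the element in the raw markup, before any script has run. A rough sketch using PHP's bundled DOM extension; the helper name has_class_in_static_html and both sample strings are made up for illustration, and it assumes simple tag and class names without quotes:

```php
<?php

// Hypothetical helper: true if the static HTML contains at least one
// element of the given tag that carries the given class.
function has_class_in_static_html(string $html, string $tag, string $class): bool
{
    // loadHTML() rejects an empty string, so handle that case up front.
    if (trim($html) === '') {
        return false;
    }

    $doc = new DOMDocument();

    // Real pages are rarely valid HTML; collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Common XPath 1.0 idiom for matching one class among several.
    $query = sprintf(
        '//%s[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $tag,
        $class
    );

    $nodes = $xpath->query($query);

    return $nodes !== false && $nodes->length > 0;
}

// Made-up stand-in for what cURL downloads: the list item exists,
// but its content is filled in later by a script.
$static = '<ul><li class="list-items"><script>/* JS adds content here */</script></li></ul>';

// Made-up stand-in for what a browser (or PhantomJS) sees after the script ran.
$rendered = '<ul><li class="list-items"><div class="pro-msg">Boy\'s Coat</div></li></ul>';

var_dump(has_class_in_static_html($static, 'div', 'pro-msg'));
var_dump(has_class_in_static_html($rendered, 'div', 'pro-msg'));
```

If the check comes back false for the string returned by curl_exec(), no HTML parser will find the element there, and the page has to be rendered first.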
