PHP
Vadim997, 2014-10-23 22:04:44

How can I quickly parse more than 1000 images from a website?

I need to parse more than 1000 images from a site. At the moment I'm using Simple HTML DOM, but it can't parse the site at all. How can this be done? If not with Simple HTML DOM, then maybe with some other parser?

5 answer(s)
Andrew, 2014-10-23
@ntzch

I recommend the phpQuery library: it doesn't have the glitches that Simple HTML DOM has (I've tried both, but I preferred phpQuery).
Tutorials:
habrahabr.ru/post/69149
i-novice.net/parsim-sajty-s-phpquery
I recently parsed images with this library, and it did a very good job. To save the images, you first use the library to find the links to them: I searched the page and put every link I found into an array. Example code:

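require_once 'phpQuery/phpQuery.php'; // Adjust the path to where phpQuery is installed
$model_page_url = file_get_contents($page); // Fetch the whole page ($page is its URL); despite its name, this variable holds the HTML
$model_page = phpQuery::newDocument($model_page_url); // Create a document object with the library
$images_link = $model_page->find('img'); // Find all img tags
$images = []; // Image links are collected here
foreach ($images_link as $image_link) {
    $images[] = pq($image_link)->attr('src'); // Put each image link into the array
}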
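The loop above stores each `src` exactly as it appears in the page, so relative links such as `/upload/photo.jpg` cannot be passed straight to `file_get_contents()`. Below is a minimal sketch of the same collection step using PHP's built-in DOM extension instead of phpQuery, which also turns the links into absolute URLs. It assumes you know the URL of the page; the function name `collect_image_urls` is hypothetical, and the URL resolution is deliberately simplified (it does not normalise `../` segments).

```php
<?php
// Hypothetical helper: collects absolute image URLs from an HTML page using
// PHP's built-in DOM extension. URL resolution is simplified: it handles
// absolute, protocol-relative, root-relative and plain relative paths,
// but does not collapse "../" segments.
function collect_image_urls(string $html, string $pageUrl): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world markup is rarely valid; silence parser warnings
    $doc->loadHTML($html);
    libxml_clear_errors();

    $parts = parse_url($pageUrl);
    $scheme = $parts['scheme'] ?? 'http';
    $origin = $scheme . '://' . $parts['host'] . (isset($parts['port']) ? ':' . $parts['port'] : '');
    $path = $parts['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1); // directory of the current page

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        if ($src === '' || strpos($src, 'data:') === 0) {
            continue; // skip missing sources and inline data: URIs
        }
        if (preg_match('#^https?://#i', $src)) {
            $url = $src;                  // already absolute
        } elseif (strpos($src, '//') === 0) {
            $url = $scheme . ':' . $src;  // protocol-relative
        } elseif ($src[0] === '/') {
            $url = $origin . $src;        // relative to the site root
        } else {
            $url = $origin . $dir . $src; // relative to the page's directory
        }
        $urls[$url] = true; // using keys removes duplicates, keeping first-seen order
    }
    return array_keys($urls);
}
```

For example, for the page `http://example.com/catalog/page.html`, `src="b.png"` becomes `http://example.com/catalog/b.png` and `src="//cdn.example.com/c.gif"` becomes `http://cdn.example.com/c.gif`.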

Then save them with something like this:
foreach ($images as $image) {
    $image_name = basename($image); // Get the image's file name and extension
    if (file_exists('img/'.$image_name)) { // Skip images that are already saved
        continue;
    }
    $data = file_get_contents($image); // Download the image by its URL
    if ($data !== false) { // Save only successful downloads
        file_put_contents('img/'.$image_name, $data); // Put it into the img/ folder (which must already exist)
    }
}

That is roughly the whole image-parsing process.
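With 1000+ images, the saving loop above is slow mainly because it downloads them one at a time: each `file_get_contents()` call waits for the previous image to finish. Below is a minimal sketch, assuming the cURL extension is installed, that downloads the collected links in parallel with PHP's `curl_multi_*` functions. The function name `download_images`, the batch size of 10, the 30-second timeout and the hash-based file names are all illustrative assumptions, not something from the answer above.

```php
<?php
// Hypothetical helper: downloads images concurrently with PHP's cURL multi
// interface, $concurrency requests at a time, and saves them into $dir.
// Returns the paths of the files that were saved successfully.
function download_images(array $urls, string $dir, int $concurrency = 10): array
{
    if (!is_dir($dir)) {
        mkdir($dir, 0755, true);
    }
    $saved = [];
    foreach (array_chunk($urls, $concurrency) as $batch) {
        $mh = curl_multi_init();
        $handles = [];
        foreach ($batch as $url) {
            $ch = curl_init($url);
            curl_setopt_array($ch, [
                CURLOPT_RETURNTRANSFER => true, // keep the body in memory
                CURLOPT_FOLLOWLOCATION => true, // follow redirects to the real file
                CURLOPT_TIMEOUT => 30,          // give up on a stalled image
            ]);
            curl_multi_add_handle($mh, $ch);
            $handles[] = [$url, $ch];
        }
        // Run all transfers of this batch in parallel until every one finishes
        do {
            $status = curl_multi_exec($mh, $running);
            if ($running) {
                curl_multi_select($mh); // wait for network activity instead of busy-looping
            }
        } while ($running && $status === CURLM_OK);

        // Collect the result code (CURLE_*) of each finished transfer
        $results = [];
        while (($info = curl_multi_info_read($mh)) !== false) {
            foreach ($handles as $i => [$url, $ch]) {
                if ($info['handle'] === $ch) {
                    $results[$i] = $info['result'];
                }
            }
        }

        foreach ($handles as $i => [$url, $ch]) {
            $data = curl_multi_getcontent($ch);
            $code = curl_getinfo($ch, CURLINFO_RESPONSE_CODE); // 0 for non-HTTP URLs such as file://
            if (($results[$i] ?? null) === CURLE_OK && $code < 400 && is_string($data) && $data !== '') {
                // Name the file after a hash of the full URL, so that images with the
                // same basename in different folders do not overwrite each other
                $ext = pathinfo((string) parse_url($url, PHP_URL_PATH), PATHINFO_EXTENSION);
                $file = rtrim($dir, '/') . '/' . sha1($url) . ($ext !== '' ? '.' . $ext : '');
                if (file_put_contents($file, $data) !== false) {
                    $saved[] = $file;
                }
            }
            curl_multi_remove_handle($mh, $ch);
        }
        curl_multi_close($mh);
    }
    return $saved;
}
```

Each batch waits for its slowest image before the next batch starts; a rolling queue that adds a new handle as soon as one finishes would keep all slots busy, at the cost of a little more code.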

Andrey Ezhgurov, 2014-10-23
@eandr_67

MetaProducts Offline Explorer

Hazrat Hajikerimov, 2014-10-24
@hazratgs

SimpleHTMLDOM is an excellent library: it is very easy to use and works much like jQuery, selecting elements with CSS selectors.
Below is code that gets the image links from a product page on the website of the distributor Merlion:

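<?php
require_once 'simple_html_dom.php'; // The Simple HTML DOM library file
// Load and parse the product page
$simple = file_get_html('http://merlion.com/catalog/product/966656');
// Select the links in the thumbnail gallery and print each href (the image URL)
foreach ($simple->find('div.ad-thumbs .ad-thumb-list li a') as $el) {
    echo $el->href.'<br>';
}
$simple->clear(); // Free the memory held by the parsed document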

Result:
http://img.merlion.ru/items/966656_v01_m.jpg
http://img.merlion.ru/items/966656_v02_m.jpg
http://img.merlion.ru/items/966656_v03_m.jpg
http://img.merlion.ru/items/966656_v04_m.jpg
http://img.merlion.ru/items/966656_v05_m.jpg

I use this library around the clock to parse images and product descriptions from various sites (more than 50,000 products), and it copes.

Vadim997, 2014-10-25
@Vadim997

Maybe there is some other solution?

DimaX, 2019-02-07
@DimaX

If you only need to do this once, it may be easier to use a ready-made image parser than to reinvent the wheel :)
