PHP
msimrial, 2015-10-17 13:40:16

How do I scrape product pictures from the Yandex catalog?

I tried to scrape pictures from the Yandex catalog, but alas, it returned a captcha:
na.captcha.yandex.net/image?key=c1gsjVSbLQMvTeM033...
See also: joxi.ru/MAj1G4eTWVN72e

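require_once 'simple_html_dom.php';
set_time_limit(0);

// Searches Yandex Market for $name and opens the first product page found.
function getIMageYandex($name){
    $name = rawurlencode($name);
    $url = "https://market.yandex.ru/search.xml?text={$name}&from=reach-search-snippet";
    echo $url."<br>";

    $yandex_data = file_get_html($url);
    if (!$yandex_data) {
        // file_get_html() returns false when the request fails or the page is empty
        echo "Request failed<br>";
        return;
    }
    // Drop scripts, stylesheets and comments before searching the markup
    foreach ($yandex_data->find('script,link,comment') as $tmp) $tmp->outertext = '';
    echo 1 ."<br>";

    // With an index, find() returns a single element (or null) instead of an array
    $res_url = $yandex_data->find("a.snippet-card__header-link", 0);
    // Print only the image URLs; print_r() on DOM nodes dumps the whole tree
    foreach ($yandex_data->find("img.image") as $img) echo $img->src."<br>";

    if ($res_url) {
        preg_match('/^\/product\//', $res_url->href, $matches);
        print_r($matches);
        echo 2 ."<br>";
        $yandex_product = file_get_html("https://market.yandex.ru" . $res_url->href);
        echo "https://market.yandex.ru" . $res_url->href;
    }
}
getIMageYandex("Hankook Optimo K715 175/70 R13 82T");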

Can someone tell me how to get pictures?

1 answer(s)
Maxim Timofeev, 2015-10-17
@msimrial

You need to send requests with different headers and from different IP addresses, and pause between them. What you ran into is anti-scraping protection. Yandex Market is about the most unpleasant target for this kind of task.
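
A minimal sketch of that advice, assuming PHP 7.1+ and plain stream contexts rather than any particular HTTP library. The names USER_AGENTS, buildContextOptions and fetchPolitely are hypothetical, the User-Agent strings and delay range are illustrative, and the proxy address is something you would have to supply yourself. It is not guaranteed to get past the captcha; it only shows how varying headers, an optional proxy, and a pause between requests could be wired together.

```php
<?php
// Illustrative pool of User-Agent strings to rotate through (assumed values).
const USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0',
];

// Builds stream-context options for one request: a randomly chosen User-Agent,
// a browser-like header, and an optional proxy so requests can use different IPs.
function buildContextOptions(?string $proxy = null): array
{
    $http = [
        'method'     => 'GET',
        'user_agent' => USER_AGENTS[array_rand(USER_AGENTS)],
        'header'     => "Accept-Language: ru-RU,ru;q=0.9,en;q=0.8\r\n",
        'timeout'    => 30,
    ];
    if ($proxy !== null) {
        $http['proxy'] = $proxy; // e.g. "tcp://203.0.113.10:3128" (placeholder address)
    }
    return ['http' => $http];
}

// Fetches one page, then sleeps for a random interval so that consecutive
// calls are spaced out. Returns null if the request fails.
function fetchPolitely(string $url, ?string $proxy = null, int $minDelay = 5, int $maxDelay = 15): ?string
{
    $context = stream_context_create(buildContextOptions($proxy));
    $html = @file_get_contents($url, false, $context);
    sleep(random_int($minDelay, $maxDelay));
    return $html === false ? null : $html;
}
```

The string returned by fetchPolitely() can then be passed to str_get_html() from simple_html_dom in place of calling file_get_html() on the URL directly.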
I once parsed it with the contentdownloader program; that was easier than using 'simple_html_dom.php'.
Yandex Market also has an API. Once you connect to it, you can receive the data through it.
I would fill the database using the software first, and then pick up whatever is still missing by parsing with 'simple_html_dom.php', saving the results as I go. That way there are fewer requests to Yandex and you will not get banned.
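
Saving the results can be as simple as a file cache keyed by product name. The sketch below assumes PHP 7.1+; cachedLookup and the cache directory are hypothetical names, and $fetch stands for whatever function actually queries Yandex and returns a picture URL (or null if nothing was found).

```php
<?php
// Hypothetical helper: returns the saved value for $name if there is one;
// otherwise calls $fetch($name), saves a non-null result, and returns it.
function cachedLookup(string $name, callable $fetch, string $cacheDir): ?string
{
    if (!is_dir($cacheDir)) {
        mkdir($cacheDir, 0777, true);
    }
    $file = $cacheDir . '/' . md5($name) . '.txt';
    if (is_file($file)) {
        $cached = file_get_contents($file);
        if ($cached !== false) {
            return $cached; // cache hit: no request is made
        }
    }
    $result = $fetch($name);
    if ($result !== null) {
        file_put_contents($file, $result); // keep the result for later runs
    }
    return $result;
}
```

Each product name then costs at most one successful request, no matter how many times the script is rerun.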
