Sergey Erin, 2020-01-13 10:53:12
PHP

How to parse the internal pages of the site?

I use XPath to parse a list of elements, or rather their images. But I need the images at their original size, and the page I am parsing only contains resized versions. The originals can only be obtained by following the link to each element of the list. Can you tell me what such a parser should look like? This is what I have now:

libxml_use_internal_errors(true);
    /* Fetch the list page (the URL is a placeholder: domain/section) */
    $html = file_get_contents("https://домен/раздел/");
    /* New DOMDocument object */
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    /* New XPath object */
    $xpath = new DOMXPath($dom);
    /* Element selector */
    $nodes = $xpath->query("//div[@class='bxr-element-container']");
    /* Database connection */
    $mysqli_connect = mysqli_connect(DB_HOSTNAME, DB_USERNAME, DB_PASSWORD, DB_DATABASE) or die("Couldn't connect to the database");
    /* Prepared statement, so quotes in a title cannot break the query */
    $stmt = mysqli_prepare($mysqli_connect, "INSERT IGNORE INTO oc_materials (`material_name`, `image`) VALUES (?, ?)");
    /* Insert the values into the database */
    foreach ($nodes as $node) {
        /* The leading "." makes each query relative to the current $node */
        $titleNode = $xpath->query(".//div[@class='bxr-element-name']/a", $node)->item(0);
        $srcNode = $xpath->query(".//div[@class='bxr-element-image  ']/a/img/@src", $node)->item(0);
        if (!$titleNode || !$srcNode) {
            continue; /* skip containers without a title or an image */
        }
        $title = trim($titleNode->nodeValue);
        $image = 'https://basis-spb.ru' . $srcNode->value;
        mysqli_stmt_bind_param($stmt, 'ss', $title, $image);
        mysqli_stmt_execute($stmt) or die(mysqli_stmt_error($stmt));
    }

1 answer
Kirill Alekseev, 2020-01-13
@kspitfire

"The originals can only be obtained by following the link to each element of the list. Can you tell me what such a parser should look like?"

The parser has to follow the link to each element of the list with an HTTP client (cURL, for example) and pull the original image from the page it lands on.
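A minimal sketch of that approach, not a definitive implementation. It assumes the link in `bxr-element-name` points to the detail page and that its `href` is site-relative (starts with `/`), as the image paths in the question appear to be. The detail-page selector `detail-picture` is purely hypothetical and has to be replaced with whatever the real page uses. The function names (`fetchPage`, `xpathFor`, `extractDetailLinks`, `extractOriginalImage`, `collectOriginalImages`) are made up for this example.

```php
<?php
/** Download a page with cURL; returns null on failure or an empty body. */
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 15,   // seconds
    ]);
    $body = curl_exec($ch);
    return ($body === false || $body === '') ? null : $body;
}

/** Build a DOMXPath for an HTML string, suppressing markup warnings. */
function xpathFor(string $html): DOMXPath
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    return new DOMXPath($dom);
}

/** Collect the title and detail-page URL of every element on the list page. */
function extractDetailLinks(string $listHtml, string $baseUrl): array
{
    $xpath = xpathFor($listHtml);
    $links = [];
    foreach ($xpath->query("//div[@class='bxr-element-container']") as $node) {
        // Relative query: only look inside the current container
        $a = $xpath->query(".//div[@class='bxr-element-name']/a", $node)->item(0);
        if ($a instanceof DOMElement && $a->getAttribute('href') !== '') {
            $links[] = [
                'title' => trim($a->nodeValue),
                'url'   => $baseUrl . $a->getAttribute('href'), // assumes a site-relative href
            ];
        }
    }
    return $links;
}

/** Find the full-size image on a detail page; null if nothing matches. */
function extractOriginalImage(string $detailHtml, string $baseUrl): ?string
{
    // HYPOTHETICAL selector: replace 'detail-picture' with the real class name
    $src = xpathFor($detailHtml)
        ->query("//div[@class='detail-picture']//img/@src")
        ->item(0);
    return $src ? $baseUrl . $src->value : null;
}

/** Walk the list page, visit each detail page, return title => image URL. */
function collectOriginalImages(string $listUrl, string $baseUrl): array
{
    $images = [];
    $listHtml = fetchPage($listUrl);
    if ($listHtml === null) {
        return $images;
    }
    foreach (extractDetailLinks($listHtml, $baseUrl) as $item) {
        $detailHtml = fetchPage($item['url']);
        $image = $detailHtml === null ? null : extractOriginalImage($detailHtml, $baseUrl);
        if ($image !== null) {
            $images[$item['title']] = $image;
        }
    }
    return $images;
}
```

Calling `collectOriginalImages($listUrl, 'https://basis-spb.ru')` would then give an array of title => image URL pairs that can be written to `oc_materials` in place of the resized image paths. Keeping the download step separate from the two extraction functions makes it possible to check the selectors against saved HTML without hitting the site.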