How do I parse the inner pages of a site?

PHP

Sergey Erin, 2020-01-13 10:53:12

I use XPath to parse a list of elements, or rather their images. But I need the images at their original size, and the page I am parsing only has resized versions. The original images are available only by following the link of each list element. Can you tell me what such a parser should look like? This is what I have now:

libxml_use_internal_errors(true);
    $html = file_get_contents("https://домен/раздел/");
    /* New DOMDocument object */
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    /* New XPath object */
    $xpath = new DOMXPath($dom);
    /* Select the list elements */
    $nodes = $xpath->query("//div[@class='bxr-element-container']");
    /* Connect to the database */
    $mysqli_connect = mysqli_connect(DB_HOSTNAME, DB_USERNAME, DB_PASSWORD, DB_DATABASE) or die("Couldn't connect to the database");
    /* Prepared statement, so quotes in a title cannot break the query */
    $stmt = mysqli_prepare($mysqli_connect, "INSERT IGNORE INTO oc_materials (`material_name`, `image`) VALUES (?, ?)") or die(mysqli_error($mysqli_connect));
    /* Insert each element into the database */
    foreach ($nodes as $node) {
        /* The leading "." makes the query relative to $node, so item(0) is this element's own match */
        $title = trim($xpath->query(".//div[@class='bxr-element-name']/a", $node)->item(0)->nodeValue);
        $image = 'https://basis-spb.ru' . $xpath->query(".//div[@class='bxr-element-image  ']/a/img/@src", $node)->item(0)->value;
        mysqli_stmt_bind_param($stmt, 'ss', $title, $image);
        mysqli_stmt_execute($stmt) or die(mysqli_error($mysqli_connect));
    }

1 answer(s)

Kirill Alekseev, 2020-01-13
@kspitfire

The original images are available only by following the link of each list element. Can you tell me what such a parser should look like?

The parser has to follow the link of each list element with an HTTP client (cURL, for example) and pull the images from those pages, obviously.
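
A minimal sketch of that approach, not a definitive implementation: one helper collects the link of each list element, and another pulls the first match of an XPath query from a downloaded detail page. The helper names (`fetchHtml`, `xpathFor`, `extractDetailLinks`, `extractFirstValue`) are made up for illustration, and the code assumes the `href` values are site-relative paths, as the `src` values in the question appear to be. The query for the original image depends on the detail page's markup, which the question does not show, so it is left as a parameter.

```php
<?php
// Sketch only: helper names are hypothetical, and the XPath query for the
// original image must be written against the real detail-page markup.

/** Download a page with cURL; returns null on failure or an empty body. */
function fetchHtml(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 15,   // seconds
    ]);
    $body = curl_exec($ch);
    return (is_string($body) && $body !== '') ? $body : null;
}

/** Build a DOMXPath object from a non-empty HTML string. */
function xpathFor(string $html): DOMXPath
{
    libxml_use_internal_errors(true); // tolerate malformed real-world HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    return new DOMXPath($dom);
}

/** Collect the title and detail-page URL of every element on the list page. */
function extractDetailLinks(string $listHtml, string $baseUrl): array
{
    $links = [];
    // Same selector as in the question; the href is assumed to be site-relative.
    foreach (xpathFor($listHtml)->query("//div[@class='bxr-element-name']/a") as $a) {
        $links[] = [
            'title' => trim($a->nodeValue),
            'url'   => $baseUrl . $a->getAttribute('href'),
        ];
    }
    return $links;
}

/** Return the first value matched by $query on a page, or null if nothing matches. */
function extractFirstValue(string $html, string $query): ?string
{
    $node = xpathFor($html)->query($query)->item(0);
    return $node ? trim($node->nodeValue) : null;
}
```

To use it, fetch the list page with `fetchHtml`, pass the result to `extractDetailLinks` together with the base URL (`https://basis-spb.ru` in the question), then call `fetchHtml` and `extractFirstValue` for each returned URL with an XPath query written for the detail page. Skip any page for which `fetchHtml` returns null, and consider a short pause between requests to keep the load on the site reasonable.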