PHP
Ivan, 2018-12-20 06:56:46

How do I avoid loading products again if they are already in the database?

I parse a list of products and save them to an SQLite database using the following code:

<?php

set_time_limit(0);

include __DIR__ . '/config.php';

include __DIR__ . '/libs/shop.class.php';
include __DIR__ . '/libs/database.class.php';

$banlist = array_filter(array_map('trim', file(__DIR__ . '/banlist.txt')));
$shop = new Shop(__DIR__ . '/auth.cookie');
$db = new DataBase(__DIR__ . '/db.sqlite');

// Check authorization. If already authorized, continue; otherwise log in with the username and password

if (!$shop->checkAuth()) {
    $shop->auth(USER, PASS);

    if (!$shop->checkAuth()) {
        die('Can\'t auth');
    }
}

$items = $db->all();

for ($page = 1; $page >= 1; $page--) {
//for ($page = 1; $page <= 1; $page++) {

// Loading: 'Day' trends

$items = $shop->getMarketplaceTrends('Day', $page);
// $item is not defined before the loop, so sort the whole list by key (descending)
krsort($items);

foreach ($items as $item) {
    if (!in_array($item->user['id'], $banlist)) {
        if ($db->exists($item, 3)) {
            continue;
        }

        $item->downloadAndResizeImages();

        $db->add(array_merge($item->only(['title', 'description', 'image', 'images', 'wareUrl', 'user', 'user_id', 'rate', 'createdAt', 'price', 'price_total']), [
            //'id' => (3 * 10000000) + $item->id,
            'wareUrl' => trim($item->wareUrl, '/'),
            'id' => $item->id,
            'item_id' => $item->id,
            'type' => 3,
            'user_id' => $item->user['id'],
            'user_fullname' => $item->user['fullName'],
            'hasPromo' => (int)$item->hasPromo,
            'price_total' => $item->price['total'],
            'time' => date("r")
        ]));
    }
    sleep(SLEEP_CHECKS);
}

sleep(SLEEP);

// Loading: 'ThreeDays' trends

$items = $shop->getMarketplaceTrends('ThreeDays', $page);
// $item is not defined before the loop, so sort the whole list by key (descending)
krsort($items);

foreach ($items as $item) {
    if (!in_array($item->user['id'], $banlist)) {
        if ($db->exists($item, 3)) {//4
            continue;
        }

        $item->downloadAndResizeImages();

        $db->add(array_merge($item->only(['title', 'description', 'image', 'images', 'wareUrl', 'user', 'user_id', 'rate', 'createdAt', 'price', 'price_total']), [
            //'id' => (4 * 10000000) + $item->id,
            'wareUrl' => trim($item->wareUrl, '/'),
            'id' => $item->id,
            'item_id' => $item->id,
            'type' => 3,//4
            'user_id' => $item->user['id'],
            'user_fullname' => $item->user['fullName'],
            'hasPromo' => (int)$item->hasPromo,
            'price_total' => $item->price['total'],
            'time' => date("r")
        ]));
    }
    sleep(SLEEP_CHECKS);
}

sleep(SLEEP);

// Loading: marketplace 'leaders'

$items = $shop->getMarketplace('leaders', $page);
// $item is not defined before the loop, so sort the whole list by key (descending)
krsort($items);

foreach ($items as $item) {
    if (!in_array($item->user['id'], $banlist)) {
        if ($db->exists($item, 3)) {//5
            continue;
        }

        $item->downloadAndResizeImages();

        $db->add(array_merge($item->only(['title', 'description', 'image', 'images', 'wareUrl', 'user', 'user_id', 'rate', 'createdAt', 'price', 'price_total']), [
            //'id' => (5 * 10000000) + $item->id,
            'wareUrl' => trim($item->wareUrl, '/'),
            'id' => $item->id,
            'item_id' => $item->id,
            'type' => 3,//5
            'user_id' => $item->user['id'],
            'user_fullname' => $item->user['fullName'],
            'hasPromo' => (int)$item->hasPromo,
            'price_total' => $item->price['total'],
            //'time' => microtime(true)
            'time' => date("r")
        ]));
    }
    sleep(SLEEP_CHECKS);
}

sleep(SLEEP);
}

echo 'Done';

I run the script from cron, and on every run all the products are loaded again from scratch.
How can I check for previously loaded products, for example by id or item_id, so that the script first checks whether a product is already in the database and only parses it if it is not?
Thank you!


1 answer(s)
Wentixon, 2018-12-20
@Wentixon

First you collect the links that you are going to parse, so store those links in a separate table. On the next run, after collecting the links, fetch the ones that are already stored, exclude them, and keep only the new ones.
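
A minimal sketch of that idea, keyed on the product id instead of the link. It assumes the PDO SQLite driver is available. The table `seen_items` and the helpers `createSeenTable`, `filterNewIds` and `markSeen` are hypothetical names made up for this example; they are not part of the `DataBase` class from the question, whose internals are not shown.

```php
<?php

// Hypothetical helper table: one row per product that was already processed.
function createSeenTable(PDO $pdo): void
{
    $pdo->exec('CREATE TABLE IF NOT EXISTS seen_items (item_id INTEGER PRIMARY KEY)');
}

// Return only the ids that are not yet recorded in seen_items.
function filterNewIds(PDO $pdo, array $ids): array
{
    if ($ids === []) {
        return [];
    }

    $ids = array_map('intval', array_values($ids));

    // One "?" placeholder per id, so the values are bound rather than concatenated.
    $placeholders = implode(',', array_fill(0, count($ids), '?'));
    $stmt = $pdo->prepare("SELECT item_id FROM seen_items WHERE item_id IN ($placeholders)");
    $stmt->execute($ids);
    $known = array_map('intval', $stmt->fetchAll(PDO::FETCH_COLUMN));

    // Keep the ids that are missing from the table, reindexed from 0.
    return array_values(array_diff($ids, $known));
}

// Record an id; INSERT OR IGNORE makes repeated calls harmless.
function markSeen(PDO $pdo, int $id): void
{
    $stmt = $pdo->prepare('INSERT OR IGNORE INTO seen_items (item_id) VALUES (?)');
    $stmt->execute([$id]);
}

// In-memory database for the demo; a real script would open its own .sqlite file.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
createSeenTable($pdo);

// First run: nothing is stored yet, so every id is new.
$firstRun = filterNewIds($pdo, [101, 102, 103]);
foreach ($firstRun as $id) {
    // ... download images and save the product here ...
    markSeen($pdo, $id);
}

// Second run: 102 and 103 are already stored, so only 104 remains.
$secondRun = filterNewIds($pdo, [102, 103, 104]);
```

In the script from the question, the ids would come from the parsed `$items`, and the expensive steps (`downloadAndResizeImages()` and `$db->add(...)`) would run only for the ids that `filterNewIds` returns. The primary key on `item_id` also guards against duplicates if two cron runs overlap. For very long id lists the `IN (...)` query may need to be split into batches, since SQLite limits the number of bound parameters per statement.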
