MongoDB
hunter_outlaw, 2020-02-27 20:38:11

What is the correct way to copy a large amount of data into MongoDB?

Hello!
There is an API that returns settlements in JSON format. There are a lot of them, more than 15,000, and they are served page by page, 150 per page.
I need to copy this data into my own MongoDB, after making some changes to the objects.
The copy has to be refreshed from time to time, so it must follow this rule: if a settlement is already in the database, update it; if not, create it.

Questions:
1. How do I implement the copy correctly, so that the number of requests does not bring the database server down?
2. Is it possible to do a bulk copy (insert/update) keyed on the settlement ID?
3. How do I pause correctly between requests so as not to kill the server?

Right now I do it like this:

const updateCities = async function (page) {
    const limit = 150;
    const params = {
        limit: limit,
        Page: page
    };
    return api.address.getSettlements(params).then((json) => {
        // Iterate over all the cities on the page
        if (json.data.length !== 0) {
            for (var obj of json.data) {
                saveCity(obj);
            }

            let log = `Page: ${page}, Results: ${json.data.length} \n`;
            console.log(log);

            return log;
        } else {
            return;
        }
    }).catch((errors) => {
        if (Array.isArray(errors)) {
            errors.forEach((error) => console.log(`[${ error.code || '-' }] ${ error.en || error.uk || error.ru || error.message }`));
        }

        // Report the errors to the client in a single response
        res.type('json');
        res.status(200).send(errors);
    });
}

// Call the paged city fetch,
// leaving ${delay} ms of delay between the calls
let delay = 2000;
let page = 0;
let returnMsg = '';
let timerId = setTimeout(async function request() {
    let updateCitiesReturn = await updateCities(page);
    page++;

    if (updateCitiesReturn) {
        returnMsg += updateCitiesReturn;
        timerId = setTimeout(request, delay);
    } else {
        // No more pages: send the accumulated log in a single response
        res.type('json');
        res.status(200).send(returnMsg);
    }
}, delay);

// Save one city
async function saveCity(obj) {
    // Insert or update the city in the DB, keyed on its Ref
    const cities = await loadCities();
    await cities.updateOne({ _id: obj.Ref }, { $set: obj }, { upsert: true });
}


As a result, at some point the database stops responding because of the large number of inserts/updates, even with the pauses between requests. (The pauses themselves are a separate issue.)
I suspect this is not the most competent solution, so I'm asking for advice.


3 answers
hzzzzl, 2020-02-27
@hunter_outlaw

for (var obj of json.data) {
       saveCity(obj);
}

...

await cities.updateOne({ _id: obj.Ref }, {$set: obj}, {upsert: true});

There is updateMany, why call updateOne for every single record?
https://docs.mongodb.com/manual/reference/method/d...
UPD: bulkWrite is probably a better fit, you can throw it the whole array of 150 elements at once.
https://docs.mongodb.com/manual/reference/method/d...
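
As a rough illustration of the bulkWrite variant (a minimal sketch, not the answer's code; it assumes `json.data` is one page of settlements and `loadCities()` returns the collection, as in the question), the per-city saveCity loop collapses into one call per page:

const cities = await loadCities();
// Turn the whole page into one batch of upserts keyed on the settlement ID
await cities.bulkWrite(
    json.data.map((obj) => ({
        updateOne: {
            filter: { _id: obj.Ref },
            update: { $set: obj },
            upsert: true
        }
    })),
    { ordered: false } // unordered: the server may apply the writes in any order
);

That way each API page costs a single round trip to MongoDB instead of 150.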

D
Dimonchik, 2020-02-27
@dimonchik2013

The server, by the look of it, is no good.
Write each request into a txt file, then import it in a single stream.
No idea what kind of settings it has if it falls over from some 15k records.
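
One way to read that advice (my sketch, not the answer's code; the file name and mongoimport flags are illustrative): append every fetched settlement as a line of newline-delimited JSON, then load the file in a single pass with mongoimport, which can upsert by _id:

// Append each settlement of the current page as one JSON line
const fs = require('fs');
for (const obj of json.data) {
    fs.appendFileSync('settlements.ndjson', JSON.stringify({ _id: obj.Ref, ...obj }) + '\n');
}

// Then import the whole file in one pass (run in a shell; db/collection names are illustrative):
//   mongoimport --db=mydb --collection=cities --mode=upsert --file=settlements.ndjson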

emp1re, 2020-02-28
@emp1re

initializeUnorderedBulkOp
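
That one-liner names the Node driver's older bulk-operation API. A minimal sketch of applying it to one page of results (assuming, as above, that `cities` is the collection returned by loadCities() and `json.data` is the current page):

// Build an unordered bulk of upserts and send them in one round trip
const bulk = cities.initializeUnorderedBulkOp();
for (const obj of json.data) {
    // Upsert each settlement keyed on its Ref, same as updateOne above
    bulk.find({ _id: obj.Ref }).upsert().updateOne({ $set: obj });
}
await bulk.execute();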
