JavaScript
rockwell323, 2020-06-14 16:14:04

How to discard duplicates when writing to MongoDB?

Hello. The situation is this: I wrote code that downloads a CSV database every minute, parses it, and writes the data I need, mapped to the right model, into a MongoDB collection. The first write produces more than 40k objects. Since the CSV on the third-party site is updated every minute with new records appended, I have to download, parse, and write it again every minute, so the collection grows in arithmetic progression (40k, 80k, 120k, and so on) and fills up with duplicates.
The question: on subsequent writes to MongoDB, how do I discard the duplicates that already exist in my collection, so that only new objects get written?
I have been trying to compare the two datasets, the existing one and the new one, for two days now, with no results so far :(


3 answer(s)
juxifo, 2020-06-14
@juxifo

Use update instead of insert.
You could also compare by hashes, but for 40k+ objects I think that is a thankless optimization task.
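A minimal sketch of the "update instead of insert" idea, assuming each parsed CSV row carries a stable unique field (here called `externalId`, a hypothetical name; substitute whatever identifies a record in your data). `updateOne` with `{ upsert: true }` inserts the document when no match exists and updates it otherwise, so re-running the import does not create duplicates:

```javascript
// Pure helper: the three arguments for one upsert, kept separate so it is
// easy to test without a database. `externalId` is an assumed field name.
function upsertArgs(row, key = 'externalId') {
  return {
    filter: { [key]: row[key] },   // match on the unique key
    update: { $set: row },         // overwrite/insert the row's fields
    options: { upsert: true },     // insert if no matching document exists
  };
}

// Writing the rows (assumes `collection` is an already-connected
// MongoDB driver collection and `rows` is the parsed CSV):
async function upsertRows(collection, rows) {
  for (const row of rows) {
    const { filter, update, options } = upsertArgs(row);
    await collection.updateOne(filter, update, options);
  }
}
```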

rockwell323, 2020-06-15
@rockwell323

I figured it out with updateOne and everything works, but there is one catch: updating the database and writing the new objects takes about 6 minutes, and the CPU sits at 100% for those 6 minutes. The goal was to have it all happen within a minute and without such a colossal load on the processor.
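One common way to cut that load (a sketch, not the asker's code): instead of 40k+ individual `updateOne` round trips, send the whole batch as a single unordered `bulkWrite` of upserts, which the driver splits into a handful of large batches. `externalId` is again a hypothetical unique field; substitute your own key:

```javascript
// Pure helper: build one upsert operation per CSV row, in the shape
// that collection.bulkWrite() expects.
function buildBulkUpserts(rows, key = 'externalId') {
  return rows.map((row) => ({
    updateOne: {
      filter: { [key]: row[key] },
      update: { $set: row },
      upsert: true,
    },
  }));
}

// Usage (assumes `collection` is a connected collection, `rows` the parsed CSV).
// { ordered: false } lets the server process operations in parallel and
// keep going past individual failures:
//
//   await collection.bulkWrite(buildBulkUpserts(rows), { ordered: false });
```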

MyAngelRem, 2020-06-15
@MyAngelRem

So you haven't tried it?

dbo.collection('collection').findOne({ _id: database_2._id }, function (err, data) {
    if (err) throw err;
    if (!data) {
        // no document with this _id yet, so it is safe to insert it
        dbo.collection('collection').insertOne(database_2, function (err) {
            if (err) throw err;
        });
    }
});
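An alternative sketch (not from the thread) that avoids the per-document lookup entirely: let MongoDB discard the duplicates itself. With a unique index on the identifying field, `insertMany` with `{ ordered: false }` inserts every new document and reports duplicate-key errors (code 11000) for the existing ones instead of aborting. `externalId` is again an assumed field name:

```javascript
// Assumes `collection` is a connected MongoDB collection and `rows`
// is the freshly parsed CSV.
async function insertOnlyNew(collection, rows) {
  // One-time setup; createIndex is a no-op if the index already exists.
  await collection.createIndex({ externalId: 1 }, { unique: true });
  try {
    await collection.insertMany(rows, { ordered: false });
  } catch (err) {
    // 11000 = duplicate key: expected for rows already in the collection,
    // so swallow it; rethrow anything else.
    if (err.code !== 11000) throw err;
  }
}
```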
