Removing Duplicate Records in MongoDB

M

mr null2014-04-12 00:29:01

MongoDB

mr null, 2014-04-12 00:29:01

Please help.
There are 50 million records in MongoDB, there is a unique field, but no uniqueness index is set. How do I remove duplicate records at once, the separator with a command or some solution.
I use RockMongo, there is a way to remove duplicates, but on such volumes it is critical. It is necessary to somehow turn this directly to Monge.

Reply

Answer the question

In order to leave comments, you need to log in

4 answer(s)

S

Sergey, 2014-04-12
Protko @Fesor

stackoverflow.com/questions/13190370/how-to-remove...

M

mr null, 2014-04-14
@mr_null

Failed. It looks like it's good for up to 1 million records, error
The question is open, read that some copy all the records into a new collection with a unique field. But that's not exactly my problem.
With the same success, I can write a script that makes a selection and searches whether the unique field still occurs, if not, then deletes this entry.

L

lega, 2014-04-16
@lega

This is a one-time task, so you can use any method.
For example you can do
1) mongodump
2) drop collections
3) do uniq index
4) mongorestore

D

Dmitry, 2014-12-14
@promsoft

Yes, when creating a clean one, it will restore only unique ones. See discussion
You can also remove duplicates with a script, but backing up will be faster. I tried 10 million. In Python it was like this

from pymongo import MongoClient

connection = MongoClient('localhost', 27017)

db = connection.mydb

table = db.mytable
for doc in table.find():
  idx = doc['_id']
  qw = doc['qw']
  table.remove({'qw' : qw, '_id' : {'$ne':idx}})

And it went very slowly (there is no index)
But the removal by aggregation (like this) took less than a minute

db.table.aggregate([{$group:{'_id':'$qw'}}, {$out:'newtable'}], {allowDiskUse:true})