Answer the question
In order to leave comments, you need to log in
Removing Duplicate Records in MongoDB
Please help.
There are 50 million records in MongoDB, there is a unique field, but no uniqueness index is set. How do I remove duplicate records at once, the separator with a command or some solution.
I use RockMongo, there is a way to remove duplicates, but on such volumes it is critical. It is necessary to somehow turn this directly to Monge.
Answer the question
In order to leave comments, you need to log in
Failed. It looks like it's good for up to 1 million records, error
The question is open, read that some copy all the records into a new collection with a unique field. But that's not exactly my problem.
With the same success, I can write a script that makes a selection and searches whether the unique field still occurs, if not, then deletes this entry.
This is a one-time task, so you can use any method.
For example you can do
1) mongodump
2) drop collections
3) do uniq index
4) mongorestore
Yes, when creating a clean one, it will restore only unique ones. See discussion
You can also remove duplicates with a script, but backing up will be faster. I tried 10 million. In Python it was like this
from pymongo import MongoClient
connection = MongoClient('localhost', 27017)
db = connection.mydb
table = db.mytable
for doc in table.find():
idx = doc['_id']
qw = doc['qw']
table.remove({'qw' : qw, '_id' : {'$ne':idx}})
db.table.aggregate([{$group:{'_id':'$qw'}}, {$out:'newtable'}], {allowDiskUse:true})
Didn't find what you were looking for?
Ask your questionAsk a Question
731 491 924 answers to any question