MongoDB (array comparison, aggregation, large amounts of data)?
Good afternoon. I'm thinking about the structure of a project. The starting point is a MongoDB database (I would consider other databases, but I really like Mongo). The collection holds documents; I will show only 3 fields, because the rest are additional information that is not used in calculations.
{
id: ....
name: ......
properties: [1, 2, 4, 5, ....] - up to 100 numeric values in the array.
}
I select the records I need from the database by the properties stored in the array field:
db.collect.find({"properties": {$all:[1,3,5, 100]}})
So far so good: I can fetch small batches, say 50 documents, and send them to the client.
But there is a catch: I need not only to get a list of documents by properties, but also to analyze the properties arrays across all matching documents.
I'm thinking of using two queries to the database:
1. db.collect.find({"properties": {$all:[1,3,5, 100]}}) - I get the batch of documents I need by their properties. I will limit the selection to 50 documents.
2. This is the hard part: I need a summary of the arrays of all the documents, namely how many times each value occurs across the documents, for example:
1 - it was found in the properties field 350 times
2 - it was found in the properties field 100 times
I don't yet understand how to do such an operation. I'm looking in the direction of aggregation, but I'm not sure I can achieve the result I need.
One more point: the second query must not have a limit on the selection. That is, if we send
db.collect.find({"properties": {$all:[1]}}) we can get 10,000-20,000 or more documents, and we need to find out which values in the properties field repeat across those documents and how many times.
Please advise how to solve this problem: is it worth digging further into aggregation, or should I think about something else?
UPD:
I solved the problem, everything turned out to be simple:
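db.collect.aggregate([
  // keep only documents whose properties array contains all requested values
  {$match: {properties: {$all: [4, 2]}}},
  // emit one document per array element
  {$unwind: "$properties"},
  // count how many times each value occurs across the matched documents
  {$group: {_id: "$properties", count: {$sum: 1}}}
])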
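To make it easier to see what the match, unwind and group stages compute, here is a minimal in-memory sketch in plain JavaScript. It does not talk to MongoDB at all; the sample documents and the function name countProperties are hypothetical and exist only for illustration.

```javascript
// Hypothetical sample data; field names follow the question.
const docs = [
  { id: 1, name: "a", properties: [1, 2, 4] },
  { id: 2, name: "b", properties: [2, 4, 5] },
  { id: 3, name: "c", properties: [1, 3] },
];

// In-memory imitation of: $match with $all, then $unwind, then $group with $sum: 1.
function countProperties(documents, required) {
  const counts = new Map();
  for (const doc of documents) {
    // $match / $all: the array must contain every required value
    const matches = required.every((v) => doc.properties.includes(v));
    if (!matches) continue;
    // $unwind + $group: visit each array element and increment its counter
    for (const value of doc.properties) {
      counts.set(value, (counts.get(value) || 0) + 1);
    }
  }
  // One { _id, count } entry per distinct value
  return [...counts.entries()].map(([value, n]) => ({ _id: value, count: n }));
}

const result = countProperties(docs, [4, 2]);
console.log(result);
```

With this sample data only the first two documents contain both 4 and 2, so the counts cover just their arrays. On a real collection the aggregation does the same work on the server, so the 10,000-20,000 matched documents never have to be sent to the client.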
1 - occurred in the properties field 350 times
2 - occurred in the properties field 100 times
db.collect.find({"properties": 1})
In any case, aggregation will simplify your task a little.
For example, say you have 100000 documents with [1,2,3]: you run a match with {$all:[1,2,3]}, then group by 1, for example, and you are left with a single document. Offhand, I don't see how to calculate the rest in an optimized way from that one document. Of course, you can iterate over the group results with Mongo's built-in forEach and call db.collect.count for each one, but that is not elegant, although it solves the problem.
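As an alternative to looping with forEach and issuing a count per value, both results the question asks for (a batch of 50 documents and the per-value counts) could probably be fetched in one aggregate call with the $facet stage, which is available in MongoDB 3.4 and later. The sketch below only builds the pipeline as a plain JavaScript array; the function name buildPipeline and the facet names page and counts are hypothetical, and the pipeline has not been run against the asker's data.

```javascript
// Builds an aggregation pipeline; buildPipeline, "page" and "counts" are hypothetical names.
function buildPipeline(required, pageSize) {
  return [
    // Select documents whose properties array contains every required value
    { $match: { properties: { $all: required } } },
    {
      $facet: {
        // First branch: a limited batch of documents for the client
        page: [{ $limit: pageSize }],
        // Second branch: per-value counts over all matched documents
        counts: [
          { $unwind: "$properties" },
          { $group: { _id: "$properties", count: { $sum: 1 } } },
          { $sort: { count: -1 } },
        ],
      },
    },
  ];
}

const pipeline = buildPipeline([1, 3, 5, 100], 50);
console.log(JSON.stringify(pipeline, null, 2));
// In the mongo shell this would be passed as: db.collect.aggregate(pipeline)
```

Keep in mind that $facet returns everything in a single result document, which is subject to the 16 MB document size limit. With 50 documents per batch and at most a few hundred distinct values that should normally fit, but it is worth checking on real data.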