MongoDB (array comparison, aggregation, large amounts of data)?

MongoDB

livemirsi, 2016-03-01 21:38:38

Good afternoon. I'm thinking through the structure of a project. The input is a MongoDB database (I would consider other databases, but I really like Mongo). The collection holds documents; I'll show only 3 fields, since the rest are just additional information and aren't used in calculations.
{
id: ....
name: ......
properties: [1, 2, 4, 5, ....] - up to 100 numeric values in the array
}
I fetch the records I need by the values stored in the properties array field:
db.collect.find({"properties": {$all:[1,3,5, 100]}})
That works fine: I can fetch small batches, say 50 documents, and hand them to the client.
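To make the semantics of that query concrete, here is a minimal in-memory JavaScript sketch of what `find({properties: {$all: [...]}})` with a 50-document limit returns. It is not MongoDB code: `matchesAll`, `findAll` and the sample documents are hypothetical. In the shell the real equivalent would be `db.collect.find({"properties": {$all: [1, 3, 5, 100]}}).limit(50)`.

```javascript
// Hypothetical in-memory stand-in for the collection (not a MongoDB API).
// Mirrors {properties: {$all: required}}: a document matches when its
// properties array contains every required value, in any order.
function matchesAll(doc, required) {
  return required.every((value) => doc.properties.includes(value));
}

// Like db.collect.find({properties: {$all: required}}).limit(limit):
// keep only matching documents and stop after `limit` of them.
function findAll(docs, required, limit = 50) {
  const result = [];
  for (const doc of docs) {
    if (matchesAll(doc, required)) {
      result.push(doc);
      if (result.length === limit) break;
    }
  }
  return result;
}

const docs = [
  { _id: 1, name: "a", properties: [1, 3, 5, 100, 7] },
  { _id: 2, name: "b", properties: [1, 3, 5] },
  { _id: 3, name: "c", properties: [100, 5, 3, 1] },
];

console.log(findAll(docs, [1, 3, 5, 100]).map((d) => d._id)); // [ 1, 3 ]
```

Order inside the array does not matter for `$all`, which is why document 3 matches while document 2 (missing 100) does not.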
But there is a catch: I need not just a list of documents matching the properties, but also an analysis of the properties arrays of all matching documents.
I'm thinking of using two queries to the database:
1. db.collect.find({"properties": {$all:[1,3,5, 100]}}) - get the batch of documents I need by their properties, limited to 50 documents.
2. This is the snag: I need statistics over the arrays of all matching documents, namely which values occur and in how many documents, like this:
1 - found in the properties field 350 times
2 - found in the properties field 100 times
I don't understand yet how to do this. I'm digging in the direction of aggregation, but I'm not sure I can get the result I need with it.
Another point: the second query must not limit the selection. That is, if we send
db.collect.find({"properties": {$all:[1]}}) we may get 10,000-20,000 or more documents, and across all of them we need to find out which values from the properties field repeat and how many times.
Can you advise how to solve this? Is it worth digging further into aggregation, or should I think about something else?
UPD:
I solved the problem; it turned out to be simple:

db.collect.aggregate([
  {$match: {properties: {$all: [4, 2]}}},                        // documents containing both 4 and 2
  {$unwind: {path: "$properties"}},                              // one document per array element
  {$project: {properties: true, count: {$literal: 1}}},          // keep the value, add count = 1
  {$group: {_id: "$properties", dublicate: {$sum: "$count"}}}    // sum count per distinct value
])

match - select the documents we need
unwind - expand the properties array
project - keep only the properties field and add a count field equal to 1, to make summing easier
group - group by value and add up count
As a result, we get the number of times each array element occurs across the selected documents.
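The aggregation from the update can be mirrored stage by stage in plain JavaScript, which may help when reasoning about what it returns. This is a minimal in-memory sketch, not MongoDB code: `countValues` and the sample data are hypothetical, and it assumes the array field is named `properties`, as in the schema at the top. Because `$unwind` followed by a sum counts occurrences, a value repeated inside one array is counted once per repetition.

```javascript
// In-memory sketch of the counting pipeline (hypothetical, not MongoDB code).
// Assumes every document has a `properties` array.
function countValues(docs, required) {
  // $match with $all: keep documents whose array contains every required value.
  const matched = docs.filter((doc) =>
    required.every((value) => doc.properties.includes(value))
  );

  // $unwind + $project: one record per array element, each carrying count = 1.
  const unwound = matched.flatMap((doc) =>
    doc.properties.map((value) => ({ properties: value, count: 1 }))
  );

  // $group: sum the count field per distinct value (_id is the value itself).
  const totals = new Map();
  for (const { properties, count } of unwound) {
    totals.set(properties, (totals.get(properties) || 0) + count);
  }
  return Array.from(totals, ([value, total]) => ({ _id: value, dublicate: total }));
}

const docs = [
  { _id: 1, properties: [4, 2, 7] },
  { _id: 2, properties: [2, 4, 7, 9] },
  { _id: 3, properties: [4, 9] }, // lacks 2, so the $match step drops it
];

console.log(countValues(docs, [4, 2]));
```

Here 4, 2 and 7 each get 2 and 9 gets 1. Unlike this sketch, the real `$group` stage does not guarantee any output order, so a `$sort` stage is needed if order matters.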
Admittedly, the values the selection was filtered by also appear in the result, but that's harmless: they can be removed when the application processes the result.
I'm not sure how this solution performs; I'll test it.
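One possible refinement (a suggestion, not the author's method): a second `$match` after `$unwind` can drop the filter values with `$nin`, so the application does not have to; `$group` can count with `{$sum: 1}` directly, which makes the `$project` stage unnecessary; and a `$sort` puts the most frequent values first. The helper `buildCountPipeline` below is hypothetical; it only returns the pipeline as a plain array, which could then be passed to `db.collect.aggregate(...)` in the shell.

```javascript
// Hypothetical helper: builds the counting pipeline for a given filter.
// Assumes the array field is named "properties" as in the question's schema.
function buildCountPipeline(required, { excludeRequired = true } = {}) {
  const pipeline = [
    // Select documents containing every required value.
    { $match: { properties: { $all: required } } },
    // One document per array element.
    { $unwind: "$properties" },
  ];
  if (excludeRequired) {
    // Drop the filter values themselves: every matched document contains them.
    pipeline.push({ $match: { properties: { $nin: required } } });
  }
  pipeline.push(
    // Count occurrences per value; {$sum: 1} replaces the separate $project stage.
    { $group: { _id: "$properties", count: { $sum: 1 } } },
    // Most frequent values first.
    { $sort: { count: -1 } }
  );
  return pipeline;
}

// In the mongo shell: db.collect.aggregate(buildCountPipeline([4, 2]))
console.log(JSON.stringify(buildCountPipeline([4, 2])));
```

On the performance question, the leading `$match` stage can use an index on the array field, for example `db.collect.createIndex({properties: 1})` (a multikey index); whether that is fast enough for selections of 10,000-20,000+ documents is something to measure on real data.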

2 answers

lega, 2016-03-01
@lega

1 - occurred in the properties field 350 times
2 - occurred in the properties field 100 times

For that, you can keep a cache of counts, e.g. documents of the form {_id: 1, count: 350}.
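A minimal sketch of that idea, assuming the application updates the cache whenever a document is inserted or deleted. Here the cache is an in-memory `Map` and the `PropertyCounter` class is hypothetical; in MongoDB the same bookkeeping could be an upsert per value into a separate counts collection (hypothetically named `counts`), e.g. `db.counts.updateOne({_id: value}, {$inc: {count: 1}}, {upsert: true})`. Such a cache holds totals for the whole collection; counts restricted to a `$all` selection would still need a query.

```javascript
// Hypothetical per-value counter cache: value -> number of documents containing it.
class PropertyCounter {
  constructor() {
    this.counts = new Map();
  }

  // Call after inserting a document. The Set ignores repeats within one array,
  // so each document is counted at most once per value.
  onInsert(doc) {
    for (const value of new Set(doc.properties)) {
      this.counts.set(value, (this.counts.get(value) || 0) + 1);
    }
  }

  // Call after deleting a document; entries that drop to zero are removed.
  onDelete(doc) {
    for (const value of new Set(doc.properties)) {
      const next = (this.counts.get(value) || 0) - 1;
      if (next > 0) this.counts.set(value, next);
      else this.counts.delete(value);
    }
  }

  // Cached equivalent of {_id: value, count: n}.
  get(value) {
    return { _id: value, count: this.counts.get(value) || 0 };
  }
}

const cache = new PropertyCounter();
cache.onInsert({ _id: "a", properties: [1, 2] });
cache.onInsert({ _id: "b", properties: [1, 1, 3] });
console.log(cache.get(1)); // { _id: 1, count: 2 }
```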
When filtering by a single value, a plain equality match is better:
db.collect.find({"properties": 1})

pomeo, 2016-03-02
@pomeo

In any case, aggregation will only partly simplify your task.
For example, say you have 100,000 documents containing [1,2,3]. You run $match: {properties: {$all: [1,2,3]}} and then group by, say, 1, and you're left with a single document. I don't see right off the bat how to compute the counts from there in an optimized way. Of course, after the $group you can iterate over the results with Mongo's built-in forEach and call db.collect.count for each value; that's not elegant, but it solves the problem.