How to specify multiple pairs of conditions in a list query?


MongoDB

SmAl27, 2021-01-23 23:34:06

There are documents like this:
{"username" : "user1", "tags" : ["tag1", "tag2", "tag3", "tag4"]}
{"username" : "user2", "tags" : ["tag3", "tag2", "tag5", "tag6"]}
{"username" : "user1", "tags" : ["tag4", "tag5", "tag2", "tag1"]}
{"username" : "user2", "tags" : ["tag1", "tag2", "tag3", "tag4"]}
{"username" : "user3", "tags" : ["tag3", "tag7", "tag8", "tag1"]}
{"username" : "user4", "tags" : ["tag8", "tag5", "tag3", "tag2"]}
{"username" : "user1", "tags" : ["tag9", "tag6", "tag4", "tag5"]}
{"username" : "user2", "tags" : ["tag1", "tag2", "tag7", "tag8"]}

I need to put several pairs of conditions into one request, so that a document matching any one of the pairs is returned:
username user1 with the tag tag4
username user2 with the tag tag1
username user2 with the tag tag3

Right now I am doing it with $or:
$or: [
{"username" : "user1", "tags" : "tag4"},
{"username" : "user2", "tags" : "tag1"},
{"username" : "user2", "tags" : "tag3"}
]

Everything works, but it is slow.
Sometimes there are many pairs of conditions.
The collection itself holds 5,000,000 documents.

Which index should I create? A separate one for each field? Or a compound one on "username + tags"?

Or is there a more correct way to do it?
Create a separate collection with a "hash" of the form username.tags and a field referencing the original document?
Add a new array field with this "hash" to the documents themselves?
I have heard that MongoDB has a hashed index, but so far I have not found a clear description of it.

1 answer(s)

Vladimir, 2021-01-24
@SmAl27

About hashed indexes
According to the documentation, MongoDB does not support hashed indexes on array fields.
Hashed indexes exist to distribute values that lie close to each other more evenly. They are most often used for sharding (when one collection is split into several parts that are stored on different servers), so that queries for neighboring values do not all go to the same server. But for range queries like field1 > value1 && field1 < value2 a hashed index will not be used, because MongoDB would have to generate every possible intermediate value and compute its hash, which is not feasible.
Judging by the documentation, hashed indexes can also be used for embedded documents (I haven't done this myself), but I suspect that in this case the entire embedded document must be passed in the query so that the hash is computed correctly.
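The point about range queries can be illustrated with a small sketch. It uses a toy multiplicative hash as a stand-in (an assumption for illustration, not the function MongoDB actually uses) to show that hashing does not preserve the order of the original values, so a range over values does not map to a range over hashes.

```javascript
// Toy stand-in for a hash function (Knuth multiplicative hashing).
// This is NOT the function MongoDB uses; it only shows that hashing
// does not preserve the order of the original values.
function toyHash(n) {
  return (n * 2654435761) % 2 ** 32;
}

const values = [1, 2, 3, 4];
const hashes = values.map(toyHash);

// Order of the values when sorted by their hash instead of by value.
const orderByHash = [...values].sort((a, b) => toyHash(a) - toyHash(b));

console.log(hashes);      // [ 2654435761, 1013904226, 3668339987, 2027808452 ]
console.log(orderByHash); // [ 2, 4, 1, 3 ]
```

Here the values 2 and 3 satisfy `field1 > 1 && field1 < 4`, but the hashes of 4 and 1 fall between their hashes, so scanning a hash range would return the wrong entries. An equality lookup is unaffected, since the same value always produces the same hash.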
About indexes in general
In general, the job of an index is to leave as few documents as possible after the initial lookup for the subsequent scan, so it is recommended to build the index on fields that are guaranteed to take part in every query.
Let's say only the username field is indexed in your collection and you run a query like

$or: [
{"username" : "user1", "tags" : "tag4"},
{"username" : "user2", "tags" : "tag1"},
{"username" : "user2", "tags" : "tag3"}
]

In this case MongoDB will use the index to very quickly select all documents whose username is either user1 or user2, and then it will iterate through the found documents and check them against the rest of the query conditions.
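The two stages described above can be sketched in plain JavaScript over the sample documents from the question. The `Map` is only an in-memory stand-in for an index on username, and the function name is hypothetical; the real query planner is more involved.

```javascript
const docs = [
  { username: "user1", tags: ["tag1", "tag2", "tag3", "tag4"] },
  { username: "user2", tags: ["tag3", "tag2", "tag5", "tag6"] },
  { username: "user1", tags: ["tag4", "tag5", "tag2", "tag1"] },
  { username: "user2", tags: ["tag1", "tag2", "tag3", "tag4"] },
  { username: "user3", tags: ["tag3", "tag7", "tag8", "tag1"] },
  { username: "user4", tags: ["tag8", "tag5", "tag3", "tag2"] },
  { username: "user1", tags: ["tag9", "tag6", "tag4", "tag5"] },
  { username: "user2", tags: ["tag1", "tag2", "tag7", "tag8"] },
];

// Stand-in for an index on username: username -> positions of documents.
const byUsername = new Map();
docs.forEach((doc, id) => {
  if (!byUsername.has(doc.username)) byUsername.set(doc.username, []);
  byUsername.get(doc.username).push(id);
});

// Stage 1: pick candidates through the "index".
// Stage 2: examine every candidate and check the tag condition.
function findWithUsernameIndex(pairs) {
  const usernames = new Set(pairs.map((p) => p.username));
  let examined = 0;
  const matched = [];
  for (const name of usernames) {
    for (const id of byUsername.get(name) || []) {
      examined += 1;
      const doc = docs[id];
      const ok = pairs.some(
        (p) => p.username === doc.username && doc.tags.includes(p.tag)
      );
      if (ok) matched.push(id);
    }
  }
  return { examined, matched: matched.sort((a, b) => a - b) };
}

console.log(findWithUsernameIndex([{ username: "user2", tag: "tag3" }]));
// { examined: 3, matched: [ 1, 3 ] }
```

For the single pair user2 + tag3, three documents are examined and two are returned. With 5,000,000 documents and popular usernames, that second stage is where the time goes.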
So is it better to create a compound index on username + tags?
Not always. When a field of type Array is used in an index, a separate index entry is created for each element of the array. If you have few documents in which the field has the same value, but at the same time the array can hold many values, and values inside the same array can repeat, then using a compound index will lead to ...
This also implies another limitation: in a compound index only one of the fields can be an array, otherwise N × M entries would have to be created in the index for each document, where N and M are the sizes of the arrays that take part in the index.
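To make the "one entry per array element" point concrete, here is a minimal sketch that models a compound index on username + tags as a `Map` over the sample documents. It is an in-memory illustration under that assumption, not how MongoDB stores its indexes, and the function name is hypothetical.

```javascript
const docs = [
  { username: "user1", tags: ["tag1", "tag2", "tag3", "tag4"] },
  { username: "user2", tags: ["tag3", "tag2", "tag5", "tag6"] },
  { username: "user1", tags: ["tag4", "tag5", "tag2", "tag1"] },
  { username: "user2", tags: ["tag1", "tag2", "tag3", "tag4"] },
  { username: "user3", tags: ["tag3", "tag7", "tag8", "tag1"] },
  { username: "user4", tags: ["tag8", "tag5", "tag3", "tag2"] },
  { username: "user1", tags: ["tag9", "tag6", "tag4", "tag5"] },
  { username: "user2", tags: ["tag1", "tag2", "tag7", "tag8"] },
];

// Stand-in for a compound index { username: 1, tags: 1 }:
// one entry per (username, array element), pointing back at the document.
const compound = new Map();
let entryCount = 0;
docs.forEach((doc, id) => {
  for (const tag of doc.tags) {
    const key = JSON.stringify([doc.username, tag]);
    if (!compound.has(key)) compound.set(key, []);
    compound.get(key).push(id);
    entryCount += 1;
  }
});

// Each pair becomes a direct lookup; no document has to be re-checked.
function findWithCompoundIndex(pairs) {
  const ids = new Set();
  for (const p of pairs) {
    const key = JSON.stringify([p.username, p.tag]);
    for (const id of compound.get(key) || []) ids.add(id);
  }
  return [...ids].sort((a, b) => a - b);
}

console.log(entryCount); // 32 entries for 8 documents with 4 tags each
console.log(findWithCompoundIndex([
  { username: "user1", tag: "tag4" },
  { username: "user2", tag: "tag1" },
  { username: "user2", tag: "tag3" },
])); // [ 0, 1, 2, 3, 6, 7 ]
```

The lookups are exact, but the index holds four entries per document here, and the count grows with the length of the tags array. In mongosh such an index would be created with `db.docs.createIndex({ username: 1, tags: 1 })`, where `docs` is a hypothetical collection name.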
A separate field with a hash
In this case, every time you edit a document you will need to load the full document from the database, update it, manually compute the new hash and save the document back. This means you won't be able to use queries that update only individual fields, such as editing the array in place.
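A minimal sketch of what this approach involves on the application side. The field name `pairKeys`, the "username.tag" format, and both helper names are assumptions made for illustration; `$in` is the standard MongoDB operator, which also matches when any element of an array field is in the given list.

```javascript
// Hypothetical helper: derive the extra array field from username and tags.
// The field name "pairKeys" and the "username.tag" format are illustrative.
function withPairKeys(doc) {
  return {
    ...doc,
    pairKeys: doc.tags.map((tag) => `${doc.username}.${tag}`),
  };
}

// Build a query filter for a list of (username, tag) pairs.
function pairKeysFilter(pairs) {
  return { pairKeys: { $in: pairs.map((p) => `${p.username}.${p.tag}`) } };
}

const stored = withPairKeys({ username: "user1", tags: ["tag9", "tag6"] });

// After any change to tags, the derived field has to be recomputed too.
const edited = withPairKeys({ ...stored, tags: [...stored.tags, "tag4"] });

console.log(edited.pairKeys);
// [ 'user1.tag9', 'user1.tag6', 'user1.tag4' ]
console.log(JSON.stringify(pairKeysFilter([{ username: "user2", tag: "tag1" }])));
// {"pairKeys":{"$in":["user2.tag1"]}}
```

The recomputation step in `edited` is exactly the extra work described above: a plain in-place update of tags would leave `pairKeys` stale.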
Separate collection with hashes
By default, atomicity in MongoDB is guaranteed only at the level of a single document. Multi-document transactions do exist (since MongoDB 4.0, on replica sets), but without them you yourself will have to keep the data in both collections consistent, which can turn into another headache: what will you do if the document has been edited, but the information in the associated collection has not been updated for some reason (for example, the connection to the database was interrupted or the application crashed due to lack of memory)? Therefore, personally, in the general case, I would not recommend this approach.
So what to do?
The simplest and most effective thing you can do is to run MongoDB locally, generate realistic data in it, experiment with different index settings and measure the query execution time.
Additionally, you can use the explain method to view the query execution statistics.
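As a rough sketch of how the explain output might be compared between index variants: the helper below (a hypothetical name) pulls out the counters from `explain("executionStats")` that are most useful for this. The sample object is hand-written with made-up numbers, only to show the shape of the data.

```javascript
// Pull the most useful counters out of explain("executionStats") output.
function summarizeExplain(explainOutput) {
  const stats = explainOutput.executionStats;
  return {
    returned: stats.nReturned,
    keysExamined: stats.totalKeysExamined,
    docsExamined: stats.totalDocsExamined,
    millis: stats.executionTimeMillis,
  };
}

// In mongosh, with a hypothetical collection named "docs":
//   const out = db.docs.find({ $or: [ /* pairs */ ] }).explain("executionStats");
//   summarizeExplain(out);

// Hand-written sample with made-up numbers, only to show the shape.
const sample = {
  executionStats: {
    nReturned: 6,
    totalKeysExamined: 6,
    totalDocsExamined: 6,
    executionTimeMillis: 0,
  },
};

console.log(summarizeExplain(sample));
```

When docsExamined is much larger than returned, the index is leaving too many candidates for the scan stage, which is the situation described in the section on indexes in general.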