What to use as a database for searching/aggregating by tags?

NoSQL

mclander, 2017-08-10 12:18:53

The task:
- several million records now, billions in the long term
- each record has from zero to a hundred tags plus several significant fields (the set of tags is limited but slowly growing)
- I need to quickly find the first 10-20 thousand records (by time/descending id) that match a set of tags (if fewer records match, return all of them)
- search speed is very important
- insertion speed is not important
- the size of the database together with its indexes matters (for the desktop version); otherwise the logical choice would be a relational table with a column for each tag and an index on each: search would be fast, and nothing else would matter

Is there a ready-made system that is easy to install and configure (on both Linux and Windows)?
It is clear that I could roll my own relatively easily, but I would rather not.

3 answer(s)

Maxim Fedorov, 2017-08-10
@mclander

ElasticSearch.
See the discussion "Sphinx or ElasticSearch?"
Production experience at 2GIS: https://habrahabr.ru/company/2gis/blog/213765/

vkdv, 2017-08-10
@vkdv

You can try Redis with sets and intersection:
each tag gets its own set of records
tag1 - record1,record2,record3,record4,record5
tag2 - record5,record6,record3
tag3 - record1,record3,record5
Then run the set operation SINTER tag1 tag2 tag3
The result will be record3, record5
If sorting and limits matter, you can use sorted sets and the ZINTERSTORE command, but that is slower.
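
The set-per-tag idea can be tried out without a Redis server. The sketch below uses plain Python sets as a stand-in for Redis sets, so it only illustrates the logic of intersecting and then taking the newest matches. The names `tag_index` and `find_latest` and the numeric record ids are assumptions made for the example, not part of Redis or of the original answer.

```python
from heapq import nlargest

# Stand-in for Redis sets: one set of record ids per tag.
# Ids are hypothetical; a larger id is assumed to mean a newer record.
tag_index = {
    "tag1": {1, 2, 3, 4, 5},
    "tag2": {5, 6, 3},
    "tag3": {1, 3, 5},
}

def find_latest(tags, limit):
    """Return up to `limit` ids of records carrying all `tags`, newest first."""
    sets = [tag_index.get(tag, set()) for tag in tags]
    if not sets:
        return []
    # Start from the smallest set so intermediate results stay small.
    sets.sort(key=len)
    matches = set.intersection(*sets)  # same role as SINTER
    # Largest ids first, cut off at `limit`.
    return nlargest(limit, matches)

result = find_latest(["tag1", "tag2", "tag3"], limit=20)
print(result)
```

With real Redis the intersection itself would be done server-side by SINTER; the ordering and the limit would still have to be applied afterwards, which is where sorted sets come in.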

xmoonlight, 2017-08-11
@xmoonlight

It is not that scary (MSSQL, MySQL, Postgres, any of them will do):
1. Create a table of tag SETs that holds the ID of each set and the IDs of the tags in it.
2. When adding a record, store the ID of the matching tag set in it.
3. When selecting by tags, get the IDs of the suitable sets from the set table.
4. For those sets, select from the main table with whatever filter and sorting you need.
This speeds up the search, because there is no need to check the tags themselves or to join other tables to match them.
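
The four steps above can be sketched with Python's built-in `sqlite3` module. This is only one possible reading of the scheme: the table and column names (`tag_set_members`, `records`, `set_id`, `payload`), the sample data, and the rule that a set is "suitable" when it contains every requested tag are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Step 1: which tags belong to which tag set (hypothetical schema).
    CREATE TABLE tag_set_members (
        set_id INTEGER NOT NULL,
        tag_id INTEGER NOT NULL,
        PRIMARY KEY (tag_id, set_id)
    );
    -- Step 2: each record stores only the id of its tag set.
    CREATE TABLE records (
        id      INTEGER PRIMARY KEY,
        set_id  INTEGER NOT NULL,
        payload TEXT
    );
    CREATE INDEX records_by_set ON records (set_id, id);
""")
conn.executemany(
    "INSERT INTO tag_set_members (set_id, tag_id) VALUES (?, ?)",
    [(1, 10), (1, 20), (2, 20), (3, 10), (3, 20), (3, 30)],
)
conn.executemany(
    "INSERT INTO records (id, set_id, payload) VALUES (?, ?, ?)",
    [(1, 1, "a"), (2, 2, "b"), (3, 3, "c"), (4, 1, "d"), (5, 3, "e")],
)

def find_latest(conn, tag_ids, limit):
    """Ids of the newest records whose tag set contains all of `tag_ids`."""
    tag_ids = list(set(tag_ids))
    if not tag_ids:
        return []
    placeholders = ",".join("?" * len(tag_ids))
    sql = f"""
        SELECT r.id FROM records AS r
        WHERE r.set_id IN (
            -- Step 3: sets that contain every requested tag.
            SELECT set_id FROM tag_set_members
            WHERE tag_id IN ({placeholders})
            GROUP BY set_id
            HAVING COUNT(*) = ?
        )
        -- Step 4: sort and limit on the main table.
        ORDER BY r.id DESC
        LIMIT ?
    """
    rows = conn.execute(sql, (*tag_ids, len(tag_ids), limit)).fetchall()
    return [row[0] for row in rows]

print(find_latest(conn, [10, 20], 3))
```

The number of distinct tag sets is usually far smaller than the number of records, so the subquery touches a small table and the main table is filtered by a single indexed column.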