How to organize storage and querying of such data?

SQL

deleted-mifki, 2012-10-18 14:13:34

(This is about a desktop application.)
There are objects, each with a date, a few flags, and a set of tags. The user sets a filter on the tags, and it can be quite complex, for example "(tag1 and tag2) or ((tag3 or tag4) and not tag5)". After that, for visualization, the application runs a large number of selections by date interval and flags, but always within the current filter. The data is always sorted by date, by the way. What would be the best way to organize this?

Thoughts:
1. Use a regular SQL database and not overthink it. But a selection with a filter like this will be very complex and slow.
2. Store the data denormalized, with a list of tags on each object. Then the selection by date and flags is fast, and the tag filter is applied in code afterwards. But that means there are no indexes on tags anymore.
3. Store a sorted list of object keys for each tag, fetch the lists the filter needs, and merge them as required.
4. After applying the filter by whatever method, build a temporary database with indexes for sorting and flags and work with that from then on. The downside is that it has to be updated every time the main database changes.
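To make option 3 concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `index` dictionary, the sample keys, and the helper names `intersect`, `union`, and `difference` are all hypothetical, and the keys are assumed to be integers assigned in date order so that every merged list stays sorted by date.

```python
from typing import Dict, List


def intersect(a: List[int], b: List[int]) -> List[int]:
    """Keys present in both sorted lists (AND), via a two-pointer merge."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union(a: List[int], b: List[int]) -> List[int]:
    """Keys present in either sorted list (OR), without duplicates."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def difference(a: List[int], b: List[int]) -> List[int]:
    """Keys of sorted list a that are absent from sorted list b (AND NOT)."""
    out, j = [], 0
    for key in a:
        while j < len(b) and b[j] < key:
            j += 1
        if j == len(b) or b[j] != key:
            out.append(key)
    return out


# Hypothetical sample index: tag -> sorted list of object keys.
index: Dict[str, List[int]] = {
    "tag1": [1, 2, 5, 8],
    "tag2": [2, 5, 9],
    "tag3": [3, 5, 7],
    "tag4": [4, 7],
    "tag5": [5, 7],
}

# (tag1 and tag2) or ((tag3 or tag4) and not tag5)
left = intersect(index["tag1"], index["tag2"])
right = difference(union(index["tag3"], index["tag4"]), index["tag5"])
matching_keys = union(left, right)
```

Each merge is linear in the lengths of its two inputs, so the cost of evaluating a filter depends on the sizes of the tag lists involved, not on the total number of objects.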

3 answers

AlexeyParhomenko, 2012-10-18
@AlexeyParhomenko

Break the problem into parts and solve them one at a time. What are you actually solving?
- looking up keys by tag?
- fast selection?
- looking up keys by flag?
1. Find the fastest algorithm for looking up keys by tag for your task. Whether it is SQL or NoSQL is beside the point. We don't know how much data there is, and you will only find the answer in load tests anyway. The main thing to remember is that lookups by primary key and lookups on data in RAM are always efficient, so if the key is your tag id, you can quickly find the keys of the records you need.
2. Fast selection != one fast query. Within a session you can run 3-5 short queries against different tables (or databases), and that will be much faster. Analyze your data and work from the smaller sets to the larger ones.
3. Determine whether the keys/flags are strings or numbers, and choose the lookup algorithms accordingly.
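As a rough illustration of point 2, the sketch below uses Python's standard `sqlite3` module to replace one complex query with several short ones: one primary-key lookup per tag, set arithmetic in memory, then a simple date/flag query restricted to the resulting keys. The schema, the table and column names, the sample rows, and the helpers `keys_for` and `sample` are all assumptions made up for this example, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE object (id INTEGER PRIMARY KEY, date TEXT, flags INTEGER);
    CREATE INDEX object_date ON object (date);
    CREATE TABLE object_tag (
        tag TEXT, object_id INTEGER, PRIMARY KEY (tag, object_id)
    );
""")
# Hypothetical sample data.
conn.executemany("INSERT INTO object VALUES (?, ?, ?)", [
    (1, "2012-10-01", 1), (2, "2012-10-02", 0), (3, "2012-10-03", 1),
    (4, "2012-10-04", 1), (5, "2012-10-05", 1),
])
conn.executemany("INSERT INTO object_tag VALUES (?, ?)", [
    ("tag1", 1), ("tag2", 1), ("tag1", 2), ("tag2", 2),
    ("tag3", 3), ("tag5", 3), ("tag4", 4), ("tag3", 5),
])


def keys_for(tag):
    """One short query: object keys for a single tag, via the primary key."""
    rows = conn.execute(
        "SELECT object_id FROM object_tag WHERE tag = ?", (tag,))
    return {row[0] for row in rows}


# (tag1 and tag2) or ((tag3 or tag4) and not tag5), combined in memory.
# Computed once when the user changes the filter, then reused.
allowed = (keys_for("tag1") & keys_for("tag2")) | (
    (keys_for("tag3") | keys_for("tag4")) - keys_for("tag5"))


def sample(start, end, flag_mask):
    """Another short query: date interval and flags, kept to allowed keys."""
    rows = conn.execute(
        "SELECT id, date, flags FROM object "
        "WHERE date BETWEEN ? AND ? AND (flags & ?) != 0 ORDER BY date",
        (start, end, flag_mask))
    return [row for row in rows if row[0] in allowed]


rows = sample("2012-10-01", "2012-10-04", 1)
```

Whether this actually beats a single joined query depends on the data volume, which, as said above, only load tests will show.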

vaevictus, 2012-10-18
@vaevictus

If there are only a few tags, try a bitmask; it works very fast.
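A minimal sketch of the bitmask idea in Python, assuming the whole tag vocabulary is known up front and small enough to give each tag its own bit. The tag names, the sample objects, and the helpers `mask_of` and `matches` are hypothetical.

```python
# Assumed fixed tag vocabulary: each tag gets one bit.
TAGS = ["tag1", "tag2", "tag3", "tag4", "tag5"]
BIT = {name: 1 << i for i, name in enumerate(TAGS)}


def mask_of(tags):
    """Pack an object's tags into a single integer."""
    mask = 0
    for tag in tags:
        mask |= BIT[tag]
    return mask


# Masks used by the example filter, computed once.
BOTH = BIT["tag1"] | BIT["tag2"]
EITHER = BIT["tag3"] | BIT["tag4"]
BANNED = BIT["tag5"]


def matches(mask):
    """(tag1 and tag2) or ((tag3 or tag4) and not tag5)"""
    return (mask & BOTH) == BOTH or (
        (mask & EITHER) != 0 and (mask & BANNED) == 0)


# Hypothetical objects, already sorted by date: (date, tag mask).
objects = [
    ("2012-10-01", mask_of(["tag1", "tag2"])),
    ("2012-10-02", mask_of(["tag3", "tag5"])),
    ("2012-10-03", mask_of(["tag4"])),
    ("2012-10-04", mask_of(["tag1"])),
]

selected = [date for date, mask in objects if matches(mask)]
```

The filter costs a few integer operations per object, so a plain scan over a date range stays cheap, and the mask can sit in the same row as the date and flags.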

Urevic, 2012-10-19
@Urevic

Take a look at Lucene or Sphinx; they can run selections like this very quickly.