D
D
devalone2020-01-15 14:38:32
Database
devalone, 2020-01-15 14:38:32

What DB to use for timeseries data?

Given :
- Several tables like users, stories, comments, where in each field like id, rating, text, is_deleted, etc.
- A lot of data, >145 lams in comments, > 7 lams in stories and more than 2.5 lams in users
- For each column, the table with versions, for example, comments has a rating column and to store previous values ​​there is a comment_rating_versions table of the following form:


item_id uint64
timestamp int64
value int32

There are also versions of text and boolean fields, the same, only the value type is different.
All this is running on postgres.
Task :
make efficient data selections (both in the users, stories, comments tables, and in version tables), build graphs, distributions by days / hours and do it all efficiently.
Problem :
postgres works with such data very slowly, for example, the distribution of all comments by day takes more than 8 hours!
Question :
Does anyone have experience with similar timeseries data and what database can you recommend? Quick googling hints that there is influxdb, clickhouse from Yandex and the like. Or maybe something like pipelinedb for postgres will suit me?

Answer the question

In order to leave comments, you need to log in

4 answer(s)
I
Ivan Shumov, 2020-01-15
@inoise

what is indicated is not timeseries. timeseries databases are stored as key-value storage with small volumes per item. The main application is metrics from IoT devices
, seemingly obvious problems:
- historical data is stored together with relevant data (no one has heard of CQRS or Event Sourcing)
- no one thought about analytics and columnar databases (influxdb is a time series and this is not about your case, clickhouse yes, but it works well only with a large flow of incoming data). You might think, for example, about Cassandra.

I
Ilya, 2020-01-15
@sarapinit

For postgres, timescalesb will do. This is a plugin, its main task is to automate partitioning. Check out the docks https://www.timescale.com/

M
mayton2019, 2020-01-15
@mayton2019

The author very quickly dropped Postgres. But I want to ask if all possibilities have been explored? For example TimescaleDb.

S
Saboteur, 2020-01-16
@saboteur_kiev

influxDB is more suitable for storing numerical metrics than strings, and even more so than comments. She is one of the best at this. And there is no selection by tables, there are not even actual tables with columns, there are metrics that are stored rather in the form of objects. Here to store data on load, traffic, other numbers and sometimes strings - yes.
But if you need table selections, stay within the limits of SQL databases.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question