InfluxDB, Prometheus, OpenTSDB. What to choose for storing and analyzing metrics?

yiicoder, 2015-09-08 11:51:56

big data

The finest time-series granularity needed is 1 hour.
There are many filters (about 20-30) with many distinct values (roughly 1000 in total across these filters).
Everything currently runs on MongoDB, but the number of filters and the number of values in them keep growing, so pre-calculation no longer helps much.
The raw data is ~2 TB; the pre-calculated data (values summed for identical filter combinations) is ~100 GB.
Each filter request now takes about 30 seconds, and such delays are extremely inconvenient for the analysts.
Real time is not needed (for most metrics, data as of "yesterday" is fresh enough).
The question is mainly for those who use one of these databases: what problems have you run into? How fast is it?
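
For illustration, the pre-calculation described above (summing values that share the same hour and the same filter combination) can be sketched roughly as follows. The record layout and the field names `ts`, `filters` and `value` are assumptions made for this example, not the actual schema.

```python
from collections import defaultdict
from datetime import datetime


def rollup_hourly(records):
    """Sum values that share the same hour and the same filter combination."""
    totals = defaultdict(float)
    for rec in records:
        # Truncate the timestamp to the start of its hour (1h granularity).
        hour = rec["ts"].replace(minute=0, second=0, microsecond=0)
        # Sort the filter pairs so that key order does not create new buckets.
        key = (hour, tuple(sorted(rec["filters"].items())))
        totals[key] += rec["value"]
    return dict(totals)


# Hypothetical sample records, for illustration only.
records = [
    {"ts": datetime(2015, 9, 8, 11, 5), "filters": {"country": "RU", "os": "linux"}, "value": 2.0},
    {"ts": datetime(2015, 9, 8, 11, 40), "filters": {"country": "RU", "os": "linux"}, "value": 3.0},
    {"ts": datetime(2015, 9, 8, 11, 50), "filters": {"country": "DE", "os": "linux"}, "value": 1.0},
]
rolled = rollup_hourly(records)
```

Each additional filter or filter value can multiply the number of distinct keys in such a rollup, which is one way to see why pre-calculation helps less as the filters grow.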

1 answer(s)

Timur Batyrshin, 2015-10-25
@yiicoder

I didn't fully understand the task, so I'll try to explain the differences as I understand them from my own experience:
OpenTSDB:
* works on top of HBase / Hadoop; for tests you can run it in standalone mode, but it will be extremely slow
* a series has the form timestamp, metricname=val, (tag=val)+; it can only store numbers (there is a batch mode if you need to write several values at once)
* the data volume scales well thanks to HBase
* the community reports slowdowns with a very large number (tens of thousands or more) of series identifiers; an identifier is the series name plus a combination of tags
* write and query speed is good: in HBase the data is partitioned by hour, so only the needed series are read, and only for the needed periods
* to scale, you add more OpenTSDB nodes behind a proxy (if aggregation is the bottleneck) or more HBase nodes (if IO is the bottleneck)
* aggregation by tags is supported (e.g. the average of "os.cpu" over all metrics that have the "role=webserver" tag)
* the query language itself is somewhat limited
* https://bosun.org/ has appeared recently; it sits in front of OpenTSDB and allows some additional operations
* upstream development is fairly slow
InfluxDB:
* very easy to set up for testing (a single binary)
* still unstable: over the past year the HTTP API has changed twice and the on-disk binary format about five times; this is my biggest complaint about it
* a series has the form db, timestamp, metricname=val, (tag=val)+, i.e. you can group different data logically. I believe it can also store text values.
* SQL-like query language
* the guys from Coub said that it handles write load well but is slow on reads (I don't know which version that was about, though)
* it has many connectors for different input formats (graphite, opentsdb, collectd, etc.)
* it is being developed quite actively
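
The tagged data model described for OpenTSDB and InfluxDB, together with the tag aggregation example given for OpenTSDB (the average of "os.cpu" over every series tagged "role=webserver"), can be sketched in plain Python. This is only a model of the idea and does not call either database's API; the `Point` type, the `avg_by_tag` helper and the sample values are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Point:
    timestamp: int  # Unix seconds
    metric: str     # series name, e.g. "os.cpu"
    value: float    # numeric value
    tags: tuple     # (key, value) pairs, e.g. (("host", "web1"), ("role", "webserver"))


def avg_by_tag(points, metric, tag_key, tag_value):
    """Average `metric` per timestamp over all series carrying tag_key=tag_value."""
    by_ts = {}
    for p in points:
        if p.metric == metric and (tag_key, tag_value) in p.tags:
            by_ts.setdefault(p.timestamp, []).append(p.value)
    return {ts: mean(vals) for ts, vals in sorted(by_ts.items())}


# Hypothetical sample data: two web servers and one database host.
points = [
    Point(1441710000, "os.cpu", 40.0, (("host", "web1"), ("role", "webserver"))),
    Point(1441710000, "os.cpu", 60.0, (("host", "web2"), ("role", "webserver"))),
    Point(1441710000, "os.cpu", 90.0, (("host", "db1"), ("role", "database"))),
]
result = avg_by_tag(points, "os.cpu", "role", "webserver")
```

In this model a series identifier is the metric name plus its tag combination, so every new tag value creates a new series.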
Among the well-known TSDBs there is also Graphite:
* an old, well-known option
* Python with modules, so it is harder to install than InfluxDB but easier than Hadoop
* RRD-style storage, i.e. it can only keep data "for the last year, last month and last hour", with its own resolution for each period
* because of this, the data takes up a predictable and constant amount of disk space
* a huge amount of documentation and all sorts of integrations can be found online
* a series has the form timestamp, metric=val; there are no tags or anything similar, so, for example, the same series for different hosts has to be stored under different names
* with the default storage and a large number of series, it becomes disk-bound
* it scales poorly (I don't know the details)
* alternative storage backends from the community appear periodically and improve speed and scaling
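
Two of the Graphite points above can be illustrated with a rough sketch: fixed retention periods give each series a constant size on disk, and without tags the host has to be encoded in the series name. The retention scheme, the 12 bytes per point and the `servers.<host>.<metric>` naming are assumptions for the example, not values taken from a real configuration.

```python
def archive_points(resolution_s, retention_s):
    """Number of slots in a fixed-size archive: one slot per resolution step."""
    return retention_s // resolution_s


# Hypothetical retention scheme: (seconds per point, seconds kept).
ARCHIVES = [
    (10, 3600),           # 10 s resolution for the last hour
    (300, 30 * 86400),    # 5 min resolution for the last 30 days
    (3600, 365 * 86400),  # 1 h resolution for the last year
]
BYTES_PER_POINT = 12  # assumed size of one stored point, for the estimate only


def series_size_bytes(archives, bytes_per_point=BYTES_PER_POINT):
    """Approximate size of one series; it does not grow as new data arrives."""
    return sum(archive_points(res, ret) * bytes_per_point for res, ret in archives)


def dotted_name(host, metric):
    """Without tags, the host becomes part of the series name (hypothetical scheme)."""
    return f"servers.{host}.{metric}"


total_points = sum(archive_points(res, ret) for res, ret in ARCHIVES)
size = series_size_bytes(ARCHIVES)
names = [dotted_name(h, "os.cpu") for h in ("web1", "web2")]
```

Because every host gets its own name, and therefore its own fixed-size series, the total disk footprint grows with the number of names rather than with the amount of data written.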
I haven't used Prometheus.
I have also heard something about druid.io, but I don't know anything about it either.