Which database to choose for large volumes?
Good day.
The data volume is large: ~40-45 GB per day at 25-30k rows per second, written continuously.
The final volume may reach 50-70 TB.
Data format:
int timestamp
int value1
int value2
int value3
Queries basically look like: timestamp > date_1 && timestamp < date_2 && data == value*
Which database would you recommend, and what query latency can be expected?
It would be a very nice bonus if the database could compress the data.
As a second question: is there a compression method that allows random access (seeking) within the compressed file?
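As a sanity check, the stated figures are mutually consistent: four 4-byte ints make 16 bytes per row, and at 30k rows per second that works out to roughly 41 GB per day:

```python
# Back-of-envelope check of the stated data rate.
bytes_per_row = 4 * 4          # int timestamp + 3 int values, 4 bytes each
rows_per_sec = 30_000
gb_per_day = bytes_per_row * rows_per_sec * 86_400 / 1e9
print(round(gb_per_day, 1))    # ~41.5, within the stated 40-45 GB/day
```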
Roll the data into files by the hour (for example): new hour, new file. Then compress each finished file.
You can allocate just 2 bytes to the timestamp (since it is an offset within the hour). Check whether the value fields can be narrowed as well.
Even at 16 bytes per record, a modern HDD (~150 MB/s) can write ~9 million records per second, so it will easily cope with your 30k.
All that remains is to write a tool that retrieves data according to your conditions.
The files can be stored on disk, in a file database, or in GridFS, which will shard them across a cluster.
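The hourly-file scheme above can be sketched with Python's `struct` module. The record layout, function names, and the assumption of second-resolution timestamps are mine, not the answerer's; sub-second data would need a wider offset field:

```python
import struct

# One record: 2-byte offset within the hour (seconds, 0..3599)
# plus three 4-byte unsigned values = 14 bytes per record.
RECORD = struct.Struct("<HIII")

def pack_record(ts, v1, v2, v3, hour_start):
    """Pack one sample, storing the timestamp as an offset from the hour start."""
    return RECORD.pack(ts - hour_start, v1, v2, v3)

def scan_file(path, hour_start, t1, t2, predicate):
    """Yield records from one hourly file matching the time range and predicate."""
    with open(path, "rb") as f:
        data = f.read()
    for off, v1, v2, v3 in RECORD.iter_unpack(data):
        ts = hour_start + off
        if t1 < ts < t2 and predicate(v1, v2, v3):
            yield ts, v1, v2, v3
```

A retrieval tool would then only need to open the hourly files whose names fall inside [date_1, date_2] and run `scan_file` over each, which is exactly the "free seek" the asker wants if each hour is compressed as its own file.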
InfluxDB is specialized for just such a task.
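For context, InfluxDB ingests points as text in its line protocol (`measurement fields timestamp`, with nanosecond timestamps and an `i` suffix on integer fields). A minimal sketch of formatting the asker's records as such lines; the measurement and field names here are made up for illustration:

```python
def to_line_protocol(ts, value1, value2, value3, measurement="samples"):
    """Format one record as an InfluxDB line-protocol point.

    Integer fields carry an 'i' suffix; the timestamp is converted
    from seconds to the default nanosecond precision.
    """
    fields = f"value1={value1}i,value2={value2}i,value3={value3}i"
    return f"{measurement} {fields} {ts * 1_000_000_000}"

print(to_line_protocol(1500000000, 1, 2, 3))
# samples value1=1i,value2=2i,value3=3i 1500000000000000000
```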
Yandex Elliptics (which currently builds easily only under Ubuntu 14.04 and the corresponding generation of Debian) is not a database but a distributed DHT storage. It scales, replicates, and recovers on its own; your job is only to connect new servers to it (or new disks to the servers).
You need to deploy a Hadoop cluster and design an architecture around message delivery with a database as intermediary, storing all your terabytes as gzipped logs or in Hive on the cluster. In short, this question is beyond a Q&A site like Toster; you need a freelance DevOps platform, since you have no experience with questions like this.