Which database to choose for large volumes?
Good day.
The data volume is large: ~40-45 GB per day at 25-30k rows per second, written continuously.
The final volume may reach 50-70 TB.
Data format:
int timestamp
int value1
int value2
int value3
Queries basically look like: timestamp > date_1 && timestamp < date_2 && data == value*
Which database would you recommend, and what query latency can be expected?
It would be a very nice bonus if the database could compress the data.
As a second question: is there a compression method that allows random access (seeking) within the compressed file?
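As a sanity check, the stated figures are mutually consistent: four 4-byte ints make 16 bytes per row, and at 30k rows per second that works out to roughly 41 GB per day:

```python
# Back-of-envelope check of the stated data rate.
bytes_per_row = 4 * 4          # int timestamp + 3 int values, 4 bytes each
rows_per_sec = 30_000
gb_per_day = bytes_per_row * rows_per_sec * 86_400 / 1e9
print(round(gb_per_day, 1))    # ~41.5, within the stated 40-45 GB/day
```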
Roll the data into files by the hour (for example): new hour, new file. Then compress each finished file.
You can allocate just 2 bytes to the timestamp (since it is an offset within the hour). Check whether the value fields can be narrowed as well.
Even at 16 bytes per record, a modern HDD (~150 MB/s) can write ~9 million records per second, so it will easily cope with your 30k.
All that remains is to write a tool that retrieves data according to your conditions.
The files can be stored on disk, in a file database, or in GridFS, which will shard them across a cluster.
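The hourly-file scheme above can be sketched with Python's `struct` module. The record layout, function names, and the assumption of second-resolution timestamps are mine, not the answerer's; sub-second data would need a wider offset field:

```python
import struct

# One record: 2-byte offset within the hour (seconds, 0..3599)
# plus three 4-byte unsigned values = 14 bytes per record.
RECORD = struct.Struct("<HIII")

def pack_record(ts, v1, v2, v3, hour_start):
    """Pack one sample, storing the timestamp as an offset from the hour start."""
    return RECORD.pack(ts - hour_start, v1, v2, v3)

def scan_file(path, hour_start, t1, t2, predicate):
    """Yield records from one hourly file matching the time range and predicate."""
    with open(path, "rb") as f:
        data = f.read()
    for off, v1, v2, v3 in RECORD.iter_unpack(data):
        ts = hour_start + off
        if t1 < ts < t2 and predicate(v1, v2, v3):
            yield ts, v1, v2, v3
```

A retrieval tool would then only need to open the hourly files whose names fall inside [date_1, date_2] and run `scan_file` over each, which is exactly the "free seek" the asker wants if each hour is compressed as its own file.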
InfluxDB is specialized for just such a task.
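For context, InfluxDB ingests points as text in its line protocol (`measurement fields timestamp`, with nanosecond timestamps and an `i` suffix on integer fields). A minimal sketch of formatting the asker's records as such lines; the measurement and field names here are made up for illustration:

```python
def to_line_protocol(ts, value1, value2, value3, measurement="samples"):
    """Format one record as an InfluxDB line-protocol point.

    Integer fields carry an 'i' suffix; the timestamp is converted
    from seconds to the default nanosecond precision.
    """
    fields = f"value1={value1}i,value2={value2}i,value3={value3}i"
    return f"{measurement} {fields} {ts * 1_000_000_000}"

print(to_line_protocol(1500000000, 1, 2, 3))
# samples value1=1i,value2=2i,value3=3i 1500000000000000000
```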
Yandex Elliptics (which currently builds easily only under Ubuntu 14.04 and the corresponding generation of Debian) is not a database but a distributed DHT storage. It scales, replicates, and recovers on its own; your job is only to connect new servers to it (or new disks to the servers).
You need to deploy a Hadoop cluster and design an architecture around message delivery with a database as intermediary, storing all your terabytes as gzipped logs or in Hive on the cluster. In short, this question is beyond a Q&A site like Toster; you need a freelance DevOps platform, since you have no experience with questions like this.