How to choose a storage system for 3 trillion events?


Java

Blowspirit, 2017-02-06 10:36:12

I need to choose a storage system that will ingest a large volume of events of the same type (up to 3 million per second).
A retention period of 1 month amounts to approximately 3 trillion events.
Events will be selected using field filters, on average once per second.
Accordingly, the storage should scale horizontally to 100-1000 nodes, be a reliable and proven solution, tolerate node failures, run fast queries on various criteria with sorting, and have a Java client.
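For scale, a quick back-of-the-envelope check of the figures above (a sketch; it assumes a 30-day month). At the peak rate of 3 million events per second, one month would be about 7.8 trillion events, so the 3 trillion figure corresponds to an average rate of roughly 1.16 million events per second.

```java
public class RetentionMath {
    // Assumption: a "month" is taken as 30 days.
    static final long SECONDS_PER_MONTH = 30L * 24 * 60 * 60; // 2,592,000

    // Events accumulated over one month at a constant rate.
    static long eventsPerMonth(long eventsPerSecond) {
        return eventsPerSecond * SECONDS_PER_MONTH;
    }

    // Average rate needed to accumulate the given total in one month.
    static double averageRate(long eventsPerMonth) {
        return (double) eventsPerMonth / SECONDS_PER_MONTH;
    }

    public static void main(String[] args) {
        System.out.printf("Peak 3M/s for a month: %,d events%n", eventsPerMonth(3_000_000L));
        System.out.printf("3 trillion per month: %.0f events/s on average%n", averageRate(3_000_000_000_000L));
    }
}
```

The gap between peak and average matters when sizing the cluster: ingestion must absorb the peak, while disk capacity follows the monthly total.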

7 answers

Dimonchik, 2017-02-06
@dimonchik2013

Yandex ClickHouse (a Yandex product, but it fits exactly this task)
Aerospike
You could also start with DynamoDB: everything is ready there, you just pay.
Only I'm not so sure about 3 trillion events and 5-10 second queries; one way or another, you will have to preprocess something.
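A minimal sketch of one kind of preprocessing, assuming counts are what the queries need: events are pre-aggregated at ingest time into per-minute buckets keyed by a filter-field value, so a query reads one counter instead of scanning raw events. `PreAggregator` and the bucket key format are hypothetical; the answer does not say which preprocessing it has in mind.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical pre-aggregation: counts events per (field value, minute) bucket at ingest time. */
public class PreAggregator {
    private final Map<String, Long> counts = new HashMap<>();

    // Bucket key combines the filter-field value with the event's minute since the epoch.
    private static String bucket(String fieldValue, long epochMillis) {
        return fieldValue + "@" + (epochMillis / 60_000L);
    }

    // Called once per incoming event; increments the matching counter.
    public void ingest(String fieldValue, long epochMillis) {
        counts.merge(bucket(fieldValue, epochMillis), 1L, Long::sum);
    }

    // Answers "how many events with this value in this minute" without scanning raw events.
    public long count(String fieldValue, long epochMillis) {
        return counts.getOrDefault(bucket(fieldValue, epochMillis), 0L);
    }

    public static void main(String[] args) {
        PreAggregator agg = new PreAggregator();
        agg.ingest("user=42", 1_000L);
        agg.ingest("user=42", 59_000L);
        agg.ingest("user=42", 61_000L);
        System.out.println(agg.count("user=42", 30_000L)); // 2
        System.out.println(agg.count("user=42", 90_000L)); // 1
    }
}
```

The trade-off is that only queries matching the pre-aggregated shape become cheap; arbitrary filters still need the raw events.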

Tsimur_S, 2017-02-06
@Tsimur_S

Tarantool and Aerospike? Or perhaps it is worth looking at time series databases?
https://www.influxdata.com/influxdb-vs-cassandra-b...
Maybe Cassandra can handle it with an insane number of servers, but in general, writing more than a million records per second is currently poorly supported.

lega, 2017-02-06
@lega

SSD write speed is up to 550 MB/s; if events are 20 bytes each, you can write ~27 million events per second into files (a single network channel won't be enough to feed that load).
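The arithmetic above, sketched in Java. It assumes decimal megabytes (550,000,000 bytes/s) and 20-byte events as in the answer, and ignores indexing, replication and compression. It also shows why one channel is not enough: 550 MB/s is about 4.4 Gbit/s, more than a 1 Gbit/s link carries, and a month of 3 trillion such events is about 60 TB of raw data.

```java
public class SsdThroughput {
    // Events per second that fit into a given sequential write bandwidth.
    static long eventsPerSecond(long bytesPerSecond, int eventSizeBytes) {
        return bytesPerSecond / eventSizeBytes;
    }

    // Network bandwidth in gigabits per second needed to carry that byte rate.
    static double gigabitsPerSecond(long bytesPerSecond) {
        return bytesPerSecond * 8 / 1e9;
    }

    // Raw bytes for a number of events (no indexes, replication or compression).
    static long rawBytes(long events, int eventSizeBytes) {
        return events * eventSizeBytes;
    }

    public static void main(String[] args) {
        long ssd = 550_000_000L; // 550 MB/s, decimal megabytes (assumption)
        int eventSize = 20;      // 20-byte events, as in the answer
        System.out.println(eventsPerSecond(ssd, eventSize)); // 27500000
        System.out.println(gigabitsPerSecond(ssd));           // 4.4
        System.out.println(rawBytes(3_000_000_000_000L, eventSize) / 1_000_000_000_000L + " TB"); // 60 TB
    }
}
```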

> Events will be selected using field filters, on average once per second.

Write the data partitioned along the filter fields and it will be fine.
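One way to read this advice, sketched in Java under loud assumptions: events are grouped by a single hypothetical filter field (`type`), so a filtered query scans only the matching partition, and the matches are then sorted by timestamp. `PartitionedStore` and `Event` are illustrative names; a real store would keep each partition in its own file or directory rather than in memory. Requires Java 16+ for `record`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: events partitioned by one filter field so a filtered query reads one partition. */
public class PartitionedStore {
    // Hypothetical event shape; real events would carry more fields.
    public record Event(String type, long timestamp, String payload) {}

    // Partition key -> events; on disk each partition would be a separate file or directory.
    private final Map<String, List<Event>> partitions = new HashMap<>();

    public void append(Event e) {
        partitions.computeIfAbsent(e.type(), k -> new ArrayList<>()).add(e);
    }

    // Filter by the partition field plus a time range; other partitions are never touched.
    public List<Event> query(String type, long fromInclusive, long toExclusive) {
        List<Event> result = new ArrayList<>();
        for (Event e : partitions.getOrDefault(type, List.of())) {
            if (e.timestamp() >= fromInclusive && e.timestamp() < toExclusive) {
                result.add(e);
            }
        }
        result.sort(Comparator.comparingLong(Event::timestamp)); // sorted output, as the question requires
        return result;
    }

    public static void main(String[] args) {
        PartitionedStore store = new PartitionedStore();
        store.append(new Event("click", 30, "c"));
        store.append(new Event("view", 10, "a"));
        store.append(new Event("click", 20, "b"));
        System.out.println(store.query("click", 0, 100));
    }
}
```

Partitioning helps only for the fields chosen as partition keys; filters on other fields still scan every partition.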

index0h, 2017-02-07
@index0h

KDB+

ELazin, 2017-02-15
@ELazin

Akumuli can write 4.5 million events per second on a single m3.2xlarge instance (if each event is represented as a set of tags, a timestamp, and a float value).
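Akumuli is a time series store, so each event has to be expressed as a series (a name plus tags), a timestamp, and a single float. A hedged sketch of that mapping in Java; the `TaggedPoint` class, the space-separated `tag=value` naming and the nanosecond timestamp are illustrative assumptions, not Akumuli's client API or wire protocol.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical mapping of an event to the (tags, timestamp, float) shape. */
public class TaggedPoint {
    private final String metric;
    private final TreeMap<String, String> tags; // sorted, so equal tag sets give equal series names
    private final long timestampNanos;
    private final double value;

    public TaggedPoint(String metric, Map<String, String> tags, long timestampNanos, double value) {
        this.metric = metric;
        this.tags = new TreeMap<>(tags);
        this.timestampNanos = timestampNanos;
        this.value = value;
    }

    // Series identity: metric name followed by space-separated tag=value pairs in key order.
    public String seriesName() {
        StringBuilder sb = new StringBuilder(metric);
        for (Map.Entry<String, String> t : tags.entrySet()) {
            sb.append(' ').append(t.getKey()).append('=').append(t.getValue());
        }
        return sb.toString();
    }

    public long timestampNanos() { return timestampNanos; }
    public double value() { return value; }

    public static void main(String[] args) {
        TaggedPoint p = new TaggedPoint("latency", Map.of("region", "eu", "host", "a1"),
                1_486_000_000_000_000_000L, 3.5);
        System.out.println(p.seriesName()); // latency host=a1 region=eu
    }
}
```

Any event field that is not a number would have to become a tag, which is why this representation suits metrics better than arbitrary records.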

Maxim Timofeev, 2017-02-06
@webinar

https://www.oracle.com/database/solutions/index.html

Peter, 2017-02-06
@petermzg

Azure Data Lake