Which database is best for storing per-minute cryptocurrency values (1,500+ records per minute)?

Database

Ted7021, 2018-04-26 17:03:24

The system collects data from the CoinMarketCap API for further analysis. So far it only collects data and displays charts.
What should I choose as storage if 1,500 records are added every minute, that is, more than 2 million per day?
We plan to store the data in MySQL, but I'm sure there is a solution better suited to this task.
So I need advice from someone experienced with databases and big data.
Thank you!

7 answer(s)

Barhis, 2018-04-26
@Ted7021

For example, InfluxDB, or any other time series database. They are more commonly used for all kinds of metrics and monitoring, but if your task is tied to timestamps (for example, for drawing charts), one will fit perfectly.

Roman Mirilaczvili, 2018-04-26
@2ord

ClickHouse, InfluxDB.

Maxim Timofeev, 2018-04-26
@webinar

It all depends on the structure and data types, the operations you plan to perform, and how often you run them.
1,500 records per minute is not a scary figure for MySQL. The question is what kind of records. If the key is an unsigned INT,
then theoretically 4,294,967,295 records will fit into the table. That is about 5.5 years of data at your rate. But again, the row count alone says little.
I think you will run out of disk space before you exhaust the database's capabilities.

little brother, 2018-04-27
@little brother

1,500 rows per minute is easy to achieve on any database if the inserts are not committed as separate transactions but in batches of several rows, deferring the write to the database. Even with SQLite you can reach about 100K rows per second when writing.
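
A minimal sketch of batched insertion, assuming Python's built-in sqlite3 module. The table name `quotes`, its columns, and the `COIN…` symbols are hypothetical and not taken from the question. One minute's worth of rows goes in as a single transaction instead of 1,500 separate commits.

```python
import sqlite3


def insert_batch(conn: sqlite3.Connection, rows) -> None:
    """Insert many rows inside one transaction (a single commit)."""
    with conn:  # commits once on success, rolls back on an exception
        conn.executemany(
            "INSERT INTO quotes (symbol, ts, price) VALUES (?, ?, ?)", rows
        )


conn = sqlite3.connect(":memory:")  # in-memory database, just for the demo
conn.execute(
    """
    CREATE TABLE quotes (
        symbol TEXT    NOT NULL,
        ts     INTEGER NOT NULL,  -- Unix time of the measurement
        price  REAL    NOT NULL,
        PRIMARY KEY (symbol, ts)
    )
    """
)

# One minute's worth of made-up data: 1500 symbols, one price each.
ts = 1_524_762_180
batch = [(f"COIN{i}", ts, 100.0 + i) for i in range(1500)]
insert_batch(conn, batch)

row_count = conn.execute("SELECT COUNT(*) FROM quotes").fetchone()[0]
print(row_count)
```

The same pattern (collect the rows for a minute, then write them in one go) carries over to MySQL or PostgreSQL drivers, although the exact calls differ.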
The fastest option would be to write the values sequentially to a separate file for each currency, without a date, since the date can be calculated from the position of the value in the file.
If you don't want to go that far, it is enough to store the table inside an index; see Clustered Index (in PostgreSQL and MySQL) or Index-Organized Tables (in Oracle).
You can also apply a micro-optimization: if the data is known to arrive at one-minute intervals, store the measurement number instead of the time (a date takes 7 bytes) or the Unix epoch (4 bytes).
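
The file-per-currency idea and the measurement-number idea can be sketched together. This assumes one fixed-width 8-byte price per minute and a known start time for each file; the function names and the `BTC.bin` file are invented for the example.

```python
import struct
import tempfile
from pathlib import Path

RECORD = struct.Struct("<d")  # one little-endian float64 price per measurement
INTERVAL = 60                 # seconds between measurements (assumed fixed)


def append_price(path: Path, price: float) -> None:
    """Append the next measurement; its number is implied by its position."""
    with path.open("ab") as f:
        f.write(RECORD.pack(price))


def read_price(path: Path, start_ts: int, ts: int) -> float:
    """Look up the price at time ts without storing any timestamps."""
    index = (ts - start_ts) // INTERVAL  # measurement number
    if index < 0:
        raise IndexError("timestamp is before the first measurement")
    with path.open("rb") as f:
        f.seek(index * RECORD.size)
        data = f.read(RECORD.size)
    if len(data) != RECORD.size:
        raise IndexError("timestamp is after the last measurement")
    return RECORD.unpack(data)[0]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "BTC.bin"
    start_ts = 1_524_762_180  # time of measurement number 0
    for price in (9000.0, 9010.5, 8995.25):
        append_price(path, price)
    third = read_price(path, start_ts, start_ts + 2 * INTERVAL)

print(third)
```

The catch is that a missed minute has to be written as a placeholder (for example NaN), otherwise every later position maps to the wrong time.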

vanyamba-electronics, 2018-04-27
@vanyamba-electronics

Create a RAM disk and place the database on it, so that it doesn't have to access the hard drive.

asd111, 2018-04-27
@asd111

ClickHouse. It was created for exactly this kind of data and its analysis. Yandex Metrica runs on it.
ClickHouse is optimized for large batch inserts of 400-500 thousand rows per second.
It compresses large amounts of data well, so the database takes up less disk space than in other DBMSs, and it searches millions of rows quickly, many times faster than MySQL and other RDBMSs.
It also plays well.
Another major advantage is that the query language is similar to SQL, with minimal changes.
https://youtu.be/Ac2C2G2g8Cg

Artemy, 2018-04-29
@MetaAbstract

cassandra.apache.org