Which database is best for storing per-minute cryptocurrency values (1,500+ records per minute)?
The system collects data from the CoinMarketCap API for further analysis. So far it only collects the data and displays charts.
What should I choose as storage if 1,500 records are added every minute, that is, more than 2 million per day?
We plan to store the data in MySQL, but I'm sure there is a solution better suited to this task.
So I need advice from someone experienced with databases and big data.
Thank you!
For example, InfluxDB, or any other time-series database. These are more commonly used for all kinds of metrics and monitoring, but if your task is tied to timestamps (for example, for visualizing charts), one will fit perfectly.
It all depends on the structure and data types, on the operations you plan to carry out, and on how often they run.
1,500 records per minute is not a scary figure for MySQL. The question is what kind of records. If the key is an unsigned INT,
then theoretically 4,294,967,295 records will fit into the table, which is about 5.5 years of data at your rate; a BIGINT key raises that limit far beyond anything you will reach. But again, the number of rows alone says little.
I think you will run out of disk space before you hit the limits of the database.
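The arithmetic behind that estimate can be checked with a short sketch. It assumes a sequential auto-increment key and a constant insert rate; 4,294,967,295 is the maximum of an unsigned 32-bit integer, and the 64-bit case is included for comparison. The function name is illustrative only.

```python
# Back-of-the-envelope check: how long until a sequential key runs out?
ROWS_PER_MINUTE = 1500       # figure from the question

UINT32_MAX = 2**32 - 1       # 4,294,967,295: unsigned 32-bit key
UINT64_MAX = 2**64 - 1       # unsigned 64-bit key


def years_until_exhausted(max_key: int, rows_per_minute: int = ROWS_PER_MINUTE) -> float:
    """Years until a sequential key reaches max_key at a constant insert rate."""
    rows_per_year = rows_per_minute * 60 * 24 * 365
    return max_key / rows_per_year


rows_per_day = ROWS_PER_MINUTE * 60 * 24
print(rows_per_day)                                   # 2160000
print(round(years_until_exhausted(UINT32_MAX), 1))    # 5.4
print(years_until_exhausted(UINT64_MAX) > 1e9)        # True
```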
A rate of 1,500 rows per minute is easy to sustain on any database if each row is not inserted in its own transaction but written in batches, deferring the write to the database. Even SQLite can reach on the order of 100K rows per second on writes.
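A minimal sketch of batched insertion, using Python's standard `sqlite3` module. The table name `quotes`, its columns, and the generated `COIN…` symbols are assumptions made for illustration; the point is only that the whole minute's worth of rows goes in through one `executemany` call inside one transaction.

```python
import sqlite3
import time


def insert_batch(conn: sqlite3.Connection, rows) -> None:
    """Insert all rows inside a single transaction."""
    with conn:  # commits once on success, rolls back on error
        conn.executemany(
            "INSERT INTO quotes (symbol, ts, price) VALUES (?, ?, ?)", rows
        )


# In-memory database for the demo; a real collector would pass a file path.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quotes ("
    "symbol TEXT NOT NULL, ts INTEGER NOT NULL, price REAL NOT NULL)"
)

# Fake one minute of data: 1500 made-up symbols sharing the same timestamp.
minute_ts = int(time.time()) // 60 * 60
batch = [(f"COIN{i}", minute_ts, 1.0 + i) for i in range(1500)]

insert_batch(conn, batch)
count = conn.execute("SELECT COUNT(*) FROM quotes").fetchone()[0]
print(count)  # 1500
```

The collector would accumulate the API response in memory and call `insert_batch` once per polling cycle instead of committing row by row.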
The fastest way would be to write values sequentially into a separate file for each currency, without storing the date, since it can be calculated from the position of the value in the file.
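As a rough sketch of this file-per-currency idea, assuming each measurement is a single fixed-width 8-byte float and arrives exactly once per minute. The file name `BTC.bin`, the start timestamp, and the helper names are hypothetical.

```python
import os
import struct
import tempfile

RECORD = struct.Struct("<d")   # one little-endian float64 per measurement
INTERVAL = 60                  # seconds between measurements (assumed fixed)


def append_value(path: str, value: float) -> None:
    """Append one measurement to the end of the currency's file."""
    with open(path, "ab") as f:
        f.write(RECORD.pack(value))


def read_value(path: str, index: int) -> float:
    """Read the measurement at a given position without scanning the file."""
    with open(path, "rb") as f:
        f.seek(index * RECORD.size)
        data = f.read(RECORD.size)
    if len(data) != RECORD.size:
        raise IndexError(f"no measurement at position {index}")
    return RECORD.unpack(data)[0]


def timestamp_of(start_epoch: int, index: int) -> int:
    """Recover the timestamp from the position of the value in the file."""
    return start_epoch + index * INTERVAL


data_dir = tempfile.mkdtemp()
btc_path = os.path.join(data_dir, "BTC.bin")   # hypothetical file name
start_epoch = 1_500_000_000                    # arbitrary time of the first measurement

for price in (100.0, 101.5, 99.25):
    append_value(btc_path, price)

print(read_value(btc_path, 1))        # 101.5
print(timestamp_of(start_epoch, 2))   # 1500000120
```

With this layout a missed poll has to be written as a placeholder (for example NaN), otherwise every later position would map to the wrong minute.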
If you don't want to go that far, it is enough to store the table inside an index: see clustered indexes (the primary key in MySQL's InnoDB; PostgreSQL's CLUSTER command only reorders the table once) or index-organized tables (in Oracle).
You can also apply a micro-optimization: if the data is known to arrive at one-minute intervals, store not the time (a date takes 7 bytes) or the Unix epoch (4 bytes), but the measurement number.
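A small sketch of that conversion, assuming a fixed one-minute interval and a chosen start time for measurement number 0 (`START_EPOCH` is an arbitrary value picked for the example). It also shows that a 3-byte measurement number already covers about 32 years of minutes.

```python
START_EPOCH = 1_500_000_000   # assumed timestamp of measurement number 0
INTERVAL = 60                 # seconds between measurements


def to_measurement_number(ts: int) -> int:
    """Convert a Unix timestamp to its measurement number."""
    offset = ts - START_EPOCH
    if offset < 0 or offset % INTERVAL != 0:
        raise ValueError(f"timestamp {ts} is not on the measurement grid")
    return offset // INTERVAL


def to_timestamp(number: int) -> int:
    """Convert a measurement number back to a Unix timestamp."""
    return START_EPOCH + number * INTERVAL


n = to_measurement_number(START_EPOCH + 3600)   # one hour after the start
packed = n.to_bytes(3, "little")                # 3 bytes instead of a 4-byte epoch
years_covered = 2**24 * INTERVAL / (365 * 24 * 3600)

print(n)                        # 60
print(to_timestamp(n))          # 1500003600
print(len(packed))              # 3
print(round(years_covered, 1))  # 31.9
```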
Create a RAM disk and place the database on it, so that it does not have to access the hard drive.
ClickHouse. It was created for exactly this kind of data and its analysis. Yandex Metrica runs on it.
ClickHouse is optimized for large batch inserts of 400-500 thousand rows per second.
It compresses large amounts of data well, so the database will take up less disk space than in other DBMSs, and it searches millions of rows quickly, many times faster than MySQL and other RDBMSs.
It also performs well.
Another major advantage is that the query language is similar to SQL, with minimal changes.
https://youtu.be/Ac2C2G2g8Cg