How to insert ~1 million rows per second into a database and read them back just as quickly? What is the best way to implement this?
It is necessary:
- to insert ~1 million rows (ticker, price, quantity, oper) into the table every second (each row arrives as a separate INSERT),
- while SELECTs run 100-1000 times per second to pick up the newly arrived rows.
It's not enough for this simply to work.
It is very important:
- to minimize the time from the arrival of TCP packets with the data on the network interface to receiving a SELECT result containing the same data,
- to have some kind of guarantee (say, with a probability of ~0.995) that this time will be no more than some sufficiently small ε.
Questions:
Can this be done with MySQL, PostgreSQL, or another classic relational DBMS?
Would there be a time gain from using a columnar DBMS (InfluxDB, ClickHouse, something else)?
Would it be faster to take the data straight from the network interface, bypassing the database, and add it to the database later, after processing, while resources are idle? How much faster would it be?
> insert ~1 million rows (ticker, price, quantity, oper) into the table every second

When, and from which broker?
Of course, this really comes down to the hardware. And you can hammer nails with a microscope.
But to the words "I want to insert quickly and read quickly in a stream" I just want to answer: "why do you need a database at all?"
So I would like to clarify with the author: apart from the described "insert a million, read a million", what else is supposed to be done with the data? Update it row by row? Search by some key? Is any of that needed? If not, I would still recommend not using a database.
Keep in mind that any normal database [almost] always writes to disk twice: to the table AND to the transaction log. That is why a plain file, Kafka, or another MQ will always be faster.
Well, if you still need a database, then BULK load modes will help you. They are typically meant for batch initial loading. In some databases they can disable certain features, or even writing to the transaction log, for the duration of the load.
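For illustration, a minimal sketch of such a bulk load into PostgreSQL using COPY via psycopg2; the table name `ticks`, its columns, and the connection parameters are assumptions:

```python
# Minimal sketch: bulk-load a batch of rows into PostgreSQL with COPY
# instead of one INSERT per row. Assumes a table
# ticks(ticker text, price numeric, quantity int, oper text) already exists.
import io
import psycopg2

rows = [("AAPL", 189.5, 100, "buy"), ("MSFT", 410.2, 50, "sell")]  # example batch

buf = io.StringIO()
for ticker, price, quantity, oper in rows:
    buf.write(f"{ticker}\t{price}\t{quantity}\t{oper}\n")
buf.seek(0)

conn = psycopg2.connect(dbname="market", user="loader", password="secret", host="localhost")
with conn, conn.cursor() as cur:
    # COPY streams the whole batch in one round trip, avoiding per-statement overhead
    cur.copy_expert("COPY ticks (ticker, price, quantity, oper) FROM STDIN", buf)
```

The point is that the million single INSERTs mostly pay for per-statement and per-round-trip overhead, which a COPY of an accumulated batch avoids.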
----------------------------
In general, by all indications, the ideal option in your case would be to write to an MQ (RabbitMQ, Kafka, or an analogue), and from there to the database. "Everyone does it": at least in large companies this is a fairly typical solution for tasks like yours. Moreover, the database in this story is needed only if you later need to store and query the data. If after the first pass you no longer need the data, or you only need a backup, then the database is not needed at all: write to a file and pack it into a zip (in an enterprise, drop the files into Hadoop in some Parquet format).
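A minimal sketch of the first leg (ticks go into Kafka; a separate consumer then batches them into the database), assuming a local broker and a topic named "ticks":

```python
# Minimal sketch: push incoming ticks into Kafka and let a downstream consumer
# batch them into the database. Broker address and topic name are assumptions.
import json
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "linger.ms": 5,             # let the client batch messages for a few ms
    "compression.type": "lz4",  # cheap compression helps at high message rates
})

def send_tick(ticker, price, quantity, oper):
    payload = json.dumps({"ticker": ticker, "price": price,
                          "quantity": quantity, "oper": oper}).encode()
    # produce() is asynchronous: it only appends to the client's internal queue
    producer.produce("ticks", value=payload, key=ticker.encode())

send_tick("AAPL", 189.5, 100, "buy")
producer.flush()  # block until everything queued so far has been delivered
```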
IMHO, ClickHouse is undeservedly forgotten here, if you need to store the data and run some kind of analytics on it. It really is fast: the speed is comparable to simply dumping raw logs to disk. Hot data can be kept on an SSD, or even in memory in temporary tables. The main thing is to insert the data in batches; as far as I know, no database likes small inserts. If Kafka or RabbitMQ is used for guaranteed delivery, ClickHouse supports them natively, but keep in mind that an extra layer means extra latency.
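A minimal sketch of a batched insert into ClickHouse via clickhouse-driver; the `ticks` table and its schema are assumptions:

```python
# Minimal sketch: one INSERT per accumulated batch, which is how ClickHouse
# likes to be fed. Assumes a MergeTree table
# ticks(ticker String, price Float64, quantity UInt32, oper String) exists
# and ClickHouse is reachable on the default native port.
from clickhouse_driver import Client

client = Client(host="localhost")

batch = [
    ("AAPL", 189.5, 100, "buy"),
    ("MSFT", 410.2, 50, "sell"),
]  # in practice, accumulate tens of thousands of rows before flushing

client.execute(
    "INSERT INTO ticks (ticker, price, quantity, oper) VALUES",
    batch,
)
```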
If you only need to keep the data for, say, an hour, then it is worth looking at Aerospike; in general, whenever you need to keep that last hour within fast reach, it will be useful. It is undeservedly rarely mentioned in the Russian-speaking community, but it is in fact head and shoulders above any Redis, even in the free version. It can also store data persistently, in combination with disk, and can even work with a block device directly, without a filesystem.
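A minimal sketch of that hot-hour use case with the Aerospike Python client, assuming a local node, the default "test" namespace, and a made-up key scheme:

```python
# Minimal sketch: keep recent ticks in Aerospike for fast key access.
# Host, namespace, set name and the key scheme are assumptions.
import aerospike

config = {"hosts": [("127.0.0.1", 3000)]}
client = aerospike.client(config).connect()

# Key = (namespace, set, user key); here "ticker:timestamp" is just an example
key = ("test", "ticks", "AAPL:1700000000000")
client.put(key, {"ticker": "AAPL", "price": 189.5, "quantity": 100, "oper": "buy"})

# Read it back: get() returns (key, metadata, bins)
_, _, bins = client.get(key)
print(bins)
client.close()
```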
InfluxDB is about something else: it is for time series and metrics, like Prometheus and VictoriaMetrics.
PS: not to start a flame war or to troll, but I still want to ask the old-school folks who suggest storing everything in the file system: gentlemen, do you also still keep your projects as New_Folder, New_Folder1, New_Folder2, and so on? You still have to follow how write algorithms and work with the hardware change and evolve. Neither your 2007 nor mine is coming back.
> Will it be faster to take the data straight from the network interface, bypassing the database, and add it to the database later, while resources are idle?

Most likely, that is exactly how it should be done.
NOT an expert opinion: it seems Tarantool was written precisely for such hybrid tasks: the hot part of the database lives in memory, the cold part rolls out to disk. IMHO, it is your case...
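A minimal sketch with the Tarantool Python connector, assuming a space named `ticks` with a primary index on the first (numeric) field has already been created on the server side:

```python
# Minimal sketch: keep hot ticks in Tarantool (in-memory, with its own WAL)
# and read them back by primary key with very low latency.
# Host, port, space name and the surrogate key are assumptions.
import time
import tarantool

conn = tarantool.connect("localhost", 3301)

seq = int(time.time() * 1_000_000)  # made-up surrogate key for the example
conn.insert("ticks", (seq, "AAPL", 189.5, 100, "buy"))

# Fetch the freshly inserted tuple back by primary key
result = conn.select("ticks", seq)
print(result[0])
```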