How to write 200 thousand records per second?

Python

un1t, 2015-12-16 19:09:50

We need to write 200 thousand requests within a single second, once; nothing has to be written after that. In other words, it is a one-time operation, but speed is critical. We have a cluster. It is important that no data is lost if any of the machines dies.
Bulk insert is not suitable, because the data will come from different clients.
MongoDB 3 claims to sustain 300k inserts per second, but there is no guarantee that our hardware will show the same performance, and people who have worked closely with MongoDB said they would not count on such a figure. Besides, that number is probably for writes to a local instance, not to a cluster.
Another idea is to write to RabbitMQ, for example. But again, the question is whether it will withstand such a load, and whether reliability and speed will be acceptable with a RabbitMQ cluster spread across different data centers.
There was also an idea to write to files, but then the problem is how not to lose the files when a machine dies and how to synchronize them.
Other technologies can be considered too if these are not suitable.

4 answers

Igor Alyakimov, 2015-12-16
@kaiten

Take a look at Apache Kafka.
It is a very high-performance queue.

sim3x, 2015-12-16
@sim3x

Something is left unsaid in the question.
After all, pushing 200k records at once into a DBMS queue and then inserting them into the database is easy.
Accepting 200mb over gigabit and writing it to a text file on RAID 10 is not a problem.
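To get a feel for the "write it to a text file" option, here is a minimal sketch that appends a batch of lines and forces them to disk once at the end. The record layout (about 300 bytes per line), the file name, and the helper name `append_lines` are assumptions made for illustration, and the timing depends entirely on the disk, so it proves nothing about the asker's hardware.

```python
import os
import tempfile
import time


def append_lines(path, lines):
    """Append lines to a text file and force them to disk once at the end."""
    with open(path, "a", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in lines)
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to push its cache to the disk


if __name__ == "__main__":
    # Hypothetical payload: 200k records of roughly 300 bytes each.
    records = [f"{i:06d}," + "x" * 293 for i in range(200_000)]
    path = os.path.join(tempfile.mkdtemp(), "requests.log")
    start = time.perf_counter()
    append_lines(path, records)
    elapsed = time.perf_counter() - start
    print(f"wrote {len(records)} lines in {elapsed:.3f} s")
```

A single `fsync` per batch rather than per line is what keeps this fast; the price is that a crash in the middle of the batch can lose the lines that were not yet synced.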

epolyak, 2015-12-18
@epolyak

If I were you, I would use queues, for example RabbitMQ or ActiveMQ. Sending 200k messages to a queue is not a problem, and if you put, say, 100 objects into each message, you end up with only 2,000 messages in total. If the queue is persistent, nothing is lost when the broker crashes.
At the other end of the queue, a listener writes the data to the database in the background.
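The batching and background-listener idea can be sketched roughly as follows. This is only an illustration under loud assumptions: the standard library `queue.Queue` stands in for RabbitMQ or ActiveMQ (it is in-process and not persistent), `sqlite3` stands in for the real database, and the table name and the `id`/`body` fields are made up.

```python
import json
import queue
import sqlite3
import threading

BATCH_SIZE = 100  # objects per message, as in the example above


def make_batches(items, size=BATCH_SIZE):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def listener(q, db_path):
    """Drain the queue, writing each message to the database in one transaction."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS requests (id INTEGER, body TEXT)")
    while True:
        message = q.get()
        if message is None:      # sentinel: the producer is done
            break
        rows = [(obj["id"], obj["body"]) for obj in json.loads(message)]
        with conn:               # commit once per message, not once per row
            conn.executemany("INSERT INTO requests VALUES (?, ?)", rows)
    conn.close()


def run(objects, db_path):
    """Publish the objects as batched messages and wait for the listener."""
    q = queue.Queue()
    worker = threading.Thread(target=listener, args=(q, db_path))
    worker.start()
    batches = make_batches(objects)
    for batch in batches:
        q.put(json.dumps(batch))
    q.put(None)
    worker.join()
    return len(batches)
```

With a real broker, the producer side would return as soon as the messages are accepted, and the listener would run as a separate process that acknowledges a message only after its transaction commits.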

lega, 2015-12-16
@lega

You can write quickly to Redis on several machines, two or more (i.e. keep copies), and then merge the data at leisure wherever you need it, for example into MongoDB. Or set up a replica there.
You can duplicate client requests to a second server and let both servers write to a file.
Then again, the volume is small, 60 MB, so you can hold it in memory until it is written to the database.
Are these HTTP requests? Python will take longer to handle HTTP...
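The "write copies to several machines, then merge later" approach might look like the sketch below. Everything here is a stand-in: the `Replica` class is a hypothetical in-memory substitute for a Redis instance or a second server, and tagging each record with a UUID is an assumption added so that the merge step can drop duplicates.

```python
import uuid


class Replica:
    """Stand-in for one storage machine (hypothetical; a dict instead of Redis)."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.records = {}

    def write(self, record_id, payload):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.records[record_id] = payload


def write_everywhere(replicas, payload, min_copies=1):
    """Send one record to every replica; succeed if enough copies were stored."""
    record_id = uuid.uuid4().hex   # lets the merge step drop duplicates
    stored = 0
    for replica in replicas:
        try:
            replica.write(record_id, payload)
            stored += 1
        except ConnectionError:
            pass                   # tolerate a dead machine
    if stored < min_copies:
        raise RuntimeError("record was not stored on enough replicas")
    return record_id


def merge(replicas):
    """Combine what the surviving replicas hold, keeping each record once."""
    merged = {}
    for replica in replicas:
        if replica.alive:
            merged.update(replica.records)
    return merged
```

If one replica dies partway through, the other still holds every record, and `merge` returns the full set without duplicates. Raising `min_copies` to 2 would trade availability for a stronger guarantee that each record exists on two machines.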