Highload
Coder321, 2017-05-20 02:29:41

How to organize a high-load project on Node.js?

A Node.js project is planned that will receive approximately 2-3 thousand requests per second. Each request returns a file; every file download must be logged, and statistics must eventually be collected for each file. As I see it:
there will be an array with 60 nested arrays, one per second of the current minute. An object with statistics is pushed into the array for the current second; when the 60th second is reached, the whole array is sent to the database for writing and reset.
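
A minimal sketch of this scheme in Node.js, assuming hypothetical names (recordHit, and db.write as a stand-in for the real database call); a timer drives the flush, which is more robust than counting to the 60th element when some seconds see no requests:

const SECONDS = 60;
let buffer = Array.from({ length: SECONDS }, () => []);

function recordHit(fileId) {
  // one nested array per second of the current minute
  buffer[new Date().getSeconds()].push({ fileId, ts: Date.now() });
}

setInterval(() => {
  const batch = buffer;                               // swap buffers first,
  buffer = Array.from({ length: SECONDS }, () => []); // then persist the old one
  // db.write(batch); // placeholder for the actual database insert
}, 60 * 1000);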
Since I have never built such a highly loaded project, several questions arose:
1. Is the idea itself correct? If not, what can you suggest?
2. What database to use?
3. Is it correct to use an in-memory array to store the current minute? After all, if some kind of failure occurs during that minute, all of its statistics will be lost.
4. What to do with the table of minutes, since it will grow very large?
5. How to simulate 2-3 thousand requests per second for testing? (see the sketch at the end of this question)
6. And finally, what advice can you give for the implementation of such a project?
P.S. For those who will write that Node.js is not meant for serving static files: it's not my whim but the customer's requirement.
P.P.S. I have only worked with MongoDB, and only at a basic/intermediate level.
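
On question 5: one common option in the Node ecosystem is the autocannon load-testing package; a sketch, with an illustrative URL and numbers:

const autocannon = require('autocannon');

autocannon({
  url: 'http://localhost:3000/files/test.bin', // illustrative target
  connections: 100, // concurrent connections
  duration: 30,     // seconds
}, (err, result) => {
  if (err) throw err;
  console.log('average: ' + result.requests.average + ' req/s');
});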


4 answers
Falseclock, 2017-05-20
@Falseclock

The architecture is being approached from the wrong side. When organizing data storage, you need to think first of all not about how to store the data but about how it will be used later; the storage method will then follow on its own.
Who will use the data, how, and under what circumstances?

Denis Bukreev, 2017-05-20
@denisbookreev

2-3 thousand requests per second?
A new, guaranteed-to-be-successful social network?

lega, 2017-05-20
@lega

There are not always 60 seconds in a minute (leap seconds) ;-P

rPman, 2017-05-20
@rPman

Either you save events with a guarantee but process them slowly, or vice versa.
To begin with, do not overcomplicate the system: try appending to a log on each event. If disk speed turns out to be insufficient, change the storage method, eliminating the bottlenecks one by one (for example, the file system: a single write to a file actually triggers several operations, including in different parts of the disk).
upd: 16-byte writes: cheap SSD + NTFS: 7674 rec/sec; old HDD + NTFS: 425 rec/sec.
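
A sketch of that starting point, appending each event to a log (the file path and record format are assumptions):

const fs = require('fs');

// one long-lived append-only stream; writes stay sequential
const log = fs.createWriteStream('/var/log/downloads.log', { flags: 'a' });

function logDownload(fileId) {
  log.write(Date.now() + '\t' + fileId + '\n');
}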
A log has one great property: it is written linearly (I am not yet considering the tooling for reading it back; in a loaded system those tasks will have to be solved by splitting the load across hardware). Even for an HDD, IOPS will be close to optimal in this case (given exclusive use of the disk by the process, of course), since the drive's built-in buffer will do its job.
If the linear speed of one disk is not enough (in your case it should be plenty, unless of course you write multi-megabyte records to the log), install several disks, even without RAID striping: you can implement the striping yourself by spreading log messages across the disks according to your own logic, as sketched below.
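
A sketch of that do-it-yourself striping, assuming two mount points on separate physical disks:

const fs = require('fs');

const streams = ['/mnt/disk1/dl.log', '/mnt/disk2/dl.log'] // assumed mounts
  .map(p => fs.createWriteStream(p, { flags: 'a' }));

function logStriped(fileId) {
  // crude routing: map the id onto one of the disks
  const i = fileId.charCodeAt(fileId.length - 1) % streams.length;
  streams[i].write(Date.now() + '\t' + fileId + '\n');
}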
Also, instead of trying to fit everything into one machine, you can set up several (important: with independent power sources!). Then you can buffer in the RAM of those machines: send each log entry to several machines at once and consider it written once more than a few of them (not necessarily all) have acknowledged it. Ready-made tools exist for this; a rough sketch of the idea follows.
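
A rough sketch of that idea: send each entry to several log servers and treat it as durable once a quorum acknowledges (the hosts, port, and /log endpoint are assumptions; ready-made tools such as Kafka cover this in practice):

const http = require('http');

const HOSTS = ['10.0.0.1', '10.0.0.2', '10.0.0.3']; // assumed log servers
const QUORUM = 2; // "more than a few of them, not necessarily all"

function replicate(entry) {
  return new Promise((resolve, reject) => {
    let acks = 0, fails = 0;
    for (const host of HOSTS) {
      const req = http.request(
        { host, port: 8080, path: '/log', method: 'POST' },
        () => { if (++acks === QUORUM) resolve(); } // durable at quorum
      );
      req.on('error', () => {
        // too many hosts failed: a quorum can no longer be reached
        if (++fails > HOSTS.length - QUORUM) reject(new Error('quorum lost'));
      });
      req.end(JSON.stringify(entry));
    }
  });
}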
