MySQL
Alexey, 2017-11-06 11:19:20

How to make a statistics service for 20 million records per day?

Good day.
I am asking the highload experts here for help with the following task:
1) Task: using PHP + MySQL, implement a service that receives game statistics data.
Structure:
- player ID
- event time in the game
- device ID (string, 64 characters)
- device platform (iPad, iPhone)
- an arbitrary set of data (for example, the device sends an event with the parameters Event=Start, Money=15444, etc.)
2) Requirements:
- the service must handle 20 million requests per day;
- it must be possible to export and delete the previous day's data (for example, on the 10th the data for the 9th is retrieved and then deleted);
- the service must tell the client whether the submission succeeded or failed.
Ideas:
1. Database structure
Since the data will be written often and in large volumes, but read and deleted rarely, I see two options for the database structure:
1.1 Every day a cron job creates a table for the next day, and new data is written into it.
Deleting and reading old data then does not interfere with the data currently being written.

CREATE TABLE IF NOT EXISTS `day_data` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `user_id` int(10) NOT NULL,
  `datetime` int(10) NOT NULL,
  `device_id` varchar(64) NOT NULL,
  `platform` enum('ipad','iphone') NOT NULL,
  `data` text NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=0;
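A minimal sketch of what the nightly cron job could run, using the table above as a template (the dated table name is only an assumed naming convention, not part of the task):

-- Create tomorrow's table with the same structure as the template table above.
-- The date suffix in the name is an assumption; use whatever convention the cron job generates.
CREATE TABLE IF NOT EXISTS `day_data_2017_11_07` LIKE `day_data`;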

1.2 A single table is created, but it is partitioned (PARTITION BY LIST) by the day of the month (1-31):
CREATE TABLE IF NOT EXISTS `day_data` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `day_of_month` tinyint(4) NOT NULL,
  `user_id` int(10) NOT NULL,
  `datetime` int(10) NOT NULL,
  `device_id` varchar(64) NOT NULL,
  `platform` enum('ipad','iphone') NOT NULL,
  `data` text NOT NULL,
  PRIMARY KEY (`id`,`day_of_month`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=0
PARTITION BY LIST (day_of_month)
(PARTITION p01 VALUES IN (1) ENGINE = InnoDB,
 PARTITION p02 VALUES IN (2) ENGINE = InnoDB,
 PARTITION p03 VALUES IN (3) ENGINE = InnoDB,
 PARTITION p04 VALUES IN (4) ENGINE = InnoDB,
 PARTITION p05 VALUES IN (5) ENGINE = InnoDB,
 PARTITION p06 VALUES IN (6) ENGINE = InnoDB,
 PARTITION p07 VALUES IN (7) ENGINE = InnoDB,
 PARTITION p08 VALUES IN (8) ENGINE = InnoDB,
 PARTITION p09 VALUES IN (9) ENGINE = InnoDB,
 PARTITION p10 VALUES IN (10) ENGINE = InnoDB,
 PARTITION p11 VALUES IN (11) ENGINE = InnoDB,
 PARTITION p12 VALUES IN (12) ENGINE = InnoDB,
 PARTITION p13 VALUES IN (13) ENGINE = InnoDB,
 PARTITION p14 VALUES IN (14) ENGINE = InnoDB,
 PARTITION p15 VALUES IN (15) ENGINE = InnoDB,
 PARTITION p16 VALUES IN (16) ENGINE = InnoDB,
 PARTITION p17 VALUES IN (17) ENGINE = InnoDB,
 PARTITION p18 VALUES IN (18) ENGINE = InnoDB,
 PARTITION p19 VALUES IN (19) ENGINE = InnoDB,
 PARTITION p20 VALUES IN (20) ENGINE = InnoDB,
 PARTITION p21 VALUES IN (21) ENGINE = InnoDB,
 PARTITION p22 VALUES IN (22) ENGINE = InnoDB,
 PARTITION p23 VALUES IN (23) ENGINE = InnoDB,
 PARTITION p24 VALUES IN (24) ENGINE = InnoDB,
 PARTITION p25 VALUES IN (25) ENGINE = InnoDB,
 PARTITION p26 VALUES IN (26) ENGINE = InnoDB,
 PARTITION p27 VALUES IN (27) ENGINE = InnoDB,
 PARTITION p28 VALUES IN (28) ENGINE = InnoDB,
 PARTITION p29 VALUES IN (29) ENGINE = InnoDB,
 PARTITION p30 VALUES IN (30) ENGINE = InnoDB,
 PARTITION p31 VALUES IN (31) ENGINE = InnoDB);

2. Deletion
Deletion is not a problem. In the first option, DROP TABLE is fast enough.
In the second option we run ALTER TABLE `day_data` TRUNCATE PARTITION p11, which is also fast: very roughly, MySQL drops the old data file and creates a new one, and this is much faster than deleting row by row.
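For yesterday's data the two cleanup variants would look roughly like this (the table and partition names here are illustrative):

-- Option 1.1: drop yesterday's per-day table entirely.
DROP TABLE IF EXISTS `day_data_2017_11_06`;

-- Option 1.2: empty only yesterday's partition in the single partitioned table.
ALTER TABLE `day_data` TRUNCATE PARTITION p06;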
3. Reading
To avoid data loss and to avoid tying up the server by streaming 20 million records on the fly, we first dump the data into a CSV file on the server and give the client a link to that file for download.
We export in a loop of 10,000 records at a time, setting a flag in the database on the rows that have already been written to the file.
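One iteration of that loop could look roughly like this, assuming an extra `exported` flag column (which is not in the schema above) and that MySQL is allowed to write to the given directory (secure_file_priv):

-- Dump the next 10,000 not-yet-exported rows into a CSV chunk file.
-- The file path, the `exported` column and the chunk numbering are assumptions.
SELECT `user_id`, `datetime`, `device_id`, `platform`, `data`
  FROM `day_data`
 WHERE `exported` = 0
 ORDER BY `id`
 LIMIT 10000
  INTO OUTFILE '/var/lib/mysql-files/day_data_chunk_0001.csv'
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\n';

-- Mark the same rows as exported so the next iteration continues where this one stopped
-- (safe here because yesterday's data is no longer being written to).
UPDATE `day_data`
   SET `exported` = 1
 WHERE `exported` = 0
 ORDER BY `id`
 LIMIT 10000;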
4. Response to the client
The answer is sent as JSON.
On success, send ok.
On failure:
- if the data is incomplete, return an error describing what is missing;
- if the insert into the database fails, return information about the insert error and the error number.
I am also looking at these InnoDB settings:
innodb_file_per_table
innodb_flush_method
innodb_flush_log_at_trx_commit
Can you suggest anything here?
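A possible write-oriented starting point for those settings (the specific values are only illustrative, not requirements of the task):

-- The first two variables are dynamic and can be changed at runtime:
SET GLOBAL innodb_file_per_table = ON;
-- 2 = flush the log to disk about once per second instead of at every commit:
-- faster inserts at the cost of losing up to roughly a second of data on a crash.
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
-- innodb_flush_method is not dynamic: it has to be set in my.cnf (e.g. O_DIRECT) and needs a restart.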
Thanks in advance to everyone who answers.

5 answers
RidgeA, 2017-11-06
@RidgeA

1. Why MySQL? Maybe MongoDB or Cassandra would be a better fit.
2. What kind of statistics do you need from this data?
3. IMHO it is better not to insert records one at a time, but to accumulate them and write to the database (whichever database it is) in batches. A queue broker (RabbitMQ, for example) is a good fit for the accumulation. Queues also let you smooth out load peaks: writes to the database stay more or less even, and during a peak the queue simply grows. See the batched insert sketch below.
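A rough illustration of what one batched write could look like once the consumer has accumulated a handful of events from the queue (the column list is taken from the first table in the question; the values themselves are made up):

-- One multi-row INSERT per accumulated batch instead of one INSERT per event.
INSERT INTO `day_data` (`user_id`, `datetime`, `device_id`, `platform`, `data`)
VALUES
  (101, 1509955200, 'device-id-aaa', 'iphone', '{"Event":"Start"}'),
  (102, 1509955201, 'device-id-bbb', 'ipad',   '{"Event":"Start","Money":15444}'),
  (103, 1509955202, 'device-id-ccc', 'iphone', '{"Event":"Stop"}');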

Alexander Aksentiev, 2017-11-06
@Sanasol

20 million records per day

Right from day one? The domain has only just been bought and 20 million calls have already shown up?
Without preparation and failures, i.e. without experience, nothing will come of it; first you need to see how much load there will actually be.
And the load depends on what will run there and how.
Nobody here can predict the future.
By the time there really are 20 million events per day, you will already have enough skill to handle them.
That is about 230 inserts per second on average if you write to one table, and that is for a MySQL server out of the box.
Not to mention the load on the web server itself and on the machine as a whole.
In other words, this is 100% not a job for a single dedicated server.
In general, all of this is learned by experience in your specific circumstances.

Sergey, 2017-11-06
@begemot_sun

Use ClickHouse :)

lega, 2017-11-10
@lega

The simplest and fastest option is to simply dump all incoming requests into a file (you can take them straight from the nginx log, converting the format before handing the data over), and then serve that file for download, again through nginx.
This will handle 1,000 million records per day or more, depending on the hardware.

Artem, 2017-11-06
@proudmore

If I were you, I would look at Yandex ClickHouse. Read up on it, and if it looks like a good fit, try to convince your management that using this DBMS is justified.
