MySQL
Alexander, 2022-04-10 20:41:47

How to reduce the load of a MySQL query on a heavily loaded database?

Hello everyone! I'm writing again. The issue was partially resolved, but some questions remain.
There is a heavily loaded project.
There is replication (we write to the master, we read from the slaves).
There are two tables (Task and TaskFlow) with a huge number of records, a huge number of inserts, and a huge number of reads. The tables are huge.
And there is a SELECT that is causing problems.
All the indexes are in place.

SELECT
  task.id,
  task.remains
FROM
  Task task
WHERE
  task.status = 'active'
  AND task.type = 'follow_profile'
  AND task.provider = 'insta'
  AND task.cat = 3
  AND NOT EXISTS (
    SELECT 1 FROM TaskFlow flow
    WHERE
      flow.url_id = task.url_id
      AND flow.executor_acc_id = $acc
  )
ORDER BY task.id ASC
LIMIT 300;

Indexes:

Task (status, type, provider, cat)
TaskFlow (executor_acc_id, url_id)

The essence: at around 3000+ SELECT and INSERT requests per second against these tables, everything is fine.
As soon as the rate exceeds 4000+, CPU load on the slaves grows sharply, the SELECTs start to stall in the "Sending data" state and pile up in SHOW PROCESSLIST, and replication lag between the master and the slave starts to grow. As a result, everything bogs down and stops working correctly.
It does not recover until we throttle the flow of requests. As soon as we stop the inserts and selects, everything returns to normal within a couple of seconds.

Please tell me where to look. Thank you!

UPD: If you increase the LIMIT in the query, the load grows significantly.


6 answers
rPman, 2022-04-10
@rPman

You already asked this question; do you think you'll come up with more information this time?
Since the slave is what slows down, why not solve the problem by brute force: add a second, third... tenth slave, and send your queries to whichever slave is least loaded at the moment, or simply rotate through them in turn.
P.S. How much data is there? Wouldn't it be more efficient to keep your tasks in the RAM of a separate application, and have your backend (or whoever) query that application instead of the database? No SQL database will ever be faster than your own application holding the data in RAM, and if a single thread is enough for you, such a solution takes very little time to develop.

Drno, 2022-04-10
@Drno

What is the size of the database?
Perhaps a dumb option, but you can make a disk partition out of RAM (a RAM disk); there is nothing faster.
Just copy the data there and flush it back into the database gradually. And that's it.
Or put a bunch of SSDs into a RAID.

Alexander, 2022-04-11
@Aleksandr-JS-Developer

Every specialist has their own understanding of what a "huge database" is: 10 GB, 100 GB, 1 TB, 10 TB, 100 TB?
IMHO, 1 TB may not be a small database, but it is certainly not a huge one. What does "huge" mean to you?
When one server physically stops coping (vertical scaling is no longer effective), people start thinking about other kinds of scaling, in particular horizontal.

4X_Pro, 2022-04-10
@XXXXPro

Apparently, the problem is that MySQL can no longer keep up with rebuilding the indexes.
The question that immediately arises: are status, type, and provider ENUMs or VARCHARs? If the latter, can you replace them with ENUM? If not, you can try building the index not on the full field but on a limited prefix of it, for example Task (status(4), type(8), provider(4), cat). Such indexes rebuild faster.
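A minimal sketch of the prefix-index idea (the index name is illustrative, not from the original schema):

ALTER TABLE Task
  ADD INDEX idx_task_prefix (status(4), type(8), provider(4), cat);

MySQL then stores only the first few characters of each string in the index, which makes the index entries smaller and cheaper to maintain.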
Another option is to add an extra numeric field that holds a hash of the string value, and filter on it. Even plain CRC32 will do, compared like this: WHERE status_crc=CRC32('active') AND type_crc=CRC32('follow_profile'). Accordingly, the indexes should be built not on status and type but on status_crc and type_crc (indexes on numeric fields are built faster than on string ones).
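A hedged sketch of that hash-column approach (column and index names are illustrative):

ALTER TABLE Task
  ADD COLUMN status_crc INT UNSIGNED NOT NULL DEFAULT 0,
  ADD COLUMN type_crc   INT UNSIGNED NOT NULL DEFAULT 0,
  ADD INDEX idx_task_crc (status_crc, type_crc, provider, cat);

-- Backfill existing rows:
UPDATE Task SET status_crc = CRC32(status), type_crc = CRC32(type);

-- The problematic query then filters on the numeric hashes:
SELECT id, remains
FROM Task
WHERE status_crc = CRC32('active')
  AND type_crc = CRC32('follow_profile')
  AND provider = 'insta'
  AND cat = 3;

The application (or a trigger) has to keep the hash columns in sync on every insert and update.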
Yet another option is to add buffering to your code: accumulate a certain number of INSERTs, then do DISABLE KEYS, execute the accumulated INSERTs, then ENABLE KEYS.
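A minimal sketch of that buffer-and-flush idea (the values are hypothetical). Note that ALTER TABLE ... DISABLE KEYS suspends only non-unique index updates and only takes effect on MyISAM tables; on InnoDB the gain comes mainly from batching rows into one multi-row INSERT:

ALTER TABLE Task DISABLE KEYS;

INSERT INTO Task (status, type, provider, cat, url_id, remains) VALUES
  ('active', 'follow_profile', 'insta', 3, 101, 50),
  ('active', 'follow_profile', 'insta', 3, 102, 50),
  ('active', 'follow_profile', 'insta', 3, 103, 50);

ALTER TABLE Task ENABLE KEYS;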

edo1h, 2022-04-12
@edo1h

Without any major architectural changes, only partitioning the data across several servers will save you, so that each server holds only part of the table.

Ruslan., 2022-04-14
@LaRN

You can try a little normalization: for example, move three fields from the Task table (type, provider, cat) into a separate tasktype table, leave a single tasktypeID field in the original table, and put that field into the Task index in place of the previous three.
Since your query conditions are static, you can first resolve the tasktypeID against the tasktype table, which should not be very large, and then go to the Task index using two fields instead of four. In addition, the index on Task will take up less space, which should speed up both searching the index and rebuilding it when new rows are added.
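A hedged sketch of that normalization (column types and the exact index are illustrative; only tasktype and tasktypeID come from the answer):

CREATE TABLE tasktype (
  tasktypeID INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  type       VARCHAR(32) NOT NULL,
  provider   VARCHAR(32) NOT NULL,
  cat        INT NOT NULL,
  UNIQUE KEY uq_tasktype (type, provider, cat)
);

ALTER TABLE Task
  ADD COLUMN tasktypeID INT UNSIGNED NOT NULL,
  ADD INDEX idx_task_tasktype (status, tasktypeID);

-- Resolve the static conditions to an ID once, then hit the narrower Task index:
SELECT t.id, t.remains
FROM Task t
WHERE t.status = 'active'
  AND t.tasktypeID = (SELECT tasktypeID
                      FROM tasktype
                      WHERE type = 'follow_profile'
                        AND provider = 'insta'
                        AND cat = 3);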
