V
V
Vladimir Merk2017-01-20 12:51:34
PHP
Vladimir Merk, 2017-01-20 12:51:34

How to implement multi-threaded data update without conflicts?

I've got an interesting task that I can't handle.
Condition:
There is a table with data. It is necessary to select one record from the table in multithreaded mode and update the number of choices for it. Each entry must be selected no more than two times. There is also a micro delay between selection and update.
In practice, there is a script that runs in multi-threaded mode and selects a record from the innodb mysql table that was selected no more than 2 times before, based on the count field. After selecting an entry for it, the count field is incremented by 1.

$connect = mysqli_connect($host, $dbuser, $dbpass);
  if($connect){
    mysqli_select_db($connect, $dbname);
    for ($i = 1; $i <= 10; $i++) {
      //Выбираем подряд записи, у которых count < 2
      $sql = mysqli_query($connect, "SELECT id FROM threadsTest WHERE id >= {$i} AND count < 2 LIMIT 1;");
      if (mysqli_num_rows($sql) > 0){
        $row = mysqli_fetch_assoc($sql);
        //Задержка после выбора 0.1 сек
        usleep(100000);
        //Обновление count
        mysqli_query($connect, "UPDATE threadsTest SET count = count + 1 WHERE id = {$row['id']}");
      }
      
    }
    mysqli_close($connect);
  }

But it is obvious that in such a situation the fields are selected more times, because the update simply does not have time to be fulfilled and the logic of work is broken.
My solution is to create an additional file for writing after the fetch if it doesn't exist and increment the fetch counter into it with a lock. Immediately after the selection, check this file and if the counter in it is 2, try to select the next record. However, this method has a drawback. The check occurs after the selection and it is necessary to sort through a much larger number of records. In my test, this increased the running time by 100 times and this is very dependent on the number of threads and the amount of data in the table.
What can be done in such a situation, while maintaining the logic of work and an acceptable speed of execution?
I will be glad to any fresh ideas and suggestions :)
For those who are interested in trying to figure it out, here is a test script:
https://yadi.sk/d/mnHA4grn39xBaT
create_table.php - creates a test table with id and count fields, fills it with data
clear_table.php - resets the count in the
thread.php test table - the script itself, which makes a selection and increments the count
run_threads.sh - bash script, which runs thread.php in 300 threads
UPD:
I managed to solve the problem by locking the FOR UPDATE lines and adding 2 fields.
Added the field lock - tinyint
Added the field type - tinyint, added type to the index
The above code looks like this:
$connect = mysqli_connect($host, $dbuser, $dbpass);
  if($connect){
    mysqli_select_db($connect, $dbname);
    mysqli_query($connect, "SET AUTOCOMMIT = 0;"); //Отключаем автокоммит
    for ($i = 1; $i <= 10; $i++) {
      $rand = rand(1,10); //Это сделал для ускорения обработки, что бы каждый запущенный скрипт выбирал более-менее разные строки
      mysqli_query($connect, "BEGIN;");//Стартуем транзакцию
      //Выбираем подряд записи, у которых count < 2, type = от 0 до 10, и поле lock = 0, блокируем запись с помощью FOR UPDATE
      $sql = mysqli_query($connect, "SELECT id FROM threadsTest WHERE id >= {$i} AND type = {$rand} AND `lock` = 0 LIMIT 1 FOR UPDATE;");
      if (mysqli_num_rows($sql) > 0){
        //Ставим указатель блокировки вручную, чтобы, пока эта запись обрабатывалась, другие скрипты ее не извлекали
        mysqli_query($connect, "UPDATE threadsTest SET `lock` = 1 WHERE id = {$row['id']};");
        //Коммитим апдейт снимая блокировку с записи
        mysqli_query($connect, "COMMIT;");
        $row = mysqli_fetch_assoc($sql);
        //Задержка после выбора 0.1 сек
        usleep(100000);
        //Обновляем count, снимаем указатель блокировки
        mysqli_query($connect, "BEGIN;");
        mysqli_query($connect, "UPDATE threadsTest SET count = count + 1, `lock` = 0 WHERE id = {$row['id']}");
        mysqli_query($connect, "COMMIT;");
      }
      
    }
    mysqli_close($connect);
  }

In this way, it was possible to achieve the execution of all scripts, without exceeding the count, and with an acceptable execution time.
If the initial version of 10 iterations in 300 threads took ~1-2 seconds, then this version in the same amount is completed in ~2-4 seconds. Without type = rand ~8-10 sec.

Answer the question

In order to leave comments, you need to log in

2 answer(s)
A
Alexander Aksentiev, 2017-01-20
@VladimirMerk

SELECT FOR UPDATE

D
Dmitry Dart, 2017-01-20
@gobananas

What do you understand by flows in this case? In PHP, there are no threads in their classic form. Just several scripts simultaneously process the same database table?
UPD:
Then everything is not simple, but very simple. Write a script in it, a query that selects, for example, 100 records and processes them in a minute. And there are many such scripts then the first script selects:
I.e. first 100 entries.
The second script should choose LIMIT 100,200, but suddenly there will be some delay, so to be sure, we will make the second LIMIT 200,300 and further the third script LIMIT 400,500 and you can do as many such scripts as you like until the system is fully loaded. They can be run by cron and simultaneously select data without intersecting with each other.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question