How can I simplify a task that uses queues and parallelization in PHP?

Denis Ogurtsov, 2014-06-12 15:48:37

PHP


The task is to process a large dataset (two related tables with 1 million records). "Process" here means performing mathematical calculations
and writing the results into new tables.
I did it as follows:
-- I parallelized the processing (one parser), since a single calculation takes a very long time.
-- I run five parsers in sequence (five classes).
I launch each script via exec() with sleep(mt_rand(1,10)), in several threads.
When I parallelized it, a problem arose: the parsers pick up the same records to process. So each parser has to reserve records for itself.
Even with reservation there was still duplication, so I needed a queue. I built the queue on a file: at the start of a reservation, a random hash is appended to the end of the file,
and then a do {} while loop waits until my hash is first in the list; once it is first, the hash is removed from the file.
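The file-based queue described above is, in effect, a hand-rolled mutex around the reservation step. As a minimal sketch of a simpler alternative (a swapped-in technique, not the asker's code), PHP's built-in flock() can provide the same mutual exclusion. This assumes all parsers run on the same server and that records can be claimed as consecutive id ranges; withExclusiveLock() and reserveBatch() are hypothetical helper names.

```php
<?php
// Sketch: serialize the reservation step across parser processes with
// flock() instead of a hash queue. Assumes all parsers share one server
// and that records are claimed as consecutive id ranges (hypothetical).

// Run $critical while holding an exclusive lock on $lockFile.
// flock(LOCK_EX) blocks until no other process holds the lock, so only
// one parser at a time executes $critical.
function withExclusiveLock(string $lockFile, callable $critical)
{
    $fh = fopen($lockFile, 'c'); // 'c': create if missing, never truncate
    if ($fh === false) {
        throw new RuntimeException("Cannot open lock file $lockFile");
    }
    try {
        if (!flock($fh, LOCK_EX)) {
            throw new RuntimeException("Cannot lock $lockFile");
        }
        return $critical();
    } finally {
        flock($fh, LOCK_UN); // release even if $critical throws
        fclose($fh);
    }
}

// Claim the next $batchSize ids from a shared counter file. The
// read-modify-write happens under the lock, so two parsers never
// receive overlapping ranges.
function reserveBatch(string $counterFile, int $batchSize): array
{
    return withExclusiveLock($counterFile . '.lock', function () use ($counterFile, $batchSize) {
        $start = is_file($counterFile) ? (int) file_get_contents($counterFile) : 0;
        file_put_contents($counterFile, (string) ($start + $batchSize));
        return [$start, $start + $batchSize - 1]; // inclusive id range
    });
}

// Demo: two consecutive reservations get adjacent, non-overlapping ranges.
$counter = sys_get_temp_dir() . '/parser_counter_' . getmypid();
if (is_file($counter)) {
    unlink($counter);
}
$first = reserveBatch($counter, 100);  // [0, 99]
$second = reserveBatch($counter, 100); // [100, 199]
```

The lock is released as soon as the range is claimed, so the slow calculations themselves still run in parallel. Note that flock() is advisory and only coordinates processes that all call it, which is the case here.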

It seems to me that I have overcomplicated this and done it in an amateurish way. Any advice? Maybe there are newer technologies for this, or a simpler approach?


2 answers


Dmitry Entelis, 2014-07-01
@DmitriyEntelis

That is a very convoluted solution.
If you need each parser to reserve its own data, you can use something like:

UPDATE content SET worker_id = $id WHERE worker_id IS NULL AND state = 0 LIMIT 5;
SELECT * FROM content WHERE worker_id = $id;

Here, content is the table with the data the parser processes, state is a flag showing whether the record has been processed (0 = not yet processed), and worker_id is a number identifying the current worker (you can use getmypid() if all workers run on the same server, generate a random number, or pass the id explicitly on the command line at startup; it makes no fundamental difference).
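A minimal PHP sketch of this reservation loop, assuming the pdo_sqlite extension and a hypothetical `id` primary key (the answer does not name one). It runs against an in-memory SQLite database so it is self-contained; because default SQLite builds reject LIMIT on UPDATE, the batch is chosen in a subquery, whereas on MySQL the UPDATE ... LIMIT form above applies directly. The SELECT also filters on state = 0 so a worker does not re-read rows it has already finished. reserveRows() and markDone() are illustrative names.

```php
<?php
// Sketch of the "reserve, then select" pattern on in-memory SQLite.
// Assumes pdo_sqlite and a hypothetical `id` primary key on `content`.

// Claim up to $limit unreserved, unprocessed rows for $workerId and return
// every row this worker holds that is still unprocessed.
function reserveRows(PDO $db, int $workerId, int $limit): array
{
    // Default SQLite builds have no UPDATE ... LIMIT, so the batch is
    // picked in a subquery; on MySQL, `UPDATE ... LIMIT n` works directly.
    $upd = $db->prepare(
        'UPDATE content SET worker_id = :wid
         WHERE id IN (SELECT id FROM content
                      WHERE worker_id IS NULL AND state = 0
                      LIMIT :lim)'
    );
    $upd->bindValue(':wid', $workerId, PDO::PARAM_INT);
    $upd->bindValue(':lim', $limit, PDO::PARAM_INT);
    $upd->execute();

    $sel = $db->prepare('SELECT * FROM content WHERE worker_id = :wid AND state = 0');
    $sel->bindValue(':wid', $workerId, PDO::PARAM_INT);
    $sel->execute();
    return $sel->fetchAll(PDO::FETCH_ASSOC);
}

// Mark a row as processed so it is not returned again.
function markDone(PDO $db, int $id): void
{
    $stmt = $db->prepare('UPDATE content SET state = 1 WHERE id = :id');
    $stmt->bindValue(':id', $id, PDO::PARAM_INT);
    $stmt->execute();
}

// Demo: 12 pending rows shared by two workers.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE content (
    id INTEGER PRIMARY KEY,
    worker_id INTEGER,
    state INTEGER NOT NULL DEFAULT 0
)');
for ($i = 1; $i <= 12; $i++) {
    $db->exec("INSERT INTO content (id) VALUES ($i)");
}

$batchA = reserveRows($db, 1, 5); // worker 1 claims 5 rows
$batchB = reserveRows($db, 2, 5); // worker 2 claims 5 different rows
foreach ($batchA as $row) {
    // ... the expensive calculation would go here ...
    markDone($db, (int) $row['id']);
}
```

Because each claim is a single UPDATE, two workers calling reserveRows() at the same time should not end up with the same row; the cost, as noted below, is load on the database as the table and the number of workers grow.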
On a large content table with many parsers, this solution will run into SQL performance limits, so the more correct approach is a queue server, as suggested in the other answer.


Pavel Solovyov, 2014-06-12
@pavel_salauyou

Gearman or RabbitMQ: queues are the right tool here.