How to simplify the task with queues and parallelization in PHP?
I have a task to process a large data set (two related tables with 1 million records). Processing here means running mathematical calculations
and writing the results into new tables.
I did it as follows:
-- I parallelized the processing (of a single parser), since one calculation takes a very long time.
-- I run five parsers in sequence (five classes).
I launch each script via exec() with sleep(mt_rand(1,10)) in several threads.
Once the work ran in parallel, a problem arose: the parsers picked up the same data to process. So
each parser has to reserve records for itself. Duplication still occurred after that, so I also needed
a queue. I implemented the queue with a file. At the start of a reservation, a random hash is appended to the end of the file,
and then a do {} while loop waits until my hash is first in the list; once it is first, the hash is removed from the file.
It seems to me that I overcomplicated this and that the solution shows my lack of experience. Can you advise whether there are newer technologies or a simpler way to do it?
That is a wildly convoluted solution.
If you need to reserve data for a parser, you can use something like:
UPDATE content SET worker_id = $id WHERE worker_id IS NULL AND state = 0 LIMIT 5;
SELECT * FROM content WHERE worker_id = $id;
where content is the table with the data that the parser processes, state is a flag showing whether the record has been processed (0 here means not yet), and worker_id is a number that identifies the current thread (you can use getmypid() if everything runs on one server, generate a random number, or pass the id explicitly from the console at startup; it does not matter which).
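The two queries above can be wrapped in a small helper. What follows is only a minimal sketch, not a definitive implementation. It assumes PDO, an integer primary key named id (not mentioned above), and that a finished row is marked with state = 1. The function names claimBatch and markDone are hypothetical. The demo at the bottom uses in-memory SQLite so that it runs without a server. Since SQLite builds usually lack UPDATE ... LIMIT, the sketch falls back to a subquery there and uses the query from the answer only on MySQL.

```php
<?php
declare(strict_types=1);

/**
 * Reserve up to $batchSize unprocessed rows for one worker and return them.
 * Table and column names (content, worker_id, state) come from the answer;
 * the `id` primary key is an assumption made for this sketch.
 */
function claimBatch(PDO $pdo, int $workerId, int $batchSize = 5): array
{
    $batchSize = max(1, $batchSize); // interpolated into SQL below, so keep it a positive int

    // MySQL accepts LIMIT directly on UPDATE; other drivers get an equivalent subquery.
    $driver = $pdo->getAttribute(PDO::ATTR_DRIVER_NAME);
    $sql = $driver === 'mysql'
        ? "UPDATE content SET worker_id = :w
           WHERE worker_id IS NULL AND state = 0 LIMIT $batchSize"
        : "UPDATE content SET worker_id = :w
           WHERE id IN (SELECT id FROM content
                        WHERE worker_id IS NULL AND state = 0 LIMIT $batchSize)";

    // A single UPDATE is atomic, so two workers cannot reserve the same row.
    $claim = $pdo->prepare($sql);
    $claim->bindValue(':w', $workerId, PDO::PARAM_INT);
    $claim->execute();

    // Read back the rows this worker owns and has not finished yet.
    $select = $pdo->prepare('SELECT * FROM content WHERE worker_id = :w AND state = 0');
    $select->bindValue(':w', $workerId, PDO::PARAM_INT);
    $select->execute();

    return $select->fetchAll(PDO::FETCH_ASSOC);
}

/** Mark one row as processed (state = 1 is an assumed convention). */
function markDone(PDO $pdo, int $rowId): void
{
    $done = $pdo->prepare('UPDATE content SET state = 1 WHERE id = :id');
    $done->bindValue(':id', $rowId, PDO::PARAM_INT);
    $done->execute();
}

// Demo: in-memory SQLite with 12 unprocessed rows and two workers.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE content (
    id INTEGER PRIMARY KEY,
    payload TEXT,
    worker_id INTEGER,
    state INTEGER NOT NULL DEFAULT 0
)');
for ($i = 1; $i <= 12; $i++) {
    $pdo->exec("INSERT INTO content (payload) VALUES ('row $i')");
}

$batchA = claimBatch($pdo, 101); // in a real worker the id could be getmypid()
$batchB = claimBatch($pdo, 202);

foreach ($batchA as $row) {
    // ... run the calculation for $row here ...
    markDone($pdo, (int) $row['id']);
}

printf("worker 101 claimed %d rows, worker 202 claimed %d rows\n", count($batchA), count($batchB));
```

Each worker would call claimBatch in a loop until it returns an empty array. The SELECT here adds state = 0 so that rows the same worker already finished are not returned again; that filter is an addition to the query in the answer.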