Ivan Ivanov, 2021-12-03 12:46:32
PHP

How to run 5000 parallel threads making GET requests?

The task is to constantly keep 5000 threads making GET requests to an API (15-20 seconds per request). Currently it is 1000, but in the near future it will scale to exactly 5000.

My main language is PHP.
Right now the system works through queues: 300 workers with single-threaded cURL on one machine (AMD Ryzen 7 1700X, 8 cores, 64 GB RAM). The stack is Yii2-queue + MySQL driver + Supervisor. In general, everything works, but it seems wrong to run 5000 or even 1000 workers on such a stack (even just launching 1000 workers was problematic).
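
For context, a minimal sketch of what such a Supervisor program block typically looks like (the project path and program name here are placeholders):

[program:queue-worker]
command=php /var/www/app/yii queue/listen --verbose=0
process_name=%(program_name)s_%(process_num)02d
numprocs=300
autostart=true
autorestart=true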

1. My plan:
  • I was planning to move to RabbitMQ (I don't need to store messages after workers receive them; what matters is knowing that messages are delivered, so I chose RabbitMQ over Apache Kafka, though I haven't worked closely with either system)
  • Parallel execution inside the worker itself using ReactPHP, or better, Guzzle's async requests. That way I wouldn't have to keep exactly 5000 workers
  • Add more machines if needed. I don't know yet whether they should all hit a single database or whether I need replication

2. Is it "correct" to do this in PHP at all, or is it a task for other languages that support parallel execution and coroutines: Go, Node.js?
3. Maybe there are already ready-made solutions in the form of PHP libraries? I searched but didn't find any.


4 answers
Vamp, 2021-12-03
@surlan

Parallel execution inside the worker itself using ReactPHP, or better, Guzzle's async requests. That way I wouldn't have to keep exactly 5000 workers

The Guzzle async option is the best. Under the hood it uses curl_multi_exec, which lets you send multiple requests asynchronously without spawning extra processes. I'm not sure it will handle 5000 parallel requests, but even if it can't, the 5000 can be split across several workers.
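
For illustration, a minimal sketch of a bounded-concurrency pool with Guzzle (this assumes guzzlehttp/guzzle is installed via Composer; the endpoint URL is a placeholder):

<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;

$client = new Client(['timeout' => 30]);

// Generate requests lazily so 5000 Request objects are not built up front.
$requests = function (int $total) {
    for ($i = 0; $i < $total; $i++) {
        yield new Request('GET', "https://api.example.com/job/{$i}");
    }
};

$pool = new Pool($client, $requests(5000), [
    'concurrency' => 500, // how many requests are in flight at once
    'fulfilled' => function ($response, $index) {
        // process the successful response
    },
    'rejected' => function ($reason, $index) {
        // log the failure / timeout
    },
]);

$pool->promise()->wait(); // start the transfers and block until all finish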
2. Is it "correct" to do this in PHP at all, or is it a task for other languages that support parallel execution and coroutines: Go, Node.js?

Your load is mostly I/O-bound, so it doesn't matter much which language you choose. The main thing is that it supports I/O multiplexing (which PHP does via the aforementioned curl_multi_exec).
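
To make the multiplexing concrete, a minimal sketch of raw curl_multi usage, the mechanism Guzzle builds on (URLs are placeholders):

<?php
$urls = ['https://api.example.com/a', 'https://api.example.com/b'];

$mh = curl_multi_init();
$handles = [];
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

// Drive all transfers from a single process until every one completes.
do {
    $status = curl_multi_exec($mh, $running);
    if ($running) {
        curl_multi_select($mh); // sleep until any socket has activity
    }
} while ($running && $status === CURLM_OK);

foreach ($handles as $ch) {
    $body = curl_multi_getcontent($ch); // response body for this handle
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);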
3. Maybe there are already ready-made solutions in the form of PHP libraries? I searched but didn't find any.

Guzzle

Alexander, 2021-12-03
@xpert13

To begin with, my main stack at the moment is also PHP. I may be mistaken about some of this, so don't take my opinion as that of a great guru, but I'll share it anyway.
1. Nothing. I was digging into one of the PHP multithreading frameworks like ReactPHP the other day, and under the hood it just runs a separate process for each thread. Yes, there are optimizations that let it use fewer resources, but it still looks like a resource-hungry crutch.
2. I would choose another language. Judging by the description, there isn't much code involved, so it won't be difficult to rewrite.

Sergey Sokolov, 2021-12-03
@sergiks

Look at Swoole PHP - it's good at non-blocking async and coroutines. That way you can run non-blocking cURL requests in far fewer threads. Swoole works with the standard libcurl, so you can keep using GuzzleHTTP.
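
For illustration, a minimal sketch with Swoole's built-in coroutine HTTP client (this assumes the swoole extension is installed; host and path are placeholders):

<?php
use Swoole\Coroutine;
use Swoole\Coroutine\Http\Client;

Coroutine\run(function () {
    for ($i = 0; $i < 1000; $i++) {
        // Each create() starts a lightweight coroutine, not an OS thread.
        Coroutine::create(function () use ($i) {
            $client = new Client('api.example.com', 443, true); // host, port, TLS
            $client->set(['timeout' => 30]);
            $client->get("/job/{$i}");
            // $client->statusCode and $client->body hold the result
            $client->close();
        });
    }
});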

Ichi Nya, 2021-12-09
@Ichi

You can look at phabel, and there are also PHP's standard multithreading facilities. But I haven't tested this on so many large requests (I have a small project with a few dozen requests).
For this, it's most likely better to look towards Python or Go; they should handle it much better.
