G
Graid, 2015-03-26 16:36:16
Perl

What should I use to run a set of parallel requests?

I need to run many parallel queries. The current implementation is built on PHP multicurl, but unfortunately, after switching to SSL, the number of requests had to be reduced and the response time increased greatly (even with certificate verification disabled). I am also not happy with the CPU load. As a test, I sketched a small analogue in Node.js, and the page-fetching speed was a pleasant surprise, but before rewriting the system I would like to hear some advice.
Which technology and language would be better for this? Page-fetching speed and CPU and memory consumption during DOM parsing are important, though splitting loading and parsing into separate applications is an option. Among the potential candidates, Node.js, Python, and Perl are closest to me. I looked at Erlang, but it seemed too unusual; is it worth trying for a beginner? What else would you recommend?


8 answers
T
Timofey, 2015-03-26
@Graid

Judging by your description, Node.js should fit perfectly. I don't think it will be better with other languages: maybe the same, but not better. Especially since you have already written something in Node, why switch to anything else?

A
Andrey K, 2015-03-26
@mututunus

Go

S
Sergey, 2015-03-26
@begemot_sun

In Erlang, of course.

A
Alexander Kubintsev, 2015-03-26
@akubintsev

In PHP, an implementation built on the asynchronous ReactPHP framework works great. However, it is not certain this solution will suit you, since I don't know the balance between the number of connections and the resource consumption of the parser in your case.
I dabbled with this task in the recent past; you can see my fork here: https://github.com/kryoz/homer

U
un1t, 2015-03-26
@un1t

Basically, there are two ways:
1) several workers in parallel: python-rq, celery
2) asynchronously: aiohttp, twisted (see the sketch below)
Node.js is also suitable, but it is rather weak on libraries for parsing HTML/XML: they either have a clumsy interface or put a heavy load on the CPU, since they are written in pure JS.
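
For illustration, here is a minimal sketch of the async way with aiohttp; the URL list and the concurrency limit are placeholders to adapt to your own case:

```python
# Fetch many pages concurrently with aiohttp; a semaphore caps the
# number of simultaneous connections so the target server is not flooded.
import asyncio

import aiohttp

URLS = ["https://example.com/page/%d" % i for i in range(20)]  # placeholders
CONCURRENCY = 10  # tune to what the target server tolerates

async def fetch(session, semaphore, url):
    async with semaphore:
        async with session.get(url) as response:
            return url, await response.text()

async def main():
    semaphore = asyncio.Semaphore(CONCURRENCY)
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(
            *(fetch(session, semaphore, url) for url in URLS)
        )
    for url, body in results:
        print(url, len(body))

asyncio.run(main())
```

Parsing can then be handed off to separate worker processes, which matches the split between loading and parsing mentioned in the question.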

D
Dmitry Entelis, 2015-03-26
@DmitriyEntelis

multicurl is a bad idea, because a multicurl batch only returns once all of its concurrent requests have completed, so the whole batch is as slow as its slowest request.
I think that is exactly why it is slow for you.
Try using forks instead.
We have a lot of parsers and external API integrations of all sorts, and the remote server is more likely to fall over than we are to run into CPU limits.
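
This answer is about PHP, but the point generalizes. A rough Python analogue of the fork approach is a process pool where each worker handles one request and results are consumed in completion order rather than after the whole batch; the URLs are placeholders:

```python
# Each fetch runs in its own worker process (similar in spirit to a
# forked PHP worker); as_completed yields results as they arrive, so a
# single slow request does not hold up the rest of the batch.
from concurrent.futures import ProcessPoolExecutor, as_completed
from urllib.request import urlopen

URLS = ["https://example.com/a", "https://example.com/b"]  # placeholders

def fetch(url):
    with urlopen(url, timeout=30) as response:
        return url, response.read()

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(fetch, url) for url in URLS]
        for future in as_completed(futures):
            url, body = future.result()
            print(url, len(body))
```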

H
He11ion, 2015-03-26
@He11ion

PHP + gearman, or any other multi-worker executor.
As was already correctly written, the problem is not PHP itself but the lack of multithreading.
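
A real gearman setup needs a running gearman server, so as a self-contained illustration of the same worker model, here is an in-process Python sketch with a shared job queue and a fixed pool of threads; the URLs and worker count are placeholders:

```python
# A pool of worker threads pulls URLs from a shared queue until it is
# empty; this is the same pattern a gearman worker farm implements
# across processes or machines.
import queue
import threading
from urllib.request import urlopen

jobs = queue.Queue()
for i in range(20):
    jobs.put("https://example.com/page/%d" % i)  # placeholder URLs

def worker():
    while True:
        try:
            url = jobs.get_nowait()
        except queue.Empty:
            return
        with urlopen(url, timeout=30) as response:
            print(url, len(response.read()))

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```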

S
skynetdev, 2015-03-28
@skynetdev

Go will be better than Node.js; I have seen quite a few articles on this topic. And if you ask what is better than Go, that would be two languages, Rust and Haskell, with Rust being the better and fresher of the two. I have read all the comparisons in those articles, but I can hardly reproduce them here.
What to choose is up to you; we can only suggest what to try.
Go, I think, will be easier to learn.
