PHP
Roman Savchuk, 2017-10-31 00:54:17

PHP vs Golang: which one should I write a parser in?

Good afternoon.
A question for the experts.
I need to write a fairly heavyweight web page parser.
The parser will scrape competitors' sites, analyze their prices, compute the average, and record the average price for each product on our own site.
The parser will live on a separate subdomain and will have a login form for a personal account.
The account will display various statistics, the work log, and other information.
There are two languages to choose from: PHP and Golang.
So the question is: which of the two is the most cost-effective choice for writing such a parser (plus a small web interface)?
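
To make the "bring prices to the average" step concrete, here is a minimal sketch in Go. The function name `averagePrice`, the `ErrNoPrices` error, the rounding to two decimals, and the rule of skipping zero or negative values as failed scrapes are all assumptions for illustration and are not part of the original task description.

```go
package main

import (
	"errors"
	"math"
)

// ErrNoPrices is a hypothetical error returned when no usable
// competitor price was collected for a product.
var ErrNoPrices = errors.New("no competitor prices")

// averagePrice returns the arithmetic mean of the competitor prices,
// rounded to two decimal places. Zero or negative values are treated as
// failed scrapes and skipped (an assumption made for this sketch).
func averagePrice(prices []float64) (float64, error) {
	var sum float64
	var n int
	for _, p := range prices {
		if p <= 0 {
			continue
		}
		sum += p
		n++
	}
	if n == 0 {
		return 0, ErrNoPrices
	}
	return math.Round(sum/float64(n)*100) / 100, nil
}
```

For example, `averagePrice([]float64{100, 110, 0, 120})` skips the zero and averages the remaining three prices. A real system would probably store money as integer minor units rather than `float64`.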


5 answers
My joy, 2017-10-31
@t-alexashka

The most economical choice is whichever is more convenient for you. Write it in the language you know best.

Dasha Tsiklauri, 2017-10-31
@dasha_programmist

If you choose Go, take a look at the goquery package, which brings jQuery-style selectors to Go. But it seems to me that JS (Node 8) is the most convenient option, in master/worker mode: one instance dispatches and balances the work, while the other worker instances do the actual parsing.

Sergej, 2017-10-31
@sayber

I would choose Go. It has many advantages for fast processing, multithreading, and so on.
For a short talk on the subject, see https://www.youtube.com/watch?v=MitOZ3Bx6QE (not an ad).
Mainly the first 5-7 minutes.
It comes from real-world practice and is about speed.

Alexander Taratin, 2017-10-31
@Taraflex

It makes no difference. Either way, most of the application's time will be spent loading pages.

polarlord, 2017-10-31
@polarlord

As noted above, the program will be idle most of the time, waiting for responses (pages loading). Overall, this waiting time will be disproportionately longer than the time spent processing the responses. So it is better to use an asynchronous network model here: you send out many requests, and the response handlers are then triggered by events as the responses arrive. This is much more economical than a multithreaded approach, even with Go's green threads. In the latter case, many threads are created for the requests, and they sit idle about 90% of the time waiting for a response.
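
For comparison, the "send many requests, handle each response as it arrives" idea can be sketched in Go with goroutines and a cap on how many requests are in flight at once. This is only an illustration under assumptions: the names `Result`, `fetchAll`, `httpFetch`, and `scrapeClient`, the 15-second timeout, and the choice of a channel as a counting semaphore are mine, not something from the thread.

```go
package main

import (
	"io"
	"net/http"
	"sync"
	"time"
)

// Result holds the outcome of fetching one URL (hypothetical type).
type Result struct {
	URL  string
	Body string
	Err  error
}

// fetchAll calls fetch for every URL with at most limit calls in flight.
// Results are returned in the same order as urls.
func fetchAll(urls []string, limit int, fetch func(url string) (string, error)) []Result {
	if limit < 1 {
		limit = 1
	}
	results := make([]Result, len(urls))
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	for i, u := range urls {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit requests are already running
		go func(i int, u string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this request ends
			body, err := fetch(u)
			results[i] = Result{URL: u, Body: body, Err: err}
		}(i, u)
	}
	wg.Wait()
	return results
}

// scrapeClient is a shared client with an assumed 15-second timeout so a
// slow site cannot hold a slot forever.
var scrapeClient = &http.Client{Timeout: 15 * time.Second}

// httpFetch is one possible fetch function: a plain GET that returns the body.
func httpFetch(url string) (string, error) {
	resp, err := scrapeClient.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return string(b), err
}
```

A call such as `fetchAll(urls, 50, httpFetch)` would keep up to 50 requests waiting on the network at the same time. Passing the fetch function as a parameter also makes it easy to swap in a proxy-aware client or a fake for testing.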
Why pay so much attention to idle time (waiting for a response)? Because only under ideal conditions do you get a response to your request as quickly as possible. In real conditions things are not so rosy. Also, do not forget to use proxies, otherwise you will almost certainly get banned. And going through a proxy increases the response time quite significantly.
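
As a rough illustration of the proxy point, here is how an HTTP client that routes requests through a proxy and gives up after a timeout might be set up in Go with the standard library. The helper name `newProxyClient` and the example proxy address are hypothetical. A generous timeout matters precisely because the proxy adds latency.

```go
package main

import (
	"net/http"
	"net/url"
	"time"
)

// newProxyClient (hypothetical helper) returns an HTTP client that sends
// every request through the proxy at proxyAddr and aborts any request
// that takes longer than timeout.
func newProxyClient(proxyAddr string, timeout time.Duration) (*http.Client, error) {
	proxyURL, err := url.Parse(proxyAddr)
	if err != nil {
		return nil, err
	}
	return &http.Client{
		Transport: &http.Transport{Proxy: http.ProxyURL(proxyURL)},
		Timeout:   timeout,
	}, nil
}
```

Usage could look like `client, err := newProxyClient("http://127.0.0.1:3128", 30*time.Second)`, where the address is a placeholder for whatever proxy you actually use. Rotating between several proxies would need one client (or one transport) per proxy.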
