PhantomJS
Alexey Sundukov, 2015-07-01 09:48:52

A headless browser that is fast and stable. Is there one?

I would like to exchange experience with running headless browsers on a server. Ultimately, I want to know whether there is one that works quickly, does not need a lot of resources, and does not constantly crash.
In my work I use PhantomJS via WebDriver, with XPath for parsing. As a full browser emulation on the server that can be controlled programmatically, it is good at everything. But there are problems too. It needs a lot of RAM (a) and it leaks (b): at startup it takes about 300 MB, during work it settles at around 500 MB, and unless it is restarted every 2-3 hours it grows to 1-5 GB. When several dozen instances have to run, that calls for a separate, powerful machine. Even when there is enough RAM, it starts to use the CPU heavily (c); often no tasks are assigned to it any more, yet something inside keeps spinning. Even more often, connections to it via WebDriver simply hang and then fail on a timeout (d). It takes a long time on seemingly simple tasks (e): for example, parsing a table with a hundred rows takes over a minute. After asking colleagues and searching Google, I am left with these questions:
1) Is there a way to optimize the work with it?
So far I am thinking about abandoning WebDriver, writing the tasks in JavaScript, and loading them while PhantomJS runs in server mode. If anyone has experience with this, please let me know whether it helps.
As a JavaScript option on the PhantomJS side, I looked at CasperJS, but I have not gotten around to testing it. Given what it is built on, I suspect it will not solve the RAM (a) and CPU (c) problems, although if it solves the parsing speed problem (e), that would already be good.
2) What other software is there for the task "browser on the server"?
a) I know about the SlimerJS project, which is based on Gecko. But it is an attempt to catch up with PhantomJS on Mozilla's platform, i.e. in terms of functionality it lags behind by definition. If you have used it, please share your impressions.
b) Another option I have used is Selenium Standalone Server driving Firefox. But this combination is not very convenient when proxies are involved. A dozen full-fledged Firefox instances, a dozen profiles (one per proxy), and the need for Xvfb make the setup rather cumbersome and just as resource-hungry.
c) A ready-made service that accepts requests via WebDriver, which moves all these problems to the provider's side. I went through the price lists and terms of several such services and concluded that they are not suitable for parsing, although they might work for tests. Overall, this option did not look very promising to me.
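
On the parsing speed problem (e) from question 1: every element lookup made through WebDriver is a separate round trip to the browser, so reading a hundred-row table cell by cell can add up to hundreds of requests. That may be part of why the table takes over a minute, though it is only a guess about this setup. Below is a minimal sketch of the "write the task in JavaScript" idea: one function that runs inside the page and returns the whole table at once. The name `extractTableRows` and the CSS selector argument are made up for illustration; the original setup uses XPath, and the same shape would apply there.

```javascript
// Sketch: collect every cell of a table in ONE in-page call, so the result
// crosses the browser boundary once instead of once per cell.
// `extractTableRows` and `tableSelector` are illustrative names (assumptions),
// not part of the original setup.
function extractTableRows(tableSelector) {
  // This runs inside the page, so it may only use browser globals
  // (such as `document`) and its own arguments, no outer variables.
  var table = document.querySelector(tableSelector);
  if (!table) {
    return []; // no such table on the page
  }
  var rows = [];
  for (var i = 0; i < table.rows.length; i++) {
    var cells = table.rows[i].cells;
    var values = [];
    for (var j = 0; j < cells.length; j++) {
      // textContent is the raw text of the cell; trim strips padding whitespace.
      values.push((cells[j].textContent || '').trim());
    }
    rows.push(values);
  }
  // Plain arrays of strings serialize cleanly back to the caller.
  return rows;
}
```

In a PhantomJS script this could be called as `page.evaluate(extractTableRows, 'table#prices')`, where `table#prices` is a hypothetical selector. When staying on WebDriver, the same function body could be sent through the client's execute-script command, which also returns the result in a single response.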


1 answer(s)
Way, 2018-10-16
@lackoi

Replace your line with: add_filter('nav_menu_css_class', 'add_active_class', 10, 3 );
