Roman Andreevich, 2019-11-08 13:48:49
JavaScript

What is the best way to implement a scraping bot?

Colleagues, good day. For my own peace of mind, I'd like to hear your opinions on how best to implement a scraping bot.
There are essentially two options: a Chrome extension, or Puppeteer (or an equivalent). The main concerns are performance, scalability, and implementing utility functions (logging, GIF recordings of command execution, reloading pages that are open in the browser, running on a Unix server, and so on).
The bots' tasks are simple. A bot starts, connects to the server over WebSocket (ws) to receive commands, opens the page, logs in, and waits for a command. When a command arrives, it goes to the required section, finds the block, clicks where needed, and extracts some information (which information is not important). If the page does not respond for a long time, the bot restarts it or shuts it down.
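To make the flow concrete, here is a rough sketch of handling one command with a timeout, written against a Puppeteer-style page object (`goto`, `waitForSelector`, `click`, `$eval`). The names `runWithTimeout` and `handleCommand`, the command shape `{ url, selector, timeoutMs }`, and the `restart` flag are my own assumptions for illustration, not part of any library, and the ws connection itself is left out.

```javascript
// Hypothetical sketch: runWithTimeout, handleCommand and the command shape
// { url, selector, timeoutMs } are illustrative assumptions.

// Reject if `task` does not settle within `ms` milliseconds.
function runWithTimeout(task, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([task, timeout]).finally(() => clearTimeout(timer));
}

// Execute one server command against a Puppeteer-like page object.
async function handleCommand(page, command) {
  const { url, selector, timeoutMs = 30000 } = command;

  const work = (async () => {
    await page.goto(url);                 // go to the section
    await page.waitForSelector(selector); // find the block
    await page.click(selector);           // click where needed
    // Pull out the text of the block.
    return page.$eval(selector, (el) => el.textContent);
  })();

  try {
    const data = await runWithTimeout(work, timeoutMs);
    return { ok: true, data };
  } catch (err) {
    // The page failed or did not respond in time; the caller decides
    // whether to restart the page or shut the bot down.
    return { ok: false, error: err.message, restart: true };
  }
}
```

The timeout here only stops waiting; it does not cancel the work in the page, so on `restart: true` the caller would still need to close or reload the page itself.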
Which will work faster, an extension or a running Node.js process? As far as I understand, an extension runs in the context of the browser it is launched in, which means it is slower, plus there is the proxy. Also, will the ws messages go through the proxy, or will there be a separate channel?
All the sites are resource-hungry.
In general, I'd like your opinions, with at least some justification.
I would be grateful for any comments.
