API
Artur Vorotnikov, 2019-09-21 09:55:17

How to "organize" the action script on the site?

I need to create a program that visits a specific site and then navigates through its pages (automatically opening news of a chosen category from one media outlet).
But I don't understand which tools can be used for this. Is JavaScript or Python suitable, or are there other languages better suited to the task? (I've read about Selenium.)
And another very noob question: what should the end result be? Just a script file written in a plain language, or a project that uses third-party libraries?

1 answer
rPman, 2019-09-21
@dizpatcher

There are 2 radically different approaches:
* You study which requests the browser makes when you perform actions on the page (in the browser's F12 developer tools, the Network tab; there you can right-click a request line and, for example, get a ready-made command for the curl utility). You work out what each request does and pick out the parameters (which one is the session identifier, which one is the post number, and so on, empirically, following the same logic you would use as a programmer), and then repeat the same requests, or only the necessary ones (you can skip loading images, styles, etc.), in any programming language. Most likely, when loading pages you will have to parse them and extract links in order to know which parameters to substitute into the following requests.
* You use a browser extension (for example, Greasemonkey/Tampermonkey), or a headless browser such as Selenium with bindings for your favorite language, or you simply paste functions into the browser console once (if the site is a single-page app that never reloads the page; this is also useful for debugging) and write the necessary code directly in JavaScript. For example, to click a link it is enough to write $('css selector of the link/button').click()
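A minimal sketch of the first approach in Python, using only the standard library. The HTML string and the `/news/...` paths are made-up placeholders; in a real run the page would come from replaying the request you copied out of the Network tab (for instance with `urllib.request`), and the extracted links would feed the next requests.

```python
# Approach 1 sketch: replay requests yourself and parse the returned HTML
# for links. The URL and the sample HTML below are made up for illustration.
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags so the next request URL can be chosen."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# In a real run this string would come from something like:
#   urllib.request.urlopen("https://example.com/news/politics").read().decode()
html = '<ul><li><a href="/news/1">First</a></li><li><a href="/news/2">Second</a></li></ul>'
parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # -> ['/news/1', '/news/2']
```

For anything beyond toy pages a dedicated parser such as BeautifulSoup is more convenient, but the principle is the same: request, parse, pick the next link, repeat.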
The first approach is the most efficient and fastest: it puts little load on the machine doing the automation, and if the server doesn't mind, you can run many requests at once and frequently, collecting data quickly.
But if the server resists such techniques, then either you begin a great sword-and-shield battle, or only the second approach remains. The code of the second approach is usually simpler and takes less time to adapt to the site, which matters especially for maintaining such automation scripts when the site is updated or its design changes. The second approach is almost impossible to detect on the server side, since the actions are completely identical to a user's actions, especially if you click not immediately but after a timeout and simulate scrolling and mouse movements. The disadvantage is a very high load on the CPU and memory of the machine running the simulation, because each parser is a full-fledged browser (and if you need a separate proxy per instance, there is no way around running one browser each).
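A sketch of the second approach with Selenium's Python bindings, assuming the `selenium` package and a chromedriver are installed; the URL and CSS selector are made-up placeholders. It also shows the randomized pause mentioned above, so clicks don't fire at machine speed.

```python
# Approach 2 sketch: drive a real (headless) browser with Selenium.
# The URL and CSS selector below are made-up placeholders.
import random
import time

def human_delay(lo=0.8, hi=2.5):
    """Random pause length in seconds, so actions don't fire at machine speed."""
    return random.uniform(lo, hi)

def open_first_news(url, link_selector):
    # Selenium is imported lazily so the helper above works without it installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    opts = Options()
    opts.add_argument("--headless=new")      # no visible browser window
    driver = webdriver.Chrome(options=opts)  # needs chromedriver available
    try:
        driver.get(url)
        time.sleep(human_delay())            # pause like a human would
        driver.find_element(By.CSS_SELECTOR, link_selector).click()
        return driver.title                  # title of the opened news page
    finally:
        driver.quit()

# Usage (requires a browser + chromedriver):
# open_first_news("https://example.com/news/politics", "a.news-item")
```

This is exactly the trade-off described above: a few lines of robust code, at the cost of running a full browser per instance.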
