Python
webfaker, 2017-02-12 15:56:54

Server-side JS parser for multiple accounts with dynamic content: where to start?

To head off unproductive arguments right away: I'm not trying to hack anyone, steal anything, and so on.
The problem: there is a site (a large international service that I can't name) from which reports have to be downloaded about once a day. There are several dozen accounts and no API. A "specially trained" person has to log in over and over, which takes a lot of time.
This being the 21st century, I would like to automate the process. I'm a PHP person myself, but there are a number of difficulties: authorization is multi-stage and done over AJAX, with a bunch of cookies that affect the process across subdomains, and the report content itself is also loaded via AJAX with a POST request, under same-origin restrictions of course. So cron and other schoolboy-level tools won't help.
It looks like some kind of browser-based parser is needed. I will of course look for a specialist, but I like to understand a subject as well as I can first. All my googling produced was a jumble of terms, which I attached to the question as tags.
If anyone has dealt with similar tasks, please share the recipe: is this possible at all, what are the options, and which specific technologies are best? Ideally the answer would also cover likely side problems such as captchas, possible IP blocking, etc. To repeat: there are several dozen accounts.
P.S. Please don't take this as impudence: this is the first time I've turned to the Habr community for help, and I'm hopeful.

1 answer
Dimonchik, 2017-02-12
@dimonchik2013

You can go straight here + Scrapy (it may not cope with your particular AJAX, but who knows what you've got there)
(there is also Grab, but I don't know how well it handles JS)
Or go with the classics, PhantomJS / Selenium: get one site working, then replicate it for the rest
IP bans: VPN / proxies, preferably your own or purchased dedicated ones
Captcha: anti-captcha services (where humans solve them by hand), as long as Google still allows it
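
To make the "one account, then replicate" idea and the proxy advice a bit more concrete, here is a rough sketch using only the Python standard library. It gives every account its own cookie jar and a fixed proxy from a pool, so cookies never leak between accounts and each account always appears from the same IP. The account names, proxy addresses and the `X-Requested-With` header are assumptions for illustration; nothing in the thread says what the real site expects. If the site needs JavaScript to actually run, a real browser driver (Selenium and the like) would take the place of the opener here.

```python
import itertools
import urllib.request
from http.cookiejar import CookieJar

# Hypothetical placeholders: the thread gives no real accounts or proxies.
ACCOUNTS = ["account-01", "account-02", "account-03"]
PROXIES = ["http://10.0.0.1:3128", "http://10.0.0.2:3128"]


def assign_proxies(accounts, proxies):
    """Pair each account with one proxy, cycling through the pool."""
    return dict(zip(accounts, itertools.cycle(proxies)))


def build_session(proxy_url):
    """Return (opener, jar): an opener with its own cookies, sent via proxy_url."""
    jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}),
        urllib.request.HTTPCookieProcessor(jar),
    )
    # Many AJAX endpoints look for this header; an assumption, not a known
    # requirement of the site in question.
    opener.addheaders = [("X-Requested-With", "XMLHttpRequest")]
    return opener, jar


# Build one isolated session per account; no network traffic happens here.
plan = assign_proxies(ACCOUNTS, PROXIES)
sessions = {name: build_session(proxy) for name, proxy in plan.items()}
```

Each login step and the final report POST would then go through `sessions[name][0].open(...)`, so the cookies collected during the multi-stage authorization stay with that account.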
