Parsing
Mysterion, 2017-01-19 21:24:57

How do I properly organize the logic of a parser that requires authorization?

Good day to all.
I wrote a parser in Node.js. The parsing itself works without problems; the snag is in the logic and design.
There are pages whose content I fetch. These pages contain links to files. The links are not direct: they point to a script that checks whether the user is authorized.
Those links are the problem.
The idea is that I display the parsed content on my page, links included. Clicking a link should hit an Express API endpoint, which looks up the real link in the previously fetched content, downloads the file, and serves it to the client (Node.js > PHP > client). I can handle that part myself.
What I do not understand is how best to organize authorization for the file download. The site can remember a login, which is good. It runs on XenForo. To log in, the client must already hold some cookie, so my authorization process is as follows:
Formally, I have already written the verification and authorization algorithm, with the verification method and the authorization method as separate functions.
In the verification method, I request the profile page on the site with the cookie I have (it may be empty if I have not logged in before). A 403 response means I am not authorized. In that case I send the cookie assigned to me during the check together with the login credentials (as the site requires), save the new cookie after logging in, and call the verification method again with that cookie. If the response code is 200, everything is fine and files can be downloaded.
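To make the flow concrete, here is a minimal sketch of the check, login, re-check sequence. The names `ensureAuthorized`, `transport.checkProfile`, and `transport.login` are hypothetical and not from my actual code; the HTTP calls are injected so that nothing here assumes particular XenForo URLs or form fields.

```javascript
// Hypothetical transport, injected so the flow does not depend on real URLs:
//   transport.checkProfile(cookie)       -> Promise<{ status: number, cookie: string }>
//   transport.login(cookie, credentials) -> Promise<{ cookie: string }>
async function ensureAuthorized(transport, session, credentials) {
  // Step 1: request the profile page with whatever cookie we have (may be empty).
  const probe = await transport.checkProfile(session.cookie);
  if (probe.status === 200) {
    return session.cookie; // already authorized
  }

  // Step 2: not authorized, so log in with the cookie issued during the check.
  const loggedIn = await transport.login(probe.cookie, credentials);

  // Step 3: verify the new cookie before trusting it.
  const recheck = await transport.checkProfile(loggedIn.cookie);
  if (recheck.status !== 200) {
    throw new Error(`Login failed: profile check returned ${recheck.status}`);
  }

  session.cookie = loggedIn.cookie;
  return session.cookie;
}
```

The `session` object is just a mutable holder for the current cookie, so whatever calls this function later sees the refreshed value.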
How can I organize all of this reliably? What if something goes wrong while downloading a file and the cookie turns out to be invalid? What is the best way to guarantee that the user can download the file? Should I check authorization right before each download attempt, or on every request, even when the content does not require authorization, so that I am sure to be authorized later? In fact, I could also just substitute my own cookie instead of sending login credentials; that approach works too.
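One option I can imagine, instead of checking before every request, is to attempt the download and re-authorize only when it is rejected. A rough sketch follows; `downloadWithReauth`, `fetchFile`, and `reauthorize` are hypothetical names, and it is an assumption that the file script answers 403 for an invalid cookie the same way the profile page does.

```javascript
// Hypothetical helpers, injected so the sketch stays site-agnostic:
//   fetchFile(url, cookie) -> Promise<{ status: number, body: any }>
//   reauthorize()          -> Promise<string>  (resolves to a fresh cookie)
async function downloadWithReauth(url, session, fetchFile, reauthorize, maxAttempts = 2) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetchFile(url, session.cookie);
    if (res.status === 200) {
      return res.body;
    }
    if (res.status !== 403) {
      // Anything other than "not authorized" is not fixable by logging in again.
      throw new Error(`Unexpected status ${res.status} for ${url}`);
    }
    // 403 (assumed to mean an invalid cookie): refresh it unless this was the last attempt.
    if (attempt < maxAttempts) {
      session.cookie = await reauthorize();
    }
  }
  throw new Error(`Still unauthorized after ${maxAttempts} attempts: ${url}`);
}
```

With this shape the happy path costs a single request, and the extra login round trip is paid only when the cookie has actually expired.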
Maybe it would be better to keep an array of cookies that is continually refreshed with cookies known to work and to have access? That idea just occurred to me while writing this. I think it is a viable option.
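Roughly, the cookie array could look like the sketch below. `CookiePool` and its methods are hypothetical names made up for illustration; the pool only stores cookies, and deciding whether a cookie is valid is left to the authorization check.

```javascript
// A small in-memory pool of cookies that have passed the authorization check.
class CookiePool {
  constructor() {
    this.cookies = []; // the most recently verified cookie is kept last
  }

  // Record a cookie that has just passed the authorization check.
  add(cookie) {
    this.remove(cookie); // avoid duplicates, then move it to the end
    this.cookies.push(cookie);
  }

  // Drop a cookie that the site has rejected.
  remove(cookie) {
    this.cookies = this.cookies.filter((c) => c !== cookie);
  }

  // Return the most recently verified cookie, or null if none are left.
  current() {
    return this.cookies.length > 0 ? this.cookies[this.cookies.length - 1] : null;
  }
}
```

When `current()` returns `null`, the caller would fall back to a full login and `add()` the resulting cookie.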
What do you think? Has anyone implemented something like this before, and how did you solve these issues?
Thanks, everyone.