Python
Frim0nt, 2020-01-17 20:44:49

How do I log in to a site from a parser?

Help, I'm new to Python. I'm writing a site parser based on this tutorial: https://www.youtube.com/watch?v=kO8AHedGh8o
My task is to get information from a site that requires authorization. This is the code I ended up with:

import requests
from bs4 import BeautifulSoup

class Bars(object):
    url = 'link to the site'

    def auth(self):
        url = self.url + '/auth/login'
        session = requests.Session()
        params = {
            'login_login': 'my login',
            'login_password': 'my password'
        }
        # The dict is sent as the form-encoded body of the POST request
        r = session.post(url, data=params)
        print(r.text)

if __name__ == '__main__':
    bars = Bars()
    bars.auth()

In the tutorial, the author uses the developer tools to look at the request made when logging in: it has status 302 and the POST method, and the data that was sent is shown at the bottom (the Form Data item). The same request also shows the URL that the login data has to be sent to.
[screenshot]
[screenshot]
But in my case, if I enter the correct data, I get a 302 request with the GET method, and it has no Form Data item at the bottom (the blue squares cover the site's main URL, which looks something like https://qna.habr.com/):
[screenshot]
After that, I decided to enter incorrect data and see what request that would produce. This time I got a 301 request with the POST method, and it did have a Form Data item showing the data the site expects during authorization.
[screenshot]
Then I took the request URL from there and sent my data to it, and this is what I got in the IDE:
[screenshot]
How do I submit this data so that I actually log in and get the HTML of the site itself (as an authorized user)?


1 answer(s)
SKEPTIC, 2020-01-17
@Frim0nt

Well, I have no idea what kind of site you have there, but you can almost always log in and parse it.
Here is the sequence of steps:
1) Create an account (preferably by hand, because there may be captchas and the like).
2) Log in by hand again while watching the Network tab in the browser's developer tools. This may fail if the page reloads. In that case, open the login form and look at the field names and at the form's handler (the URL it submits to). That may not work out either; then get a tool for debugging the site that does not lose the request when the page reloads, so you can still see it.
3) Create a session in Python and make the request. From then on, send every request through the session only, because it behaves almost like a browser (the session stores cookies and other such things).

import requests

authdata = {'login': 'mylogin', 'password': 'mypassword'}
mysession = requests.Session()
response = mysession.post('https://example.com/reg.php', data=authdata)
# Here I fetch the site's catalog page through the same session; put in the page you want to parse
parsedata = mysession.get('https://example.com/catalog')
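
If the Network tab keeps losing the POST request, step 2 can also be done in code: read the login form's HTML and pull out the handler URL and the field names. Below is a rough sketch of that idea using only the standard library. The sample HTML, the `csrf_token` hidden field, and the helper names `LoginFormParser` and `build_login_request` are made up for illustration; the field names `login_login` and `login_password` are the ones from the question, and your real form may look different.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LoginFormParser(HTMLParser):
    """Collect the action, method and named inputs of the first <form> on a page."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.method = "get"
        self.fields = {}          # input name -> default value from the HTML
        self._in_form = False
        self._done = False        # set once the first form has been closed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Hidden fields (tokens and the like) keep their preset value
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_login_request(page_url, html, credentials):
    """Return (absolute handler URL, HTTP method, form data) for the login form."""
    parser = LoginFormParser()
    parser.feed(html)
    payload = dict(parser.fields)
    payload.update(credentials)   # overwrite the empty login/password fields
    return urljoin(page_url, parser.action), parser.method, payload


# Hypothetical login page, standing in for the HTML of the real one
SAMPLE_HTML = """
<form action="/auth/login" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="login_login">
  <input type="password" name="login_password">
  <input type="submit" value="Sign in">
</form>
"""

url, method, payload = build_login_request(
    "https://example.com/auth",
    SAMPLE_HTML,
    {"login_login": "mylogin", "login_password": "mypassword"},
)
print(method.upper(), url)
print(payload)
```

On a real site you would first download the login page through the session, feed its text to `build_login_request`, and then send the result with something like `mysession.post(url, data=payload)`. Copying the hidden fields along matters because some sites reject a login that arrives without them.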
