Python
Bjornie, 2016-12-22 23:58:30

Requests session loses authorization: how do I fix it?

My script currently works like this: I log in with Selenium, walk through dynamic pages that each contain a table, and in every row I take the "More" link. I pass that link to requests and BeautifulSoup, which log in with a different set of credentials and pull the data I need from the details page.
But parsing turned out to be unstable. I have spent two days trying to fix it with pauses and with various solutions found by googling, but so far I do not have enough experience to solve it. My code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.select import Select
from urllib.parse import urlparse, parse_qs
import selenium.webdriver.support.ui as ui
import time
import requests
from bs4 import BeautifulSoup

login = 'QQQQQQ'
password = 'AAAA'
auth_page = 'http://site.ru/auth'
headers = {'User-Agent':'Mozilla'}
payload = {'user': login, 'pwd': password}

with requests.Session() as session:
    s = session.post(auth_page, data=payload, headers=headers)

def parse_content():
    soup = BeautifulSoup(driver.page_source, "lxml")
    lens = len(soup.select('#left_program > tbody > tr'))

    for i in range(1, lens+1):
        tr = soup.find(id="left_program_ctl"+ str(i) +"_row")
        ...
        details_link = tr.find('a').get('href')

        r = session.get("http://site.ru" + details_link, headers=headers)
        details_soup = BeautifulSoup(r.text, "lxml")
        details_den = details_soup.find(id="den")
        details_loc = details_soup.find(id="loc")
        details_prov = details_soup.find(id="prov")

        print(r.url)

In response I get:
OK!!!!!!!!! www.site.ru/page/id_3/team_7
OK!!!!!!!!! www.site.ru/page/id_5/team_10
OK!!!!!!!!! www.site.ru/page/id_7/team_13
FAIL: www.site.ru/home
FAIL: www.site.ru/home?PHPSESSID=v46ip6ecfp0n5h8dvp1smukou6
FAIL: www.site.ru/home?PHPSESSID=eb7vblrhj135hcvae7dou5n5r7
FAIL: www.site.ru/home?PHPSESSID=rueiaoes5lblmfn0s10v2l9ip3
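
To look for a pattern in output like this, it may help to record the whole redirect chain for each request instead of only the final URL. Below is a minimal sketch, assuming the response comes from requests (whose Response objects expose history, status_code, url and headers). The helper name describe_redirects is hypothetical.

```python
def describe_redirects(response):
    """Return one text line per hop of a request, ending with the final response.

    Each line holds the status code, the URL of that hop, and whether the
    server sent a Set-Cookie header on it (for example a fresh PHPSESSID).
    """
    lines = []
    # response.history lists the intermediate redirect responses, oldest first.
    for hop in list(response.history) + [response]:
        sets_cookie = "Set-Cookie" in hop.headers
        lines.append(f"{hop.status_code} {hop.url} set-cookie={sets_cookie}")
    return lines
```

Printing "\n".join(describe_redirects(r)) right after session.get(...) should show which hop redirects to /home and whether a new session cookie is issued at that point.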

That is, the first three pages opened with authorization, and on the fourth and subsequent requests the session was lost (redirect to the home page).
I can't tell what the problem is, because sometimes 20 links open without trouble, sometimes none, and sometimes it gets stuck at the same place.
The session can also recover and then break again. Overall I have not found a pattern, so I have no idea where to dig.
I should add that the links in the rows are parsed correctly, with no misses; I checked. Without visiting the "More" pages there are no failures at all!
Question: if authorization is lost, how do I repeat it and RETURN to the page I need to parse?
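
One possible approach, sketched under assumptions: treat a final URL whose path is /home as the sign that the session was lost (as in the FAIL lines above), repeat the login POST, and request the same URL again. The names LOGGED_OUT_PATH, is_logged_out, get_with_relogin and max_attempts are hypothetical. The session argument is expected to be a requests.Session that stays open for the whole run.

```python
from urllib.parse import urlparse

# Assumption: an unauthorized request ends up on this path after redirects.
LOGGED_OUT_PATH = "/home"


def is_logged_out(response):
    """Return True if the final URL (after redirects) is the home page."""
    return urlparse(response.url).path.rstrip("/") == LOGGED_OUT_PATH


def get_with_relogin(session, url, auth_page, payload, headers=None, max_attempts=3):
    """GET url; if the site redirected to the home page, log in again and retry.

    Makes at most max_attempts GET requests for the same URL and raises
    RuntimeError if every one of them ends up on the home page.
    """
    for _ in range(max_attempts):
        response = session.get(url, headers=headers)
        if not is_logged_out(response):
            return response
        # Session lost: repeat the login, then loop to request the same URL again.
        session.post(auth_page, data=payload, headers=headers)
    raise RuntimeError(f"still logged out after {max_attempts} attempts: {url}")
```

In parse_content, the session.get(...) call could then become r = get_with_relogin(session, "http://site.ru" + details_link, auth_page, payload, headers). This only retries after a lost session; it does not explain why the server drops it.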
