How to skip a non-existent page? requests.exceptions.TooManyRedirects: Exceeded 30 redirects

Parsing

Dima_Tsyben, 2021-08-05 14:22:57

(I may not have phrased the question correctly, so please go easy on me.)

When the scraper reaches a non-existent page, for example https://www.influencive.com/page/10/?s=golf, the program hangs for about 10 seconds and then crashes.

Console:

...
 One Thing Superstar Athletes Do That Can Help You Lose Weight—It’s Not What You Think
Daniel Thomas Hind
How Inbound Marketing Helped These 7 Saas Startups Grow
Kevin Payne
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    r = requests.get("https://www.influencive.com/page/" + str(page) + "/?s=" + search, headers=header)
  File "/home/dima/.local/share/virtualenvs/Social_info-HrrlgGsp/lib/python3.8/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/home/dima/.local/share/virtualenvs/Social_info-HrrlgGsp/lib/python3.8/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/dima/.local/share/virtualenvs/Social_info-HrrlgGsp/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/dima/.local/share/virtualenvs/Social_info-HrrlgGsp/lib/python3.8/site-packages/requests/sessions.py", line 677, in send
    history = [resp for resp in gen]
  File "/home/dima/.local/share/virtualenvs/Social_info-HrrlgGsp/lib/python3.8/site-packages/requests/sessions.py", line 677, in <listcomp>
    history = [resp for resp in gen]
  File "/home/dima/.local/share/virtualenvs/Social_info-HrrlgGsp/lib/python3.8/site-packages/requests/sessions.py", line 166, in resolve_redirects
    raise TooManyRedirects('Exceeded {} redirects.'.format(self.max_redirects), response=resp)
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.

The code itself:

import requests
from bs4 import BeautifulSoup as BS

search = "golf"
page = 1
s = 0

header = {
  'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'
}

while True:
    r = requests.get("https://www.influencive.com/page/" + str(page) + "/?s=" + search, headers=header)
    html = BS(r.content, "html.parser")
    news = html.find_all('a', rel='bookmark' )
    name = html.find_all('strong', itemprop = "name"  )
    if(len(news)):
        for s in range(len(news)):
            try:
                print(news[s].text)
                print(name[s].text)
            except IndexError:
                # Some entries have no matching author element; skip them.
                pass
        page += 1
    else:
        break

1 answer(s)

Dima_Tsyben, 2021-08-06

I decided to leave it like this:

while True:
    url = "https://www.influencive.com/page/" + str(page) + "/?s=" + search
    print(url)
    try:
        r = requests.get(url, headers=header)
    except requests.exceptions.TooManyRedirects:
        # A non-existent page ends in "Exceeded 30 redirects": stop paging here.
        break
    # ... the parsing and page += 1 stay the same as in the question
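
For anyone who finds this later, here is a slightly fuller sketch of the same idea, not a definitive implementation. It catches only requests.exceptions.TooManyRedirects (the exception from the traceback) rather than every Exception, so unrelated network errors still surface. The fetch_pages function, its fetch and max_pages parameters, the shortened user-agent, the 10-second timeout and the "stop on any non-200 status" rule are all my own assumptions for illustration; they are not part of the original script.

```python
import requests

BASE_URL = "https://www.influencive.com/page/{page}/?s={search}"
# Shortened placeholder; the question uses a full browser user-agent string.
HEADERS = {"user-agent": "Mozilla/5.0"}


def fetch_pages(search, fetch=requests.get, max_pages=100):
    """Yield (page_number, response) until the site runs out of result pages.

    `fetch` and `max_pages` are hypothetical parameters added for this sketch:
    `fetch` makes the function testable without a network, and `max_pages`
    is a safety cap so the loop always terminates.
    """
    for page in range(1, max_pages + 1):
        url = BASE_URL.format(page=page, search=search)
        try:
            response = fetch(url, headers=HEADERS, timeout=10)
        except requests.exceptions.TooManyRedirects:
            # Past the last page the request ends in "Exceeded 30 redirects",
            # so treat that as the end of the results and stop cleanly.
            return
        if response.status_code != 200:
            # Assumption: a non-200 answer also means there is nothing to parse.
            return
        yield page, response
```

Usage would look like "for page, response in fetch_pages("golf"):" followed by the BeautifulSoup parsing from the question. The 30 in the error message is the default of Session.max_redirects in requests; creating a requests.Session, lowering that attribute and passing session.get as fetch should make the last request fail sooner, although I have not measured how much time that saves on this site.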