Can't follow the pagination link, the request returns 404?

Pavel Ivanov, 2020-05-27 16:31:02

Python

Hello everyone,
I need your help. I can't parse the pagination link. In the request headers it looks like this - https://www.strava.com/segments/9926585/leaderboar...
and in the HTML it looks like this - https://www.strava.com/segments/9926585/leaderboar...

Request and response headers:

Request URL: https://www.strava.com/segments/9926585/leaderboard?club_id=225082&page=1&per_page=25&partial=true
Request Method: GET
Status Code: 200 
Remote Address: 192.168.7.10:3128
Referrer Policy: no-referrer-when-downgrade
cache-control: no-cache, no-store
content-encoding: gzip
content-type: text/html; charset=utf-8
date: Wed, 27 May 2020 13:14:20 GMT
etag: W/"1f29ed5179163df3e47f122e9f646fd2"
expires: Sat, 01 Jan 2000 00:00:00 GMT
pragma: no-cache
referrer-policy: strict-origin-when-cross-origin
status: 200
status: 200 OK
via: 1.1 linkerd
x-content-type-options: nosniff
x-download-options: noopen
x-frame-options: DENY
x-permitted-cross-domain-policies: none
x-request-id: d3834ff4-23b8-4fdb-8073-aef8ad68bd6b
x-xss-protection: 1; mode=block
:authority: www.strava.com
:method: GET
:path: /segments/9926585/leaderboard?club_id=225082&page=1&per_page=25&partial=true
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
accept-encoding: gzip, deflate, br
accept-language: ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7
cache-control: max-age=0
cookie: ?????
sec-fetch-dest: document
sec-fetch-mode: navigate
sec-fetch-site: none
sec-fetch-user: ?1
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36
club_id: 225082
page: 1
per_page: 25
partial: true

HTML code in Chrome DevTools:

<nav>
  <ul class="pagination" data-filter="overall">
    <li class="previous_page disabled"><span>←</span></li>
     <li class="active"><span>1</span></li>
     <li><a rel="next" href="/segments/9926585/leaderboard?club_id=225082&amp;filter=overall&amp;page=2&amp;per_page=25">2</a></li>
     <li><a href="/segments/9926585/leaderboard?club_id=225082&amp;filter=overall&amp;page=3&amp;per_page=25">3</a></li>
     <li><a href="/segments/9926585/leaderboard?club_id=225082&amp;filter=overall&amp;page=4&amp;per_page=25">4</a></li>
     <li><a href="/segments/9926585/leaderboard?club_id=225082&amp;filter=overall&amp;page=5&amp;per_page=25">5</a></li> 
     <li class="next_page"><a rel="next" href="/segments/9926585/leaderboard?club_id=225082&amp;filter=overall&amp;page=2&amp;per_page=25">→</a>
     </li>
   </ul>
 </nav>
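The hrefs in this markup are relative and HTML-escaped (`&amp;`), so they need unescaping before being reused as URLs. The following is a minimal sketch, not the method used in the script below, of how the links and the last page number could be read from such a block. It uses only the standard library so that it runs without extra packages; BeautifulSoup could do the same job. The names `PaginationLinks`, `pagination_hrefs` and `last_page` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qs

# The pagination block as copied from the browser (see above).
PAGINATION_HTML = """
<nav>
  <ul class="pagination" data-filter="overall">
    <li class="previous_page disabled"><span>←</span></li>
    <li class="active"><span>1</span></li>
    <li><a rel="next" href="/segments/9926585/leaderboard?club_id=225082&amp;filter=overall&amp;page=2&amp;per_page=25">2</a></li>
    <li><a href="/segments/9926585/leaderboard?club_id=225082&amp;filter=overall&amp;page=3&amp;per_page=25">3</a></li>
    <li><a href="/segments/9926585/leaderboard?club_id=225082&amp;filter=overall&amp;page=4&amp;per_page=25">4</a></li>
    <li><a href="/segments/9926585/leaderboard?club_id=225082&amp;filter=overall&amp;page=5&amp;per_page=25">5</a></li>
    <li class="next_page"><a rel="next" href="/segments/9926585/leaderboard?club_id=225082&amp;filter=overall&amp;page=2&amp;per_page=25">→</a>
    </li>
  </ul>
</nav>
"""


class PaginationLinks(HTMLParser):
    """Collect href values of <a> tags inside <ul class="pagination">."""

    def __init__(self):
        super().__init__()
        self.in_pagination = False
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul" and "pagination" in (attrs.get("class") or "").split():
            self.in_pagination = True
        elif tag == "a" and self.in_pagination and attrs.get("href"):
            # HTMLParser has already turned "&amp;" into "&" here.
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "ul":
            self.in_pagination = False


def pagination_hrefs(html):
    """Return all pagination hrefs found in `html`, in document order."""
    parser = PaginationLinks()
    parser.feed(html)
    return parser.hrefs


def last_page(html):
    """Return the highest `page` value among the pagination links (1 if none)."""
    pages = [int(parse_qs(urlsplit(h).query)["page"][0])
             for h in pagination_hrefs(html)]
    return max(pages, default=1)


print(pagination_hrefs(PAGINATION_HTML)[0])
print(last_page(PAGINATION_HTML))
```

Note that the highest visible link is only the last page if the site lists every page number in this block, which is an assumption here.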

# get the data from the table
def get_table_data(num):
    lis = []
    global NAMES
    url = '{}/leaderboard?club_id=225082&filter=overall&page={}&per_page=25&partial=true'.format(conf.URL_RATING, num)
    print(url)
    response = session.get(url, headers=headers)
    print(response)
    soup = BeautifulSoup(response.text, "lxml")
    table = soup.find('table', {'class':'table table-striped table-padded table-leaderboard'}).find('tbody').find_all('tr')
    print('===================== All participants from the list with their data ========================')
    for rows in table:
        col = rows.find_all('td')
        if col[1].a.text.replace('\n', '') in NAMES:

            try: rating = col[0].text.replace('\n', '')
            except: rating = None

            try: name = col[1].a.text.replace('\n', '')
            except: name = None

            try: date = col[2].a.text.replace('\n', '')
            except: date = None

            try: temp = col[3].text.replace('\n', '')
            except: temp = None

            try: pulse = col[4].text.replace('\n', '')
            except: pulse = None

            try: time = col[5].text.replace('\n', '')
            except: time = None

            dic = {
                'Rank': rating,
                'Name': name,
                'Date': date,
                'Pace': temp,
                'Heart rate': pulse,
                'Time': time,
            }
            print(dic)
            lis.append(dic)

    return lis

def main():
    authorization()
 
    num_reting = pages_number(conf.URL_RATING)
    for num in range(1, int(num_reting)+1):
        data = get_table_data(num)
        print(data)
        # save_db(data)
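`get_table_data` chains `.find()` calls directly on the result of `soup.find(...)`, so when the table is missing from the response the failure surfaces as an `AttributeError` rather than as the HTTP problem behind it. A minimal sketch of two guards that could be placed before the parsing step is shown below. `PageFetchError`, `check_status` and `require` are hypothetical helpers, not part of `requests` or BeautifulSoup; `requests` also offers `response.raise_for_status()` for the status check.

```python
class PageFetchError(RuntimeError):
    """Raised when a page could not be fetched or lacks the expected markup."""


def check_status(status_code, url):
    """Fail early with a readable message if the server did not return 200."""
    if status_code != 200:
        raise PageFetchError("GET {} returned HTTP {}".format(url, status_code))


def require(node, description):
    """Return `node` unchanged, or raise if a lookup returned None."""
    if node is None:
        raise PageFetchError("{} not found in the page".format(description))
    return node


# Intended use inside get_table_data (shown as comments because it needs
# the session, headers and BeautifulSoup objects from the script above):
#
#     response = session.get(url, headers=headers)
#     check_status(response.status_code, url)
#     soup = BeautifulSoup(response.text, "lxml")
#     table = require(soup.find('table', {'class': '...'}), 'leaderboard table')
#     rows = require(table.find('tbody'), 'table body').find_all('tr')
```

With these guards the script stops at the first failing request and reports the URL and status code instead of a `NoneType` error.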

The console output shows a 404 error, and it's not clear why:

D:\www\starava_com\venv\lib\site-packages\pymysql\cursors.py:170: Warning: (1366, "Incorrect string value: '\\xE7\\xE8\\xEC\\xE0)' for column 'VARIABLE_VALUE' at row 484")
  result = self._query(query)
https://www.strava.com/segments/9926585/leaderboard?club_id=225082&filter=overall&page=1&per_page=25&partial=true
<Response [404]>
Traceback (most recent call last):
  File "D:/www/starava_com/parser.py", line 149, in <module>
    main()
  File "D:/www/starava_com/parser.py", line 143, in main
    data = get_table_data(num)
  File "D:/www/starava_com/parser.py", line 64, in get_table_data
    table = soup.find('table', {'class':'table table-striped table-padded table-leaderboard'}).find('tbody').find_all('tr')
AttributeError: 'NoneType' object has no attribute 'find'

Process finished with exit code 1
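One thing that can be checked mechanically is how the URL printed by the script differs from the Request URL captured in the browser. The sketch below compares their query parameters using the standard library; `query_params` and `diff_params` are hypothetical helper names. Whether the extra parameter has anything to do with the 404 is not established here, since the cookies and headers sent by the script may differ from the browser's as well.

```python
from urllib.parse import urlsplit, parse_qs

# Request URL from the browser's network tab.
browser_url = ("https://www.strava.com/segments/9926585/leaderboard"
               "?club_id=225082&page=1&per_page=25&partial=true")
# URL printed by the script just before the 404.
script_url = ("https://www.strava.com/segments/9926585/leaderboard"
              "?club_id=225082&filter=overall&page=1&per_page=25&partial=true")


def query_params(url):
    """Return the query string of `url` as a flat dict (first value per key)."""
    return {key: values[0]
            for key, values in parse_qs(urlsplit(url).query).items()}


def diff_params(first, second):
    """Describe how the query parameters of two URLs differ."""
    a, b = query_params(first), query_params(second)
    return {
        "only_in_first": sorted(set(a) - set(b)),
        "only_in_second": sorted(set(b) - set(a)),
        "different": sorted(k for k in set(a) & set(b) if a[k] != b[k]),
    }


print(diff_params(browser_url, script_url))
# {'only_in_first': [], 'only_in_second': ['filter'], 'different': []}
```

The only difference in the query string is `filter=overall`, which the script sends and the captured browser request does not.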


1 answer(s)

Pavel Ivanov, 2020-05-27
@Ivanov_pv

Could someone at least give me a hint?