How do I get the resulting link after sending a request to a site?


Python

m1kz, 2020-05-31 22:54:38

There is a site with TV series (for example, https://rezka.ag/series/thriller/9364-mister-robot... ). I need to write Python code that either presses a button on the site or sends the request that switches the episode, and then gets the link to the resulting page. But I don't know how to click buttons or send such requests.


3 answer(s)


soremix, 2020-06-01
@m1kz

If the request is not too complicated, you can do this:
Press F12 in the browser and open the Network tab.
Press the button on the site, then look at the first request in the list and at what it returned. If it returned HTML, you can parse the new link out of it. If it returned a redirect, take the new link from the Location header.
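A rough sketch of those two cases, assuming you already have the status code, headers and body of the response. The helper name `resolve_link` and the "first `<a>` tag" rule are only illustrative assumptions; on a real page you would look for the specific link you need.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _FirstLink(HTMLParser):
    """Remembers the href of the first <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self.href is None:
            self.href = dict(attrs).get("href")


def resolve_link(status_code, headers, body, request_url):
    """Hypothetical helper: new link from a redirect or from returned HTML."""
    # Note: a plain dict lookup is case-sensitive, unlike real HTTP headers
    if 300 <= status_code < 400 and "Location" in headers:
        # Location may be relative, so resolve it against the request URL
        return urljoin(request_url, headers["Location"])
    parser = _FirstLink()
    parser.feed(body)
    if parser.href is None:
        return None
    return urljoin(request_url, parser.href)
```

With requests, the inputs could come from something like `r = requests.post(url, data=payload, allow_redirects=False)` followed by `resolve_link(r.status_code, r.headers, r.text, r.url)`. Without `allow_redirects=False`, requests follows the redirect itself and the final address is available as `r.url`.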
If the request is complicated, for example it needs authorization or is hard to reproduce because of the JS involved, you can use Selenium. It is simple to use. To keep the browser window out of sight, you can enable the headless option.
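A minimal sketch of the Selenium route, assuming Selenium 4 and Chrome are installed. The function name `click_and_get_url` and the CSS selector you pass in are hypothetical; the real selector has to be looked up in the page's markup.

```python
def click_and_get_url(page_url, button_css):
    """Open page_url in headless Chrome, click one element, return the resulting URL."""
    # Imported here so that the function can be defined without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # no visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(page_url)
        driver.find_element(By.CSS_SELECTOR, button_css).click()
        return driver.current_url
    finally:
        driver.quit()  # always close the browser, even on errors
```

If the page changes its address only after some JS has finished, `current_url` may be read too early. In that case an explicit wait (for example `WebDriverWait`) before reading it would probably be needed.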
That's the general idea. If nobody has done it by tomorrow, I can help in the morning.
UPD:

import requests
import time
import json
import re



BASE_URL = 'https://rezka.ag/ajax/get_cdn_series/?t={}'



def parse_quality(urls, quality=None):

    splited = urls.split(',')

    if not quality:
        using_quality = splited[-1].split('http')[0]
        print('Using the highest available quality ({})'.format(using_quality))
        return splited[-1].split(' or')[0].replace(using_quality, '')

    intext_quality = '[{}p]'.format(quality)

    if intext_quality not in urls:
        print('Quality {} is not in the list of available options'.format(quality))

        # at this point I gave up and imported regex
        available_qualities = re.findall(r'\[(.+?)\]', urls)
        print('Available options: ', ', '.join(available_qualities))
        return None

    for url in splited:
        if intext_quality in url:
            return url.split(' or')[0].replace(intext_quality, '')


def get_urls(film_id, season, episode):

    payload = {'id': film_id, 'translator_id': '1', 'season': season, 'episode': episode, 'action': 'get_stream'}

    r = requests.post(BASE_URL.format(str(int(time.time()))), data=payload)

    if r.status_code != 200:
        # the error should be handled properly here if the request failed
        print('Error')
        return

    data = json.loads(r.text)

    if not data.get('success'):
        # the error should be handled properly here if the request failed
        print('Error')
        print(data)
        return

    return data['url']



if __name__ == '__main__':
    
    all_urls = get_urls(9364, 1, 1)

    if all_urls:
        url = parse_quality(all_urls)

        print(url)

Parameters for get_urls:
film ID, which can be taken from the link in the browser
season and episode, which are self-explanatory
Parameters for parse_quality:
the list of links with qualities
the desired quality, for example parse_quality(all_urls, 1080)
If the quality is given, the function returns the link for it, or prints a message and returns None if that quality is not in the list. If it is not given, the function returns the link for the highest available quality.
The file name in the [1080p] link says 720. This is not a mistake: that is how the site labels 1080p, and it seems there is no real 1080 there.
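To illustrate "can be taken from the link in the browser": a small sketch that pulls the ID out of a page URL. It assumes the ID is the number at the start of the last path segment, which matches the link from the question (9364) but is not something the site documents. The name `film_id_from_url` is made up for this example.

```python
import re
from urllib.parse import urlparse


def film_id_from_url(page_url):
    """Return the leading number of the last path segment, or None."""
    last_segment = urlparse(page_url).path.rstrip("/").split("/")[-1]
    match = re.match(r"(\d+)-", last_segment)
    return int(match.group(1)) if match else None


# The link from the question (it is truncated there, which does not matter here)
film_id = film_id_from_url("https://rezka.ag/series/thriller/9364-mister-robot...")
```

The result can then be passed as the first argument of get_urls.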

Raw response with the links, if you are interested:

[360p]https://load.hdrezka-ag.net/tvseries/cb2beeb8822647baa8621766e5a360cc3c7ae16b/aff285f2cedd0cb70b49a97e53b8c246:2020060411/240.mp4:hls:manifest.m3u8 or https://load.hdrezka-ag.net/65af9f4fab0d894043fac8887b7da99e:2020060411/tvseries/cb2beeb8822647baa8621766e5a360cc3c7ae16b/240.mp4,[480p]https://load.hdrezka-ag.net/tvseries/cb2beeb8822647baa8621766e5a360cc3c7ae16b/aff285f2cedd0cb70b49a97e53b8c246:2020060411/360.mp4:hls:manifest.m3u8 or https://load.hdrezka-ag.net/ea218fab2e907aa2093c5bc7f9cb480d:2020060411/tvseries/cb2beeb8822647baa8621766e5a360cc3c7ae16b/360.mp4,[720p]https://load.hdrezka-ag.net/tvseries/cb2beeb8822647baa8621766e5a360cc3c7ae16b/aff285f2cedd0cb70b49a97e53b8c246:2020060411/480.mp4:hls:manifest.m3u8 or https://load.hdrezka-ag.net/92bdeccddc5661b6b786659fae6adc3b:2020060411/tvseries/cb2beeb8822647baa8621766e5a360cc3c7ae16b/480.mp4,[1080p]https://load.hdrezka-ag.net/tvseries/cb2beeb8822647baa8621766e5a360cc3c7ae16b/aff285f2cedd0cb70b49a97e53b8c246:2020060411/720.mp4:hls:manifest.m3u8 or https://load.hdrezka-ag.net/d2bb808ccb6910d8317224825ee2875d:2020060411/tvseries/cb2beeb8822647baa8621766e5a360cc3c7ae16b/720.mp4
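The raw string above could also be turned into a dictionary first, which makes picking a quality easier. This is only a sketch: it assumes the format is always `[label]url or url` joined by commas and that the URLs themselves contain no commas. The names `parse_streams` and `pick` and the example.com links are made up for the illustration.

```python
import re


def parse_streams(raw):
    """Map each quality label to its list of alternative URLs."""
    streams = {}
    for part in raw.split(","):
        match = re.match(r"\[(.+?)\](.+)", part.strip())
        if not match:
            continue  # skip fragments that do not follow the [label]url pattern
        label, urls = match.groups()
        streams[label] = [u.strip() for u in urls.split(" or ")]
    return streams


def pick(streams, quality=None):
    """URL list for the given quality, or for the last label if none is given."""
    if not streams:
        return None
    if quality is None:
        return streams[list(streams)[-1]]  # dicts keep insertion order
    return streams.get("{}p".format(quality))


sample = ("[360p]https://example.com/a.m3u8 or https://example.com/a.mp4,"
          "[720p]https://example.com/b.m3u8 or https://example.com/b.mp4")
best = pick(parse_streams(sample))
```

Here `best` holds both alternatives for the last label in the string, the HLS manifest first and the direct .mp4 second, in the same order as in the response.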


m1kz, 2020-06-01
@m1kz

By trial and error I went through a lot of different code and eventually got what I wanted. For anyone interested, here is a short snippet:

import requests
import json


def get_url(film_id, t_id, season, episode):
    '''Returns the video URL'''
    URL = 'https://hdrezka.sh/ajax/get_cdn_series/?t=1590958856022'  # constant for the requests
    payload = {
        'id': film_id,
        'translator_id': t_id,  # voice-over (dubbing) track
        'season': season,
        'episode': episode,
        'action': 'get_stream'  # constant
    }

    response = requests.post(URL, data=payload)  # the request itself

    data = json.loads(response.text)  # JSON string to dict
    # The best quality comes last; its final URL follows the last space
    return data['url'].rsplit(' ', 1)[-1]


Naruto = [12333, 14, 2, 38]
Mr_Robot = [9364, 1, 1, 1]

print('Episode %d of Naruto, season %d: %s \n' % (Naruto[3], Naruto[2], get_url(*Naruto)))
print('Episode %d of Mr. Robot, season %d: %s \n' % (Mr_Robot[3], Mr_Robot[2], get_url(*Mr_Robot)))

Thanks, everyone.