How does SaveFrom work?

Python

Sergey Yavin, 2020-06-23 17:31:35

Hello. I would like to learn and understand in detail how to write a Python program that can download a video from any site. This is not for a project; I would like to understand how a service like SaveFrom works. Please point me to study materials, as I would like to work through this from the basics.

1 answer(s)

soremix, 2020-06-23
@sjaserds

You will not be able to write a universal parser: each site serves its content differently.
You need to request the video page, find a direct link to the video in its source code, and download the file from that link.
The overall process looks like this:
0. If the site has an API, use it. With an API, the remaining steps are unnecessary (provided, of course, that it gives you a direct link to the video).
1. Open the page with the video. For example, on TikTok:
https://www.tiktok.com/@golden_men_6/video/6803237...

2. Press Ctrl+U to view the page source and look for a direct link to the video. It is usually somewhere in the code. If it is not, work out how the page loads the video; every site does this differently. In the TikTok example, the direct link is in the page source, in the contentUrl field of the JSON inside the script element with id videoObject.

3. Once you have found a link, check that it is the one you need. Then extract it from the page with bs4, json, or re.
4. Finally, request the link with requests and save the response to a file.
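
Steps 2 and 3 can be tried without touching the network. The sketch below is only an illustration, not a description of how any particular site works: the sample page, the cdn.example.com URL, and the helper names find_media_links and extract_video_url are all hypothetical, and it uses only re and json from the standard library.

```python
import json
import re

# Hypothetical page source: a JSON blob embedded in a <script> tag.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json" id="videoObject">
{"name": "demo clip", "contentUrl": "https://cdn.example.com/v/demo.mp4"}
</script>
</head><body></body></html>
"""


def find_media_links(html, extensions=("mp4", "webm", "m3u8")):
    """Step 2: list every URL in the source that looks like a media file."""
    ext_group = "|".join(re.escape(ext) for ext in extensions)
    pattern = re.compile(
        r'https?://[^\s"\'<>\\]+\.(?:' + ext_group + r')\b[^\s"\'<>\\]*'
    )
    return sorted(set(pattern.findall(html)))


def extract_video_url(html, script_id="videoObject", key="contentUrl"):
    """Step 3: read `key` from the JSON inside <script id=script_id>, or return None."""
    pattern = re.compile(
        r'<script[^>]*\bid=["\']' + re.escape(script_id) + r'["\'][^>]*>(.*?)</script>',
        re.DOTALL | re.IGNORECASE,
    )
    match = pattern.search(html)
    if match is None:
        return None
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    # The JSON could be a list or a scalar; only a dict can hold the key.
    return data.get(key) if isinstance(data, dict) else None


if __name__ == "__main__":
    print(find_media_links(SAMPLE_HTML))
    print(extract_video_url(SAMPLE_HTML))
```

The first helper is a quick way to check whether a direct link is visible in the source at all; the second is the targeted extraction you would write once you know where the link is stored.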
An example for TikTok:

import json

import requests
from bs4 import BeautifulSoup

# Link to the full video page
tiktok_url = 'https://www.tiktok.com/@golden_men_6/video/6803237854542712070'

# Browser-like User-Agent so that the site serves the page properly
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'}

# Request the page with the video
r = requests.get(tiktok_url, headers=headers, allow_redirects=True, timeout=30)
r.raise_for_status()

# Parse the page: the video link is in the script with id videoObject, whose content is JSON
soup = BeautifulSoup(r.text, 'html.parser')
script = soup.find('script', attrs={'id': 'videoObject'})
if script is None:
    # The page layout may have changed since this example was written
    raise RuntimeError('script with id videoObject not found on the page')
data = json.loads(script.string)
video_url = data['contentUrl']

# Request the direct link to the video
r = requests.get(video_url, headers=headers, timeout=30)
r.raise_for_status()

# Write the video to a file
with open('video.mp4', 'wb') as f:
    f.write(r.content)
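
For large videos, r.content holds the whole file in memory before it is written. A possible variant, assuming the same kind of direct link and headers as in the example above, streams the response to disk in chunks. The function names write_chunks and download_video are made up for this sketch.

```python
def write_chunks(chunks, path):
    """Write an iterable of bytes chunks to `path`; return the number of bytes written."""
    total = 0
    with open(path, 'wb') as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                total += len(chunk)
    return total


def download_video(video_url, path, headers=None, chunk_size=64 * 1024):
    """Stream `video_url` to `path` without loading the whole body into memory."""
    # Third-party dependency, imported here so write_chunks stays stdlib-only.
    import requests

    with requests.get(video_url, headers=headers, stream=True, timeout=30) as r:
        r.raise_for_status()
        return write_chunks(r.iter_content(chunk_size=chunk_size), path)
```

Usage would be download_video(video_url, 'video.mp4', headers=headers), with video_url and headers taken from the example above.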