I can't get information from the script tag, what should I do?

Python

Vladimir Vladimirovich, 2020-07-15 10:47:54

Hello!
I want to extract the URLs of all images from a page, but I'm having trouble extracting information from the script tag. Here is the code:

import requests
import json
from bs4 import BeautifulSoup as BS
import re

# Link to the full page
url = 'https://www.instagram.com/p/B5n2EXjF_1C/'

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'}


r = requests.get(url, headers=headers, allow_redirects=True)

soup = BS(r.content, 'html.parser')
script = soup.find('script', attrs={'type':"text/javascript"}, text=re.compile('window._sharedData'))
data = json.loads(script.next)
image_url = data['display_url']

print(image_url)

Here is the error traceback:
Traceback (most recent call last):
  File "C:/Users/Desktop/test/venv/Test.py", line 19, in <module>
    data = json.loads(script.next)
  File "C:\Users\AppData\Local\Programs\Python\Python38-32\lib\json\__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "C:\Users\AppData\Local\Programs\Python\Python38-32\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\AppData\Local\Programs\Python\Python38-32\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I'm not very experienced with JavaScript data, but if anyone has the time and inclination to help, I'd be very grateful!

1 answer(s)

Stalker_RED, 2020-07-15
@Stalker_RED

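You are trying to parse the script's contents as JSON, but they aren't JSON. They are JavaScript: the text that matched your regex contains an assignment to window._sharedData rather than a bare JSON value, which is why the decoder gives up at the very first character (line 1 column 1).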
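
One possible way around this, as a rough sketch rather than a definitive fix: cut the object literal out of the JavaScript assignment first, and only then hand it to json.loads. The sample string below is made up for illustration, the helper names parse_shared_data and find_values are hypothetical, and the sketch assumes the object literal happens to be valid JSON. Because the real nesting of the data is not known from the question, the sketch searches for display_url recursively instead of assuming it sits at the top level.

```python
import json
import re

# Hypothetical sample of the script tag's text. The real page content is an
# assumption here and may be structured differently or change over time.
SAMPLE_SCRIPT_TEXT = (
    'window._sharedData = {"entry_data": {"PostPage": '
    '[{"display_url": "https://example.com/a.jpg"}]}};'
)


def parse_shared_data(script_text):
    """Cut the object literal out of the JS assignment and parse it as JSON."""
    match = re.search(
        r'window\._sharedData\s*=\s*(\{.*\})\s*;?\s*$', script_text, re.DOTALL
    )
    if match is None:
        raise ValueError('window._sharedData assignment not found')
    # Assumption: the object literal is valid JSON (double-quoted keys, etc.).
    return json.loads(match.group(1))


def find_values(node, key):
    """Recursively yield every value stored under `key` in nested dicts/lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


data = parse_shared_data(SAMPLE_SCRIPT_TEXT)
image_urls = list(find_values(data, 'display_url'))
print(image_urls)
```

In the code from the question, the text of the tag found by soup.find (for example script.string) could be passed to parse_shared_data in place of the sample string.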