Why does bs4 parse the page incorrectly?
Good day. I'm trying to parse a VKontakte avatar, using Pavel Durov's profile as an example. Here is part of the code:
import bs4
import requests

def getting_avatar(user_id):
    # Build the profile URL; the id must be converted to a string first
    request = requests.get("https://vk.com/id" + str(user_id))
    b = bs4.BeautifulSoup(request.text, "html.parser")
    print(b)

getting_avatar(1)
The problem is that the page "Pavel Durov | VKontakte" is about 2,500 lines long and contains exactly the tag I need, with id="profile_photo_link", and the result ...
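One way to narrow this down is to check whether the tag is present in the HTML the server actually returned, before blaming the parser. The sketch below is only an illustration: the names `IdCollector` and `has_element_id` are made up for this example. It relies on the standard-library `html.parser` module, which is what the "html.parser" option in bs4 is built on.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collects every id attribute value seen while parsing."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def has_element_id(html, element_id):
    """Return True if any tag in html carries the given id."""
    collector = IdCollector()
    collector.feed(html)
    return element_id in collector.ids


# Hypothetical snippets standing in for request.text
print(has_element_id('<a id="profile_photo_link"><img src="x.png"></a>', "profile_photo_link"))
print(has_element_id('<div id="page_avatar"></div>', "profile_photo_link"))
```

If this returns False for `request.text`, the server most likely sent different markup from what the browser shows, and the parser is not at fault.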
The question is solved, you can do it like this:
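from selenium import webdriver
from selenium.webdriver.common.by import By

url = input()  # profile URL, for example https://vk.com/id1
driver = webdriver.Chrome()
driver.get(url)
# Locate the avatar image; find_element_by_xpath was removed in Selenium 4
avatar = driver.find_element(By.XPATH, '//*[@id="profile_photo_link"]/img')
# Save a screenshot of just that element as a PNG file
with open('filename.png', 'wb') as file:
    file.write(avatar.screenshot_as_png)
driver.quit()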

# Alternative without a browser: read the avatar data from the page HTML
import json

import requests
from bs4 import BeautifulSoup

response = requests.get('https://vk.com/id1')
soup = BeautifulSoup(response.text, 'html.parser')
# The onclick attribute of the avatar link embeds a JSON object
avatar = soup.find('div', id='page_avatar').a.get('onclick')
json_raw = avatar[avatar.find('{'):avatar.rfind('}') + 1]  # Extract the JSON here
json_data = json.loads(json_raw)
print(json_data['temp']['x'])  # Get the avatar URL from the JSON
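The slicing step above can be tried offline, without touching vk.com. This is a minimal sketch under assumptions: the helper name `extract_json_object`, the `openPhoto` call, and the sample URL are all made up for illustration, and the real onclick value on the page may look different. The helper returns None instead of raising when no valid JSON is found.

```python
import json


def extract_json_object(text):
    """Parse the span from the first '{' to the last '}' as JSON, or return None."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None


# Hypothetical onclick value; the real attribute on the page may differ
sample_onclick = (
    'return openPhoto("1_1", "album1_0", '
    '{"temp": {"x": "https://example.com/avatar.jpg"}}, event)'
)

data = extract_json_object(sample_onclick)
avatar_url = data["temp"]["x"] if data else None
print(avatar_url)
```

Checking for None before indexing avoids a crash if the page layout changes and the attribute no longer contains JSON.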