How to find an element in the xml code of the VK page (the xpath () method in Python does not find, or the xpath path is incorrectly specified)?

S

seawolf082019-03-10 22:12:46

Python

seawolf08, 2019-03-10 22:12:46

The task is to parse and download all profile photos using the link to the VK page.
1) Link: https://vk.com/id1
2) The program goes to profile photos: https://vk.com/id1?z=albums1
3) Next, it parses each photo. For example, you need this one: https://vk.com/id1?z=photo1_456315566%2Fphotos1

import requests
from lxml import html

URL_page = 'https://vk.com/id1'
VK = "https://vk.com"

#getting the page's tree
page = requests.get(URL_page)
tree = html.fromstring(page.content)

#going to photos & getting new page tree
!!!
ищу этот элемент:
<a href="/albums1" onclick="return nav.change({z: 'albums1'}, event)" class="module_header">
...
</a>
!!!
photos_of_user = tree.xpath("//*[@id=\"profile_photos_module\"]/a")
photos_of_user_url = photos_of_user[0].attrib['href']

URL_page = VK + photos_of_user_url
#непосредственно переход
page = requests.get(URL_page)
tree = html.fromstring(page.content)

#searching of photos & creating it's urls
photos = [x.attrib['href'] for x in tree.xpath("//*[@id=\"photo_row_1_456315566\"]/a")]
photos_url = [(VK + x) for x in photos]

So, the first list (photos_of_user), and therefore all the last ones, turn out to be empty.

Reply

Answer the question

In order to leave comments, you need to log in

1 answer(s)

D

Dmitry, 2019-03-11
@LazyTalent

The problem is not with xpath, but with what the site gives.
Do print(page.text)it and see what you actually get from the site.