S
S
seawolf082019-03-10 22:12:46
Python
seawolf08, 2019-03-10 22:12:46

How to find an element in the xml code of the VK page (the xpath () method in Python does not find, or the xpath path is incorrectly specified)?

The task is to parse and download all profile photos using the link to the VK page.
1) Link: https://vk.com/id1
2) The program goes to profile photos: https://vk.com/id1?z=albums1
3) Next, it parses each photo. For example, you need this one: https://vk.com/id1?z=photo1_456315566%2Fphotos1

import requests
from lxml import html

URL_page = 'https://vk.com/id1'
VK = "https://vk.com"

#getting the page's tree
page = requests.get(URL_page)
tree = html.fromstring(page.content)

#going to photos & getting new page tree
!!!
ищу этот элемент:
<a href="/albums1" onclick="return nav.change({z: 'albums1'}, event)" class="module_header">
...
</a>
!!!
photos_of_user = tree.xpath("//*[@id=\"profile_photos_module\"]/a")
photos_of_user_url = photos_of_user[0].attrib['href']

URL_page = VK + photos_of_user_url
#непосредственно переход
page = requests.get(URL_page)
tree = html.fromstring(page.content)

#searching of photos & creating it's urls
photos = [x.attrib['href'] for x in tree.xpath("//*[@id=\"photo_row_1_456315566\"]/a")]
photos_url = [(VK + x) for x in photos]

So, the first list (photos_of_user), and therefore all the last ones, turn out to be empty.

Answer the question

In order to leave comments, you need to log in

1 answer(s)
D
Dmitry, 2019-03-11
@LazyTalent

The problem is not with xpath, but with what the site gives.
Do print(page.text)it and see what you actually get from the site.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question