Python
Pogromist Pogromist, 2014-11-02 15:46:27

How do I parse the src value from HTML code?

There is a VK page, vk.com/kostya__wolf?z=photo107790602_343297825%2Fa...
I need to extract the src value from this piece of its HTML code:

<img style="width: 803px; height: 565px; margin-top: 0px;" src="http://cs618331.vk.me/v618331602/10618/Ge2uPaxB4B0.jpg">

How can I do this? I need the src value stored in a variable.


2 answers
throughtheether, 2014-11-02
@throughtheether

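With selenium:

from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://vk.com/kostya__wolf?z=photo107790602_343297825%2Falbum107790602_00%2Frev'
# Link in the photo viewer that points at the original image
xpath = '//a[@id="pv_open_original"]'
browser = webdriver.Firefox()
browser.get(url)
# The href of that link is kept in a variable, as the question asks
src = browser.find_element(By.XPATH, xpath).get_attribute('href')
print(src)
browser.quit()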

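With requests and lxml:

import json

import lxml.html
import requests

url = 'https://vk.com/kostya__wolf?z=photo107790602_343297825%2Falbum107790602_00'
r = requests.get(url)
doc = lxml.html.fromstring(r.text)
# Photo identifier taken from the URL: 'photo107790602_343297825'
search_string = url[url.find('photo'):url.find('%2F')]
# Find the link to that photo; its onclick attribute contains JSON
xpath = '//a[contains(@href, "%s")]' % search_string
onclick = doc.xpath(xpath)[0].get('onclick')
# Cut out the JSON object, from the first '{' through the first '}}'
d = json.loads(onclick[onclick.find('{'):onclick.find('}}') + len('}}')])
# Base URL plus the first element of the 'z_' entry gives the image address
src = d['temp']['base'] + d['temp']['z_'][0] + '.jpg'
print(src)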
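
If the HTML fragment from the question is already available as a string, the src can also be read from it directly. Below is a minimal sketch that uses only the standard library's html.parser instead of lxml; the names ImgSrcParser and extract_img_sources are made up for illustration, and the snippet is the one quoted in the question.

```python
from html.parser import HTMLParser


class ImgSrcParser(HTMLParser):
    """Collects the src attribute of every <img> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def extract_img_sources(html):
    """Return the src values of all <img> tags in the given HTML string."""
    parser = ImgSrcParser()
    parser.feed(html)
    parser.close()
    return parser.sources


# The fragment quoted in the question
snippet = ('<img style="width: 803px; height: 565px; margin-top: 0px;" '
           'src="http://cs618331.vk.me/v618331602/10618/Ge2uPaxB4B0.jpg">')

src = extract_img_sources(snippet)[0]
print(src)
```

This only helps once the fragment has been obtained. On the live page the photo viewer is presumably filled in by JavaScript, which would explain why the two approaches above either drive a real browser or dig the address out of the onclick data.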

wiygn, 2014-11-02
@wiygn

This is the second question about parsing VK pages that could be solved through the API. Do you really need to parse the HTML? If not, see https://vk.com/dev/methods
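
To illustrate the API route, here is a rough sketch that pulls the photo identifier out of the page URL and builds a request to the photos.getById method. It is not a complete client: the access token is a placeholder you would have to obtain yourself, the API version string is an assumption, the helper names are invented for this example, and no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlparse

API_ENDPOINT = "https://api.vk.com/method/photos.getById"


def photo_id_from_url(page_url):
    """Return the '<owner>_<photo>' id from the z=photo... query parameter."""
    query = parse_qs(urlparse(page_url).query)  # also decodes %2F to '/'
    first_part = query["z"][0].split("/")[0]    # 'photo107790602_343297825'
    if not first_part.startswith("photo"):
        raise ValueError("URL does not point at a photo: %r" % page_url)
    return first_part[len("photo"):]


def build_request_url(photo_id, access_token, version="5.131"):
    """Build the photos.getById URL; the default version is an assumption."""
    params = {"photos": photo_id, "access_token": access_token, "v": version}
    return API_ENDPOINT + "?" + urlencode(params)


page_url = ("https://vk.com/kostya__wolf"
            "?z=photo107790602_343297825%2Falbum107790602_00%2Frev")
photo_id = photo_id_from_url(page_url)
# 'YOUR_TOKEN' is a placeholder, not a working credential
request_url = build_request_url(photo_id, access_token="YOUR_TOKEN")
print(photo_id)
print(request_url)
```

Fetching request_url with a valid token should return JSON describing the photo, including its image addresses; check the method documentation for the exact response fields in the API version you use.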
