Identifying the <ins> and <i> tags with the lxml parsing library for Python: is it possible?
Good day,
I am writing a small script that processes points in fantasy football. For this I use 32-bit Python 2.7 with lxml 3.6.0.
I previously used the same library to process movie data, and it worked like clockwork.
The problem is that I cannot read data from the <ins> and <i> tags that carry a particular class.
Here is part of the HTML, to show the structure:
<div class="grace full-field">
<div class="forward-container"><ins data-id="1744589" data-amplua="4" class="player hold player-base ">
<img class="t-shirt" src="http://www.sports.ru/storage/img/fantasy/shirts/rfpl/spartak.png" alt="Спартак" title="Спартак"><span class="name">Зе Луиш</span>
<span class="pl-descr">
<i class="ico info2" data-id="1744589"></i><i class="ico point">-</i>
</span>
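The Python code itself:
from urllib2 import urlopen
from lxml import html
# Download the team's points page and parse it
url = urlopen('http://www.sports.ru/fantasy/football/team/points/1443463.html')
page = html.parse(url)
# Collect the elements matched by the class string 'ico point'
points = page.getroot().find_class('ico point')
print points
for i in points:
    # Print the text inside each matched element
    print i.text_content()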
The parser finds the "forward-container" class but goes no further: the "name" and "ico point" classes that I need are not found.
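To see where the lookup stops, it may help to list what the parsed tree actually holds beneath "forward-container". The following is only a rough sketch: classes_under is a hypothetical helper, and both sample strings are invented for illustration rather than taken from the real page.

```python
from lxml import html


def classes_under(root, container_class):
    """List (tag, class attribute) for every descendant that has a class
    attribute, under each element carrying container_class."""
    found = []
    for container in root.find_class(container_class):
        # './/*[@class]' selects descendant elements with a class attribute
        for el in container.xpath('.//*[@class]'):
            found.append((el.tag, el.get('class')))
    return found


# Invented sample: the container exists but has nothing inside it
SAMPLE_EMPTY = (
    '<div class="grace full-field">'
    '<div class="forward-container"></div>'
    '</div>'
)

# Invented sample: the container holds the player markup
SAMPLE_FULL = (
    '<div class="grace full-field">'
    '<div class="forward-container">'
    '<ins class="player hold player-base"><span class="name">x</span></ins>'
    '</div>'
    '</div>'
)

empty_result = classes_under(html.fromstring(SAMPLE_EMPTY), 'forward-container')
full_result = classes_under(html.fromstring(SAMPLE_FULL), 'forward-container')

print(empty_result)
print(full_result)
```

Against the real page the call would be classes_under(page.getroot(), 'forward-container'). An empty list would suggest that the downloaded HTML differs from the markup seen in the browser, which is an assumption to verify rather than a confirmed cause.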
I also tried .xpath():
names = page.xpath('.//i[contains(@class, "ico point")]')
But that found nothing either.
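One way to narrow this down is to run both lookups against the pasted fragment alone, without the network. This is a minimal sketch, assuming the fragment is representative of the markup; the closing </ins> and </div> tags are added here so that it stands alone, and FRAGMENT is just an illustrative name.

```python
# -*- coding: utf-8 -*-
from lxml import html

# The fragment from the question, with closing tags added (an assumption)
FRAGMENT = u"""
<div class="grace full-field">
  <div class="forward-container">
    <ins data-id="1744589" data-amplua="4" class="player hold player-base ">
      <img class="t-shirt" src="http://www.sports.ru/storage/img/fantasy/shirts/rfpl/spartak.png" alt="Спартак" title="Спартак">
      <span class="name">Зе Луиш</span>
      <span class="pl-descr">
        <i class="ico info2" data-id="1744589"></i><i class="ico point">-</i>
      </span>
    </ins>
  </div>
</div>
"""

root = html.fromstring(FRAGMENT)

# find_class() with a single class token
by_class = root.find_class('point')

# XPath 1.0 has no class selector; padding @class with spaces makes
# contains() match a whole token instead of any substring
by_xpath = root.xpath(
    './/i[contains(concat(" ", normalize-space(@class), " "), " point ")]'
)

names = [el.text_content() for el in root.find_class('name')]
points = [el.text_content() for el in by_class]

print(len(by_class))
print(len(by_xpath))
```

If both lookups find the element here but not on the live page, the class matching itself is probably not the problem, and the HTML that urlopen receives is worth inspecting.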
There are several questions:
1. Is lxml unable to identify classes on the <ins> and <i> tags?
2. Or is it a bug in my code?
3. And are there any parsers that can find the necessary classes in these tags?
4. Or do you have to write the parser yourself?
I'm sorry if my questions sound ridiculous as I'm just learning.
Thanks in advance,