Identifying tags by class with the lxml parsing library for Python: is it possible?


Python

Filat Astakhov, 2016-05-16 12:41:59

Good day,
I am writing a small script to process fantasy football points. For this I use Python 2.7 (32-bit) with lxml 3.6.0.
I previously used the same library to process movie data, and it worked like clockwork.
The problem is that I cannot read data from a tag with a certain class.
Here is part of the HTML, to show the structure:

<div class="grace full-field">
<div class="forward-container"><ins data-id="1744589" data-amplua="4" class="player hold player-base ">
  <img class="t-shirt" src="http://www.sports.ru/storage/img/fantasy/shirts/rfpl/spartak.png" alt="Спартак" title="Спартак"><span class="name">Зе Луиш</span>
  <span class="pl-descr">
    <i class="ico info2" data-id="1744589"></i><i class="ico point">-</i>
  </span>

The Python code itself:
from urllib2 import urlopen
from lxml import html

url = urlopen('http://www.sports.ru/fantasy/football/team/points/1443463.html')
page = html.parse(url)
points = page.getroot().find_class('ico point')
print points
for i in points:
    print i.text_content()

The parser finds the "forward-container" class but goes no further: the "name" and "ico point" classes that I need are not found.
I also tried .xpath():
names = page.xpath('.//i[contains(@class, "ico point")]')

But that returned nothing either.
I have several questions:
1. Is lxml unable to identify classes on these tags?
2. Or is it a bug in my code?
3. Are there any parsers that can find the necessary classes in these tags?
4. Or do I have to write the parser myself?
I'm sorry if my questions sound naive; I'm just learning.
Thanks in advance.


1 answer(s)


sim3x, 2016-05-16
@sim3x

stackoverflow.com/questions/3881044/how-to-get-htm...

//div[contains(@class, 'class1') and contains(@class, 'class2')]
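
To illustrate, the expression above can be adapted to the markup quoted in the question. The following is only a sketch in Python 3, not the answerer's own code. It parses the quoted fragment from a string instead of fetching the live page, and the closing tags at the end of the sample, along with the names SAMPLE, names and point_texts, are assumptions added for the example.

```python
from lxml import html

# Fragment copied from the question. The closing </ins> and </div> tags
# are an assumption, added only so the sample is self-contained.
SAMPLE = """
<div class="grace full-field">
<div class="forward-container"><ins data-id="1744589" data-amplua="4" class="player hold player-base ">
  <img class="t-shirt" src="http://www.sports.ru/storage/img/fantasy/shirts/rfpl/spartak.png" alt="Спартак" title="Спартак"><span class="name">Зе Луиш</span>
  <span class="pl-descr">
    <i class="ico info2" data-id="1744589"></i><i class="ico point">-</i>
  </span>
</ins></div>
</div>
"""

root = html.fromstring(SAMPLE)

# Same pattern as in the answer, applied to <i> with both class names.
point_nodes = root.xpath('.//i[contains(@class, "ico") and contains(@class, "point")]')
point_texts = [node.text_content() for node in point_nodes]

# The player name sits in <span class="name">.
name_nodes = root.xpath('.//span[contains(@class, "name")]')
names = [node.text_content() for node in name_nodes]

print(names)
print(point_texts)
```

On this fragment both lookups find their elements, which suggests that lxml itself can match classes on these tags. The sketch does not check whether the live page returns the same markup to urlopen. Note also that contains() is a substring test, so "point" would also match a class such as "points".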