How to parse a table with Python regular expressions?

kennnies, 2019-07-03 20:03:11

Python

There is the following table:

HTML table

<tr>
          <td>99</td>
          <td>Name</td>
          <td>ЕГЭ</td>
          <td>268</td><td>90</td><td>91</td><td>87</td>
          <td></td>
          <td>Копия</td>
          <td>Нет</td>
        </tr>

I use the following regular expression to parse numbers:

re.findall(r'\d{3,3}\d{1,3}\d{1,3}\d{1,3}

I also need to capture the "Копия" ("Copy") field, but the line breaks between the cells get in the way. I tried matching them with

\s \n \t \r and \s

None of these worked. How can this be done?
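
For reference, a minimal sketch of the regex route asked about here, assuming the cells are plain `<td>…</td>` pairs with no attributes or nesting. Matching each cell on its own, rather than the whole row in one pattern, means the line breaks between cells never have to be matched at all. The names `ROW`, `CELL_RE`, `cells` and `numbers` are illustrative and not from the original post.

```python
import re

# The row from the question, line breaks included.
ROW = """<tr>
          <td>99</td>
          <td>Name</td>
          <td>ЕГЭ</td>
          <td>268</td><td>90</td><td>91</td><td>87</td>
          <td></td>
          <td>Копия</td>
          <td>Нет</td>
        </tr>"""

# One match per cell; the non-greedy group stops at the nearest </td>.
# re.DOTALL lets "." match a newline in case a cell's text wraps.
CELL_RE = re.compile(r"<td>(.*?)</td>", re.DOTALL)

cells = CELL_RE.findall(ROW)

# Keep only the purely numeric cells and convert them to integers.
numbers = [int(c) for c in cells if c.isdigit()]

print(cells)
print(numbers)
```

This only holds for markup as regular as the sample; attributes on the tags or nested elements would break the pattern.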

2 answer(s)

Vladimir Kuts, 2019-07-03
@fox_12

Regular expressions are far from the most suitable tool for this. An HTML parser such as lxml extracts the cells directly:

>>> import lxml.html
>>> str1 = """
... <tr>
...           <td>99</td>
...           <td>Name</td>
...           <td>ЕГЭ</td>
...           <td>268</td><td>90</td><td>91</td><td>87</td>
...           <td></td>
...           <td>Копия</td>
...           <td>Нет</td>
...         </tr>"""
>>> root = lxml.html.fromstring(str1)
>>> [x.text for x in root.xpath('.//td')]
['99', 'Name', 'ЕГЭ', '268', '90', '91', '87', None, 'Копия', 'Нет']
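
If installing lxml is not an option, roughly the same result can be sketched with the standard library's `html.parser`. This is an illustration only: the `CellCollector` class and the `extract_cells` helper are hypothetical names, and unlike the lxml output above, an empty cell comes back as an empty string rather than `None`.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collects the text content of every <td> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # Start a new, initially empty cell.
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Text outside <td> (e.g. the whitespace between cells) is ignored.
        if self._in_cell:
            self.cells[-1] += data


def extract_cells(html):
    """Return the text of each <td> in the given HTML fragment."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.cells


str1 = """
<tr>
          <td>99</td>
          <td>Name</td>
          <td>ЕГЭ</td>
          <td>268</td><td>90</td><td>91</td><td>87</td>
          <td></td>
          <td>Копия</td>
          <td>Нет</td>
        </tr>"""

print(extract_cells(str1))
```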

Dimonchik, 2019-07-03
@dimonchik2013

pytablereader