How to use Selenium to grab only some elements from a tag?

Python

Vasily Nikonov, 2020-11-07 22:46:50

I have a page with a table in which the information I need is laid out roughly as follows (the columns are day, time, room, teacher, discipline, type, and link):

<tbody>
   <tr>
      <td>День</td>
      <td>Время</td>
      <td>Кабинет</td>
      <td>Преподаватель</td>
      <td>Дисциплина</td>
      <td>Вид</td>
      <td>Ссылка</td>
   </tr>
   <tr>
      <td>День</td>
      <td>Время</td>
      <td>Кабинет</td>
      <td>Преподаватель</td>
      <td>Дисциплина</td>
      <td>Вид</td>
      <td>Ссылка</td>
   </tr>

   . . .

</tbody>

Using Selenium (for a number of reasons I can only use it), I want to take just some of the cells from each row of this table (for example, time, teacher, discipline, and link) and put them into an array.
That is, once Selenium has reached this table, I want to end up with something like this:

array = [["Время", "Преподаватель", "Дисциплина", "Ссылка"],
         ["Время", "Преподаватель", "Дисциплина", "Ссылка"],
         . . .]

P.S. I only need a small piece of code, so please don't explain how to open the browser, navigate to the site, and so on. I already have an XPath that leads to this table. All I need is a method (or something similar) for saving only part of the information. Thank you.

Vladimir Kuts, 2020-11-07
@fox_12

Well, just specify the correct xpath:

import io
from lxml import etree
parser = etree.HTMLParser()

data = """<tbody>
   <tr>
      <td>День</td>
      <td>Время</td>
      <td>Кабинет</td>
      <td>Преподаватель</td>
      <td>Дисциплина</td>
      <td>Вид</td>
      <td>Ссылка</td>
   </tr>
   <tr>
      <td>День</td>
      <td>Время</td>
      <td>Кабинет</td>
      <td>Преподаватель</td>
      <td>Дисциплина</td>
      <td>Вид</td>
      <td>Ссылка</td>
   </tr>
</tbody>"""


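# Parse the HTML fragment into an element tree.
root = etree.parse(io.StringIO(data), parser=parser)

# For every row, keep only the 2nd and 4th cells (XPath positions are 1-based).
rows = [[x.xpath('.//td[2]')[0].text, x.xpath('.//td[4]')[0].text] for x in root.xpath('.//tr')]
print(rows)
# [['Время', 'Преподаватель'], ['Время', 'Преподаватель']]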
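
Since the question asks for Selenium specifically, the same "pick cells by position" idea can be sketched with Selenium's own element lookup. This is only a sketch under some assumptions: pick_columns and table_xpath are hypothetical names that are not part of the original answer, the default columns 2, 4, 5, 7 (time, teacher, discipline, link) are taken from the example table in the question, and rows is assumed to be a list of Selenium WebElement objects for the table's tr elements.

```python
def pick_columns(rows, wanted=(2, 4, 5, 7)):
    """Return [[text, ...], ...] with only the 1-based columns in `wanted`.

    `rows` is assumed to be a list of Selenium WebElements for <tr> tags
    (anything with a compatible find_elements() method works as well).
    """
    result = []
    for row in rows:
        # "xpath" is the string value of selenium's By.XPATH constant;
        # "./td" selects only the direct <td> children of this row.
        cells = row.find_elements("xpath", "./td")
        if len(cells) < max(wanted):
            continue  # skip rows that are too short to hold every wanted column
        result.append([cells[i - 1].text for i in wanted])
    return result


# Hypothetical usage with a live driver (table_xpath is the XPath you already have):
# rows = driver.find_elements("xpath", table_xpath + "//tr")
# array = pick_columns(rows)
```

Note that .text returns only the visible text of a cell. If the link column holds an a tag and you need the URL itself, you would probably have to find that a element inside the cell and read its href with get_attribute("href") instead.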