Python
srnsdlmtn, 2016-03-04 20:21:58

Is it possible to parse a table by columns in Python?

I previously asked a question here about site parsing: "How to parse a site in python?"
I managed to parse the page, but here's the problem:

import lxml.html
import requests

page = requests.get('https://rasp.nsuem.ru/group/2722').text
parser = lxml.html.fromstring(page)

section = parser.cssselect('.info.centered')
time = parser.cssselect('.time')
name = parser.cssselect('.mainScheduleInfo')

template = 'Name: {}, Time: {}, Name: {}'
for i in zip(section, time, name):
    print(template.format(*[j.text for j in i]))

This code produces something like this:

Name:
, Time: 11:25, Name: Development & Admin. Web Software,
Name: Mon, Time: 13:20, Name: Development & Admin. Web Software,
Name: None, Time: 15:05, Name: Mobile Software Development,
Name: None, Time: 16:50, Name: Mobile Software Development,
Name: None, Time: 11:25, Name: Mobile Software Development,
Name: None, Time: 13:20, Name: Mobile Software Development,
Name: Tue, Time: 15:05, Name: Mobile Software Development,
Name: None, Time: 16:50, Name: Mobile Software Development,
Name: None, Time: 18:30, Name: Web Application Design,
Name: None, Time: 15:05, Name: Web Application Design,
Name: None, Time: 16:50, Name: Web Application Design,
Name: None, Time: 8:00, Name: Web Application Design,
Name: Wed, Time: 9:40, Name: Web Application Design ,
Name: None, Time: 15:05, Name: Web Application Security,
Name: None, Time: 16:50, Name: Web Application Security,
Name: Thu, Time: 18:30, Name: Security Web Application Security,
Name: None, Time: 13:20, Name: Multimedia Technology,
Name: None, Time: 15:05, Name: Multimedia Technology,
Name: Fri, Time: 16:50, Name: Multimedia Technology,
Name: None, Time: 15:05, Name: Multimedia Technology,
Name: None, Time: 16:50, Name: Development & Admin. web software,
Name: None, Time: 18:30, Name: Web Application Design,
Name: Sat, Time: 13:20, Name: Development & Administration Web Software,
Name: None, Time: 15:05, Name: Web Application Design,
Name: None, Time: 16:50, Name: Development and Administration Web Software,
Name: None, Time: 16:50, Name: Development & Administration web software,

This looks fine and is roughly what I need, but the schedule on https://rasp.nsuem.ru/group/2722 is laid out in columns, one for the first week and one for the second, while my code prints everything row by row. How can I produce the output by columns, preferably with the first week and the second week separately?

1 answer(s)
Dimonchik, 2016-03-04
@srnsdlmtn

Is XPath all right for you? ))

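from lxml import html

# `name` (path to a saved HTML file) and `coll` (apparently a database
# collection) are defined elsewhere in the original script.
with open(name, 'r') as f:
    text = f.read()
ev = html.document_fromstring(text)

item = {'s': []}

# Step 1: pull out every block of interest.
itemdata = ev.xpath('//li[@class="itr"]')

for evv in itemdata:
    # Step 2: query inside the block with a relative XPath (leading "./").
    itm = {
        'l': evv.xpath('./a[@class="js"]/@href'),
        'txt': evv.xpath('./a[@class="js"]/text()'),
    }
    # Keep only blocks where something actually matched.
    if itm['l'] or itm['txt']:
        item['s'].append(itm)

if item['s']:
    coll.save(item)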

This is the algorithm I described to you last time:
first you select the block, then you query inside that block.
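
Applied to the schedule question, the two-step approach (select each block, then query inside it) might look roughly like the sketch below. The markup of the real page is not shown in this thread, so the sample HTML, the class names `day`, `week1` and `week2`, and the helpers `cell_text` and `split_by_week` are all assumptions made for illustration. The sketch parses the sample with the standard library's `xml.etree.ElementTree` so that it runs without extra packages; lxml elements expose the same `find`, `findall` and `itertext` methods, so the same function should work on a tree returned by `lxml.html.fromstring(page)` once the selectors are adapted to the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: the tag layout and class names are
# assumptions, not the structure of the real schedule page.
SAMPLE = """
<table>
  <tr>
    <td class="day"><b>Mon</b></td>
    <td class="time">11:25</td>
    <td class="week1"><div class="mainScheduleInfo">Web Application Design</div></td>
    <td class="week2"><div class="mainScheduleInfo">Multimedia Technology</div></td>
  </tr>
  <tr>
    <td class="day"></td>
    <td class="time">13:20</td>
    <td class="week1"></td>
    <td class="week2"><div class="mainScheduleInfo">Web Application Security</div></td>
  </tr>
</table>
"""


def cell_text(cell):
    """Join all nested text of a cell and collapse whitespace."""
    if cell is None:
        return ""
    return " ".join("".join(cell.itertext()).split())


def split_by_week(table):
    """Return a dict mapping each week to a list of (day, time, subject)."""
    weeks = {"week1": [], "week2": []}
    day = ""
    # Step 1: pull out each block (here, one table row per time slot).
    for row in table.findall(".//tr"):
        # The day is written only once per group of rows, so carry it down.
        day = cell_text(row.find("td[@class='day']")) or day
        time = cell_text(row.find("td[@class='time']"))
        # Step 2: query inside the block, one column per week.
        for week in weeks:
            subject = cell_text(row.find(f"td[@class='{week}']"))
            if subject:
                weeks[week].append((day, time, subject))
    return weeks


schedule = split_by_week(ET.fromstring(SAMPLE))
for week, lessons in schedule.items():
    print(week)
    for day, time, subject in lessons:
        print(f"  {day} {time} {subject}")
```

Two details are worth noting. An element's `.text` holds only the text that precedes its first child tag, which is one likely source of the `None` values in the output above; `itertext()` collects the nested text as well. Querying per row, rather than zipping three independent page-wide lists, keeps each subject attached to its own day, time and week column.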
