Python
srnsdlmtn, 2016-02-29 18:35:25

How to scrape a website in Python?

The topic may seem hackneyed, but I honestly searched Google for an answer and did not find one. I'll show my code right away:

import lxml.html  # .cssselect() also needs the separate "cssselect" package
import requests

# Download the schedule page and parse it into an element tree
page = requests.get('https://rasp.nsuem.ru/group/2722').text
parser = lxml.html.fromstring(page)

# Each query returns one flat list of every match on the whole page
subjectName = parser.cssselect('.mainScheduleInfo')
subjectTime = parser.cssselect('.time')
subjectDay = parser.cssselect('.day-header')
subjectRoom = parser.cssselect('.mainScheduleInfo a')

# Print the lists one after another; .text may be None, so fall back to ''
templateDay = 'Day: {}'
for i in subjectDay:
    print(templateDay.format((i.text or '').strip()))

templateTime = 'Time: {}'
for i in subjectTime:
    print(templateTime.format((i.text or '').strip()))

templateName = 'Subject: {}'
for i in subjectName:
    print(templateName.format((i.text or '').strip()))

templateRoom = 'Room: {}'
for i in subjectRoom:
    print(templateRoom.format((i.text or '').strip()))

The link I am scraping is in the code, but here it is again just in case: https://rasp.nsuem.ru/group/2722
For obvious reasons, this code prints the following:

Day: Mon
Day: Tue
Day: Wed
Day: Thu
Day: Fri
Day: Sat
Time: 11:25
Time: 13:20
Time: 15:05
Time: 16:50
Time: 11:25
Time: 13:20
Time: 15:05
Time: 16:50
Time: 18:30
Time: 15:05
Time: 16:50
Time: 8:00
Time: 9:40
Time: 15:05
Time: 16:50
Time: 18:30
Time: 13:20
Time: 15:05
Time: 16:50
Subject: Development and Administration Web Software,
Subject: Development and Administration Web Software,
Subject: Mobile Software Development,
Subject: Mobile Software Development,
Subject: Mobile Software Development,
Subject: Mobile Software Development,
Subject: Mobile Software Development,
Subject: Mobile Software Development,
Subject: Web Application Design,
Subject: Web Application Design,
Subject: Web Application Design,
Subject: Web Application Design,
Subject: Web Application Design,
Subject: Web Application Security,
Subject: Web Application Security,
Subject: Web Application Security,
Subject: Multimedia Technology,
Subject: Multimedia Technology,
Subject: Multimedia Technology,
Subject: Multimedia Technology,
Subject: Development and Administration Web Software,
Subject: Web Application Design,
Subject: Development and Administration Web Software,
Subject: Web Application Design,
Subject: Development and Administration Web Software,
Subject: Development and Administration Web Software,
Subject: Development and Administration Web Software,
Subject: Mobile Software Development,
Room: 5-716
Room: 5-721
Room: 5-716
Room: 5-721
Room: unknown
Room: 5-721
Room: unknown
Room: 5-721
Room: 5-722
Room: 5-722
Room: 5-722
Room: 5-722
Room: unknown
Room: 5-612
Room: 5-612
Room: 5-612
Room: 5-707
Room: 5-707
Room: 5-713
Room: 5-713
Room: 5-712
Room: 5-722
Room: 5-712
Room: 5-722
Room: unknown
Room: 5-712
Room: 5-713
Room: 5-712

I need to combine all of this. I more or less understand that I could zip the lists together with a suitable template, but then the structure of the schedule would be lost. How can I implement this so that the day of the week, the time, the room, and the subject name stay matched up the way they are on the site?
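
To make the mismatch concrete: the output above contains 6 days, 19 times, and 28 subjects and rooms, so the flat lists cannot simply be lined up. Here is a small sketch with placeholder lists of those lengths (the values are stand-ins, not the real schedule):

```python
# Placeholder lists with the same lengths as the printed output
days = ['day{}'.format(n) for n in range(6)]
times = ['time{}'.format(n) for n in range(19)]
subjects = ['subject{}'.format(n) for n in range(28)]
rooms = ['room{}'.format(n) for n in range(28)]

# zip() stops at the shortest input, so only 6 combined rows come out
rows = list(zip(days, times, subjects, rooms))
print(len(rows))

# Subjects and rooms have equal counts, so they at least pair up with each other
pairs = list(zip(subjects, rooms))
print(len(pairs))
```

So the day and time of each lesson have to come from the page structure itself, not from the position in a flat list.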

1 answer(s)
Vladimir Olohtonov, 2016-02-29
@srnsdlmtn

Try parsing the "schedule_table" table row by row (iterating over the "tr" tags). I think that will be the easiest way.
Here is an example: stackoverflow.com/a/9920703
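
A minimal sketch of the row-by-row idea follows. The sample markup is hypothetical: the real page layout was not checked, and treating "schedule_table" as a class name, as well as the placement of the `.day-header`, `.time`, and `.mainScheduleInfo` elements inside each row, is an assumption. To keep it runnable without extra packages it uses the standard library's `xml.etree.ElementTree` on a small well-formed fragment. lxml elements expose the same `iter`, `find`, `get`, and `text` interface, so the loop should carry over to the tree returned by `lxml.html.fromstring`. The helper names `has_class`, `first_with_class`, and `parse_schedule` are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the real schedule page may be laid out differently
SAMPLE = """
<table class="schedule_table">
  <tr><td class="day-header">Mon</td><td class="time">11:25</td>
      <td><div class="mainScheduleInfo">Web Application Design, <a href="#">5-716</a></div></td></tr>
  <tr><td></td><td class="time">13:20</td>
      <td><div class="mainScheduleInfo">Mobile Software Development, <a href="#">5-721</a></div></td></tr>
  <tr><td class="day-header">Tue</td><td class="time">15:05</td>
      <td><div class="mainScheduleInfo">Multimedia Technology, <a href="#">unknown</a></div></td></tr>
</table>
"""


def has_class(element, name):
    """Return True if name is one of the element's space-separated classes."""
    return name in element.get('class', '').split()


def first_with_class(row, name):
    """Return the first element inside row that carries the class, or None."""
    for element in row.iter():
        if has_class(element, name):
            return element
    return None


def parse_schedule(table):
    """Walk the table row by row, carrying the current day and time forward."""
    lessons = []
    day = None
    time = None
    for row in table.iter('tr'):
        day_cell = first_with_class(row, 'day-header')
        if day_cell is not None and (day_cell.text or '').strip():
            day = day_cell.text.strip()
        time_cell = first_with_class(row, 'time')
        if time_cell is not None and (time_cell.text or '').strip():
            time = time_cell.text.strip()
        # A single row may hold several lessons (for example, subgroups)
        for info in row.iter():
            if not has_class(info, 'mainScheduleInfo'):
                continue
            link = info.find('a')
            lessons.append({
                'day': day,
                'time': time,
                # .text is the text before the first child, e.g. "Subject, "
                'subject': (info.text or '').strip().rstrip(','),
                'room': (link.text or '').strip() if link is not None else None,
            })
    return lessons


lessons = parse_schedule(ET.fromstring(SAMPLE))
for lesson in lessons:
    print('{day} {time} | {subject} | {room}'.format(**lesson))
```

Because each lesson is read from the row it sits in, the day, time, subject, and room stay together, which the four separate page-wide queries cannot guarantee.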
