How to use Python to log in to a university resource site?

Python

vitom, 2012-06-06 20:28:38

In one of our courses we were shown how to write a web crawler in Python, and I want to build a Moodle news alert. I don't know whether Moodle is used in Russia, but it is at my university, the University of Barcelona. The idea is to download the site's HTML, find the section where teachers post grades, PDF materials and so on, and notify me when something new appears. Simple and elementary. The problem is that the site requires authentication. At first glance it looks like an ordinary form submitted over HTTP, but in fact authentication goes through this form over HTTPS:

<form action="https://auten.ub.edu/uauten.pl" method="post" name="login" id="login">

How do I log in and download the course page campusvirtual.ub.edu/course/view.php?id=34437?

6 answer(s)

mik_os, 2012-06-06
@vitom

import urllib2
from urllib import urlencode
from cookielib import CookieJar

cookie_processor = urllib2.HTTPCookieProcessor(CookieJar())
opener = urllib2.build_opener(cookie_processor)

auth_data = {
    # form field names and values: look them up in Firebug or a similar tool
}
opener.open('https://auten.ub.edu/uauten.pl', urlencode(auth_data))

and then use the same opener to navigate the site.
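
For readers on Python 3, where urllib2 and cookielib were renamed to urllib.request and http.cookiejar, the same approach might look roughly like the sketch below. The helper names are made up for illustration, and the form fields still have to be taken from the real login form.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, build_opener


def make_session():
    """Return an opener plus the cookie jar it stores session cookies in."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    return opener, jar


def encode_form(fields):
    """Encode form fields as bytes; passing data to open() makes it a POST."""
    return urlencode(fields).encode("ascii")


def fetch_after_login(opener, login_url, fields, page_url):
    """Log in, then fetch page_url with the same opener (and its cookies)."""
    opener.open(login_url, encode_form(fields)).close()
    with opener.open(page_url) as response:
        return response.read()
```

Usage would be something like fetch_after_login(opener, login_url, fields, page_url), with the form's action URL as login_url and the course page from the question as page_url.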

Michael, 2012-06-06
@1099511627776

Well, you should probably start from here: docs.python.org/library/httplib.html
and specifically from the POST example there. As params, pass the login, password and any other form parameters (you can find them in Firebug). Then check the response for cookies, in case they are used to maintain a session, and send those cookies with every subsequent request to the site.
>>> import httplib, urllib
>>> params = urllib.urlencode({'@number': 12524, 'type': 'issue', 'action': 'show'})
>>> headers = {"Content-type": "application/x-www-form-urlencoded",
...            "Accept": "text/plain"}
>>> conn = httplib.HTTPConnection("bugs.python.org")
>>> conn.request("POST", "", params, headers)
>>> response = conn.getresponse()
>>> print response.status, response.reason
302 Found
>>> data = response.read()
>>> data
'Redirecting to http://bugs.python.org/issue12524'
>>> conn.close()
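
The last step, taking the cookies from the response and sending them back, could be sketched like this in Python 3 (where httplib is named http.client). The helper name and the sample cookie values are made up, and a real site may need more careful handling of cookie attributes.

```python
from http.cookies import SimpleCookie


def cookie_header(set_cookie_values):
    """Build a Cookie request header value from Set-Cookie response headers."""
    jar = SimpleCookie()
    for value in set_cookie_values:
        # Attributes such as Path or HttpOnly are parsed but not sent back.
        jar.load(value)
    return "; ".join("%s=%s" % (name, morsel.value) for name, morsel in jar.items())


# Hypothetical Set-Cookie values; with http.client they would come from
# response.headers.get_all("Set-Cookie") or [].
sample = ["MoodleSession=abc123; path=/; HttpOnly", "lang=ca; path=/"]
headers = {"Cookie": cookie_header(sample)}
```

The resulting headers dict can then be passed along with each later conn.request(...) call.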

mik_os, 2012-06-06
@mik_os

Do take a look in Firebug at last. There are a bunch of other inputs there as well.

vitom, 2012-06-06
@vitom

Oh! Hooray, it worked. Thanks for your help and patience. I didn't know that all the inputs have to be sent.
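
For anyone who lands here later: one way to avoid missing an input is to read them all from the login page itself. Below is a minimal Python 3 sketch using the standard html.parser module. The class name and the sample inputs (token, username, password) are hypothetical, only the form tag is the one from the question.

```python
from html.parser import HTMLParser


class FormInputs(HTMLParser):
    """Collect name/value pairs of every <input> inside the form with a given id."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.inside = False
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.form_id:
            self.inside = True
            self.action = attrs.get("action")
        elif tag == "input" and self.inside and attrs.get("name"):
            # Inputs without a value attribute are sent as empty strings.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.inside = False


# Hypothetical login page; the real inputs must be taken from the actual site.
sample_html = """
<form action="https://auten.ub.edu/uauten.pl" method="post" name="login" id="login">
  <input type="hidden" name="token" value="xyz">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""
parser = FormInputs("login")
parser.feed(sample_html)
fields = parser.fields
fields.update(username="me", password="secret")  # fill in your own credentials
```

The fields dict, hidden inputs included, is what then gets urlencoded and posted to parser.action.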

mgSergio, 2012-06-07
@mgSergio

For a task like this, the third-party requests library is much more pleasant to use than urllib.
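
A rough sketch of what that could look like. The function name and its parameters are made up for illustration. It only relies on requests.Session keeping cookies between calls, and the form fields are still whatever the real login form expects.

```python
def login_and_fetch(session, login_url, form_fields, page_url):
    """Post the login form, then fetch a page with the same session.

    `session` is expected to be a requests.Session, which keeps cookies
    between calls.
    """
    response = session.post(login_url, data=form_fields)
    response.raise_for_status()
    page = session.get(page_url)
    page.raise_for_status()
    return page.text


# Intended use (requires `pip install requests`):
#     import requests
#     with requests.Session() as session:
#         html = login_and_fetch(session, login_url, form_fields, page_url)
```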

vitom, 2012-08-12
@vitom

What did I miss here?

import urllib2
from urllib import urlencode
from cookielib import CookieJar

cookie_processor = urllib2.HTTPCookieProcessor(CookieJar())
opener = urllib2.build_opener(cookie_processor)
auth_data = {
    'login': '*****',
    'password': '******'
}

opener.open('https://feinaactiva.gencat.cat/web/guest/candidatelogin?p_p_id=loginCandidate&p_p_lifecycle=1&p_p_state=normal&p_p_mode=view&p_p_col_id=column-1&p_p_col_count=1&_loginCandidate_struts_action=%2FloginCandidate%2Fauthentication', urlencode(auth_data))
req = opener.open('https://feinaactiva.gencat.cat/group/candidate/jobslocator?p_p_id=jobsLocator_WAR_psocwebjobslocator&p_p_lifecycle=1&p_p_state=maximized&p_p_mode=view&_jobsLocator_WAR_psocwebjobslocator_struts_action=%2Fjobslocator%2FjobsLocator&saveLastPath=0&_jobsLocator_WAR_psocwebjobslocator_forwardPath=search')
html = req.read()

It returns the login page. There don't seem to be any other inputs. What's the problem?