How to use Python to log in to a university resource site?
In one of our courses we were shown how to write a web crawler in Python, and I want to use that to build a Moodle news alert. I don't know whether Moodle is used in Russia, but it is at my university, the University of Barcelona. The idea is to download the site's HTML, find the sections where teachers post grades, PDF materials and so on, and notify me when something new appears. Simple enough. The problem is that the site requires authentication. At first glance the login is just a form submitted over HTTP, but the form actually posts to this address over HTTPS:
<form action="https://auten.ub.edu/uauten.pl" method="post" name="login" id="login">
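The "notify me when something new appears" part can be sketched separately from the login step. Below is a minimal Python 3 sketch, assuming the section of interest has already been extracted as a string. The function name `has_changed` and the state file are hypothetical, chosen only for illustration; they are not part of Moodle or of the question.

```python
import hashlib
import os


def has_changed(section_html, state_path):
    """Return True if section_html differs from the version seen last time.

    The SHA-256 digest of the previous version is kept in state_path
    (a hypothetical file chosen by the caller). The very first call
    also returns True, because nothing has been stored yet.
    """
    digest = hashlib.sha256(section_html.encode("utf-8")).hexdigest()

    previous = None
    if os.path.exists(state_path):
        with open(state_path, "r", encoding="utf-8") as f:
            previous = f.read().strip()

    # Remember the current version for the next run.
    with open(state_path, "w", encoding="utf-8") as f:
        f.write(digest)

    return previous != digest
```

Comparing digests rather than full pages keeps the stored state small; the trade-off is that it only says that something changed, not what.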
import urllib2
from urllib import urlencode
from cookielib import CookieJar
cookie_processor = urllib2.HTTPCookieProcessor(CookieJar())
opener = urllib2.build_opener(cookie_processor)
auth_data = {
# the form field names and values: look them up in Firebug or a similar tool
}
opener.open('https://auten.ub.edu/uauten.pl', urlencode(auth_data))
Then use opener to navigate the site.
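The snippet above is Python 2. For reference, a rough Python 3 equivalent is sketched below, using the same standard-library pieces under their renamed modules (`urllib.request`, `urllib.parse`, `http.cookiejar`). The helper names `make_opener` and `encode_form` are made up for illustration, and the field names in the usage comment are placeholders, not the real ones the university form expects.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, build_opener


def make_opener():
    """Build an opener that stores cookies and resends them on later requests."""
    jar = CookieJar()
    return build_opener(HTTPCookieProcessor(jar)), jar


def encode_form(fields):
    """Encode form fields as a POST body; Python 3 requires bytes here."""
    return urlencode(fields).encode("ascii")


# Usage (not executed here; the field names are placeholders):
# opener, jar = make_opener()
# opener.open('https://auten.ub.edu/uauten.pl',
#             encode_form({'placeholder_user': '...', 'placeholder_pass': '...'}))
# Later opener.open(...) calls reuse whatever session cookie the login set.
```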
You should probably start with the httplib documentation: docs.python.org/library/httplib.html
and specifically with the POST example there.
As params, pass the login, password and the other form parameters (you can look them up in Firebug). Then check the response for cookies, in case they are used to maintain the session, and send those cookies with every subsequent request to the site.
>>> import httplib, urllib
>>> params = urllib.urlencode({'@number': 12524, 'type': 'issue', 'action': 'show'})
>>> headers = {"Content-type": "application/x-www-form-urlencoded",
... "Accept": "text/plain"}
>>> conn = httplib.HTTPConnection("bugs.python.org")
>>> conn.request("POST", "", params, headers)
>>> response = conn.getresponse()
>>> print response.status, response.reason
302 Found
>>> data = response.read()
>>> data
'Redirecting to http://bugs.python.org/issue12524'
>>> conn.close()
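The "take the cookies from the response and send them back" step can be done by hand when working at the `httplib` level. Below is a minimal sketch in Python 3, where `httplib` is named `http.client`. The helper name `cookie_header` is hypothetical.

```python
from http.cookies import SimpleCookie


def cookie_header(set_cookie_values):
    """Turn a list of Set-Cookie header values into one Cookie header value.

    Attributes such as Path or HttpOnly are dropped; only name=value
    pairs are sent back to the server.
    """
    jar = SimpleCookie()
    for value in set_cookie_values:
        jar.load(value)
    return "; ".join(
        "{}={}".format(name, morsel.value) for name, morsel in jar.items()
    )
```

With `http.client`, the input list would typically come from `response.headers.get_all('Set-Cookie') or []`, and the result would go under the `Cookie` key of the `headers` dict passed to each later `conn.request(...)` call. This ignores cookie expiry and domain rules, which a cookie jar handles automatically.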
Oh, hooray, it worked! Thanks for your help and patience. I didn't know that all of the form's inputs had to be sent.
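Sending every input matters because login forms often carry hidden fields alongside the visible ones. Below is a sketch of collecting them with the standard-library HTML parser in Python 3. The class name `FormInputs` and the sample markup are invented for illustration; they are not the real fields of the university's form.

```python
from html.parser import HTMLParser


class FormInputs(HTMLParser):
    """Collect name/value pairs of <input> tags inside the form with a given id."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.inside = False
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.form_id:
            self.inside = True
            self.action = attrs.get("action")
        elif tag == "input" and self.inside and attrs.get("name"):
            # Inputs without a value attribute are stored as empty strings.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.inside = False


# Invented sample markup, for illustration only.
sample = """
<form action="https://example.org/login" method="post" id="login">
  <input type="hidden" name="token" value="abc">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="submit" name="submit" value="Log in">
</form>
<input name="outside" value="ignored">
"""

parser = FormInputs("login")
parser.feed(sample)
fields = parser.fields  # includes the hidden "token" field
```

After filling in the user name and password, the whole `fields` dict would be URL-encoded and posted to `parser.action`.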
For a task like this, the third-party requests library is much more pleasant to use than urllib.
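A minimal sketch of what that could look like, assuming `requests` is installed (`pip install requests`). A `requests.Session` keeps cookies between calls, which replaces the cookie jar and opener. The function name `login` and the field names in the usage comment are placeholders, not the real form fields.

```python
import requests


def login(login_url, auth_data):
    """POST the form fields and return a session that carries the login cookies."""
    session = requests.Session()
    response = session.post(login_url, data=auth_data)
    response.raise_for_status()  # fail loudly on HTTP 4xx/5xx
    return session


# Usage (not executed here; the field names are placeholders):
# session = login('https://auten.ub.edu/uauten.pl',
#                 {'placeholder_user': '...', 'placeholder_pass': '...'})
# html = session.get('...').text
```

Note that a failed login may still come back with status 200, so checking the returned page for a logged-in marker is usually still needed.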
What did I miss here?
import urllib2
from urllib import urlencode
from cookielib import CookieJar
cookie_processor = urllib2.HTTPCookieProcessor(CookieJar())
opener = urllib2.build_opener(cookie_processor)
auth_data = {
'login': '*****',
'password': '******'
}
opener.open('https://feinaactiva.gencat.cat/web/guest/candidatelogin?p_p_id=loginCandidate&p_p_lifecycle=1&p_p_state=normal&p_p_mode=view&p_p_col_id=column-1&p_p_col_count=1&_loginCandidate_struts_action=%2FloginCandidate%2Fauthentication', urlencode(auth_data))
req = opener.open('https://feinaactiva.gencat.cat/group/candidate/jobslocator?p_p_id=jobsLocator_WAR_psocwebjobslocator&p_p_lifecycle=1&p_p_state=maximized&p_p_mode=view&_jobsLocator_WAR_psocwebjobslocator_struts_action=%2Fjobslocator%2FjobsLocator&saveLastPath=0&_jobsLocator_WAR_psocwebjobslocator_forwardPath=search')
html = req.read()