How do I log in to a PHP forum with requests?

Python

DarkWood, 2017-03-17 12:11:27

Hello.
I want to parse the demiart.ru forum to automate certain tasks. You need to be logged in to view topics. I have my username and password, of course. I'm trying to log in using this tutorial: https://kazuar.github.io/scraping-tutorial/
My code is taken from there with minimal changes:

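import requests
from lxml import html

# Placeholders: substitute your own forum credentials.
USERNAME = "your_username"
PASSWORD = "your_password"

LOGIN_URL = "http://demiart.ru/forum/index.php?"
URL = "http://demiart.ru/forum/index.php?showtopic=8436"

# One session for all requests, so cookies carry over between them.
session_requests = requests.session()

# Form fields sent with the login request.
payload = {
    "UserName": USERNAME,
    "PassWord": PASSWORD,
    "submit": 'Войти',
}

# Submit the login form.
result = session_requests.post(LOGIN_URL, data=payload, headers=dict(referer=LOGIN_URL))

# Request the topic page with the same session and parse it.
result = session_requests.get(URL, headers=dict(referer=URL))
tree = html.fromstring(result.content)
theme_title = tree.xpath(".//div[@class='f_break tablefixed']")

print(theme_title)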

The forum has no dedicated login page: you can log in from any page, so I used the main page as the login URL. The csrf_token mentioned in the tutorial does not exist on this forum. As a first test, I just want to extract the title of any topic (the XPath has already been checked).
In response, I get the same page as an unauthenticated user. In fact, I can't even get the element tree (when I print tree, I see only <Element html at 0x3b3a188>).
I'm quite new to this. Perhaps more data needs to be sent (e.g. cookies)? Or should this be done some other way?

1 answer(s)

Dimonchik, 2017-03-18
@dimonchik2013

1) Use pycurl and only pycurl: it is faster, simpler, and multi-threaded.
2) Use Fiddler (www.telerik.com/fiddler) to inspect the headers and understand what is actually being transmitted.
I think in your case it is enough to enable cookie saving and everything will work, but see point 1.
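
To illustrate the cookie-saving idea, here is a minimal sketch. It uses only the Python standard library (urllib and http.cookiejar) rather than pycurl, which is a substitution made here for illustration. The URLs and form field names are copied from the question. The function names, the credentials, and the assumption that the forum accepts a UTF-8 encoded form are hypothetical; an older forum may expect a different encoding such as windows-1251, and it may require extra hidden form fields that a tool like Fiddler would reveal.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# URLs taken from the question.
LOGIN_URL = "http://demiart.ru/forum/index.php?"
TOPIC_URL = "http://demiart.ru/forum/index.php?showtopic=8436"


def build_opener_with_cookies(jar):
    """Return an opener that stores received cookies in `jar` and resends them."""
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def encode_login_form(username, password, encoding="utf-8"):
    """Encode the login form fields from the question as a POST body.

    The `encoding` default is an assumption; the forum may expect another one.
    """
    fields = {"UserName": username, "PassWord": password, "submit": "Войти"}
    # urlencode percent-escapes non-ASCII text, so the result is pure ASCII.
    return urllib.parse.urlencode(fields, encoding=encoding).encode("ascii")


def login_and_fetch(username, password):
    """Log in, print the cookies the server set, and return the topic page bytes."""
    jar = http.cookiejar.CookieJar()
    opener = build_opener_with_cookies(jar)

    # Passing `data` turns the request into a POST; cookies from the
    # response are saved in `jar` automatically.
    opener.open(LOGIN_URL, data=encode_login_form(username, password), timeout=30)

    # If no session cookie appears here, the login request itself is wrong.
    for cookie in jar:
        print(cookie.name, "=", cookie.value)

    # The same opener resends the saved cookies with this request.
    with opener.open(TOPIC_URL, timeout=30) as response:
        return response.read()
```

Calling login_and_fetch("your_username", "your_password") performs the two requests. If the printed cookie list is empty or the returned page is still the guest view, compare the request against what the browser sends when you log in by hand.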