Python
DarkWood, 2017-03-17 12:11:27

How do I log in to a PHP forum with requests?

Hello.
I want to scrape the demiart.ru forum to automate some tasks. Topics can only be viewed when you are logged in, and I do, of course, have a username and password. I'm trying to log in by following this tutorial: https://kazuar.github.io/scraping-tutorial/
In fact, my code is taken from there almost unchanged:

import requests
from lxml import html

# Placeholders: put the real credentials here.
USERNAME = "my_login"
PASSWORD = "my_password"

LOGIN_URL = "http://demiart.ru/forum/index.php?"
URL = "http://demiart.ru/forum/index.php?showtopic=8436"

# A Session keeps cookies between requests.
session_requests = requests.Session()

# Form fields sent with the login request.
payload = {
    "UserName": USERNAME,
    "PassWord": PASSWORD,
    "submit": "Войти",  # "Log in"
}

# Log in, then request a topic page with the same session.
result = session_requests.post(LOGIN_URL, data=payload, headers=dict(referer=LOGIN_URL))

result = session_requests.get(URL, headers=dict(referer=URL))
tree = html.fromstring(result.content)
theme_title = tree.xpath(".//div[@class='f_break tablefixed']")

print(theme_title)

The forum has no dedicated login page; you can log in from any page, so I used the main page as the login URL. The csrf_token mentioned in the tutorial does not exist here. As a test, I want to extract at least the title of some topic (the XPath has already been checked).
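Since there is no dedicated login page and no csrf_token, one way to see what the browser really submits is to read the login form out of the page HTML: its action attribute is the URL the form posts to, and its inputs (including hidden ones) are the field names the server expects. A minimal sketch with lxml follows; the helper name find_login_form and the sample markup are hypothetical, and the forum's real form may look different.

```python
from lxml import html


def find_login_form(page_html, page_url):
    """Return (absolute action URL, field values) for the first form that
    contains a password input, or None if the page has no such form.

    Hypothetical helper: assumes the login form is an ordinary HTML <form>.
    """
    # base_url lets lxml resolve a relative action like "index.php?act=...".
    tree = html.fromstring(page_html, base_url=page_url)
    for form in tree.forms:
        if form.xpath(".//input[@type='password']"):
            # form_values() gives the (name, value) pairs a browser would send,
            # including hidden inputs; submit buttons are skipped.
            return form.action, dict(form.form_values())
    return None


# Hypothetical markup, only to show what the helper returns.
SAMPLE_PAGE = """<html><body>
<form action="index.php?act=login" method="post">
  <input type="hidden" name="referer" value="http://demiart.ru/forum/">
  <input type="text" name="UserName">
  <input type="password" name="PassWord">
  <input type="submit" value="Log in">
</form>
</body></html>"""

action, fields = find_login_form(SAMPLE_PAGE, "http://demiart.ru/forum/index.php")
print(action)  # http://demiart.ru/forum/index.php?act=login
print(fields)
```

With the real page, you would copy fields, fill in the username and password fields (UserName and PassWord in the question's payload), and POST them to action using the same Session.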
In response, I get the same page an unauthorized visitor sees. I can't even get the element tree: when I print tree, I see only <Element html at 0x3b3a188>.
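Printing an lxml element only shows its repr, such as <Element html at 0x...>, so that output by itself does not mean parsing failed. To see what was actually downloaded, you can serialize the tree or read its text; note also that xpath() always returns a list. A small sketch on a hypothetical snippet standing in for result.content:

```python
from lxml import html

# Hypothetical page fragment standing in for result.content.
page = b"<html><body><div class='f_break tablefixed'>Topic title</div></body></html>"
tree = html.fromstring(page)

print(tree)  # only the repr, e.g. <Element html at 0x...>

# Serialize the tree back to markup to check what the server actually returned.
print(html.tostring(tree, encoding="unicode"))

# xpath() returns a list of matching elements (possibly empty), not a string.
matches = tree.xpath(".//div[@class='f_break tablefixed']")
titles = [el.text_content().strip() for el in matches]
print(titles)  # ['Topic title']
```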
I'm quite a novice at this. Do I need to send more data (e.g. cookies)? Or should this be done some other way?
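On the cookie question: a requests.Session stores the cookies it receives (for example, from the response to the login POST) and sends them back on later requests automatically, so the GET that follows the POST in the question already carries them. The sketch below shows this offline by putting a cookie into the jar by hand and preparing a request without sending it; the cookie name member_id is a hypothetical placeholder, not necessarily what this forum sets.

```python
import requests

session = requests.Session()

# Stand-in for a cookie the server would set in its response to the login POST.
# "member_id" is a hypothetical name used only for illustration.
session.cookies.set("member_id", "12345")

# prepare_request() merges the session's cookies into the outgoing request,
# as session.get() and session.post() do internally before sending.
request = requests.Request("GET", "http://demiart.ru/forum/index.php?showtopic=8436")
prepared = session.prepare_request(request)

print(prepared.headers.get("Cookie"))  # member_id=12345
```

So if the topic page still comes back unauthorized, the more likely cause is that the login POST itself was not accepted (wrong URL or missing fields), not that cookies were dropped between requests.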


1 answer
Dimonchik, 2017-03-18
@dimonchik2013

1) Use pycurl and only pycurl: it's faster, simpler, and multithreaded.
2) Use www.telerik.com/fiddler to inspect the headers and understand what is actually being sent.
I think in your case it's enough to enable saving cookies and everything will work, but see point 1.
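For comparison, with requests the in-memory part of saving cookies is already handled by the Session. If the goal is to keep the login between runs of the script (roughly what curl's cookie-jar file does), one option is to pickle the session's cookie jar to a file and load it back. A minimal sketch, assuming a file path of your choice; the helper names and the cookie name are hypothetical, and a pickle file should only be loaded if you created it yourself.

```python
import os
import pickle
import tempfile

import requests


def save_cookies(session, path):
    """Write the session's cookie jar to disk (hypothetical helper)."""
    with open(path, "wb") as f:
        pickle.dump(session.cookies, f)


def load_cookies(session, path):
    """Merge cookies saved at path into the session's jar, if the file exists."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            # Only unpickle files you created yourself: pickle can run code.
            session.cookies.update(pickle.load(f))


# Demo with a temporary file and a placeholder cookie instead of a real login.
path = os.path.join(tempfile.mkdtemp(), "cookies.pkl")
first = requests.Session()
first.cookies.set("member_id", "12345")
save_cookies(first, path)

second = requests.Session()
load_cookies(second, path)
print(second.cookies.get("member_id"))  # 12345
```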
