A
A
Artyom Nikiforov2020-07-05 08:03:21
Python
Artyom Nikiforov, 2020-07-05 08:03:21

How to correctly compose a request for a site with the correct encoding?

I am writing a simple program for selecting synonyms for a given word.
She accesses the site, then parsing should take place, etc.
Here is the code for now:

import requests

word = input('Введите слово для поиска синонимов: ')
url = 'http://trishin.net'
data = {"searchText": url, "mode": "leftPart"}
response = requests.get(url, data=data)
print(response.text)

How do I change the encoding of my request parameter?
Here are the correct parameters taken by the sniffer from the site: searchText=%E1%F0%E0%F2%E0%ED&mode=leftPart
Here is what my script sends: searchText=%D0%B1%D1%80%D0%B0%D1% 82%D0%B0%D0%BD&mode=leftPart

Answer the question

In order to leave comments, you need to log in

1 answer(s)
S
soremix, 2020-07-05
@teacoder

This is a cp1251 encoded url string. So first we encode it to cp1251 via .encode('cp1251') , then we do urlencoding via urllib.parse.quote_plus()
Then. The request there is not get , but post , so we put it. But the site did not want to return anything, it needs more information. Added cookies + some headers. For some reason, instead of word, you added a url to the formdata dictionary, so let's fix it to word. Well, I transformed the entire formdata into a slightly different format

import requests
import urllib.parse


word = urllib.parse.quote_plus(input('Введите слово для поиска синонимов: ').encode('cp1251'))
url = 'http://trishin.net/'


headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
            'Referer': 'http://trishin.net/',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
            'Content-Type': 'application/x-www-form-urlencoded'}

s = requests.Session()
s.get('http://trishin.net')

payload = 'mode=leftPart&searchText={}&left=456511&right=2726'.format(word)
response = s.post(url, data=payload, headers=headers)

print(response.text)

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question