PHP
RealL_HarDCorE, 2015-05-06 23:13:24

How to set correct encoding for response from server in Python 3?

Hello. I am writing software in Python 3. I need to send a request to a server and get plain JSON back. I work in UTF-8, and that encoding is specified everywhere: in the server config, in the header sent by the PHP script, and in the Python script (# -*- coding: utf-8 -*-). The script files themselves are also saved as UTF-8. But when I try to decode the received response as UTF-8, this error appears:

[Decode error - output not utf-8]

If I decode the response as cp1251 instead, everything works fine. Where did this encoding come from?
Python code:
from urllib.request import Request, urlopen
from urllib.parse import urlencode
import json

req = Request("http://devcave.ru/json.php")
response = urlopen(req)
data = response.read().decode('cp1251')  # .decode('utf-8') raises the error described above
data = json.loads(data)

print(response.headers.get_content_charset())
print(data)

Python code output:
utf-8
{'key': 'русский язык'}

PHP code on the server:
header('Content-Type: application/json;charset=utf-8');
echo json_encode(array('key' => 'русский язык'), JSON_UNESCAPED_UNICODE);

Resolved:
In fact, the response could and should have been decoded as utf-8 rather than cp1251. That decoding runs without errors, which means the encodings are configured correctly everywhere.
The problem was in the print function: it refused to print the decoded string, and that is what raised the error.
A side effect of moving from Python 2 to 3.
Thanks to everyone who tried to help :)
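
A minimal sketch of the distinction described above, assuming the console uses an encoding that cannot represent Cyrillic (ASCII is used here purely for illustration). The bytes are built locally instead of being fetched from the server, and `encode_for_console` is a hypothetical helper, not something from the original script. It shows that decoding as UTF-8 succeeds and that the failure only appears when the text has to be encoded again for output.

```python
import json

# Bytes as a UTF-8 server would send them (built locally for this sketch).
raw = '{"key":"русский язык"}'.encode("utf-8")

# Decoding as UTF-8 succeeds, so the response itself is fine.
data = json.loads(raw.decode("utf-8"))


def encode_for_console(text, console_encoding):
    """Hypothetical helper: encode text for a console, escaping unsupported characters.

    print() must encode str to the output stream's encoding. With the default
    strict error handling, characters missing from that encoding raise
    UnicodeEncodeError; "backslashreplace" escapes them instead.
    """
    return text.encode(console_encoding, errors="backslashreplace")


try:
    data["key"].encode("ascii")  # roughly what a strict ASCII console would attempt
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

ascii_safe = encode_for_console(data["key"], "ascii")
print(ascii_safe.decode("ascii"))  # safe to print on any console
```

If the terminal can actually display UTF-8, other options are calling `sys.stdout.reconfigure(encoding="utf-8")` (Python 3.7+) or setting the `PYTHONIOENCODING=utf-8` environment variable before running the script.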


3 answer(s)
Andrey Dugin, 2015-05-06
@ReaL_HarDCorE

In Python 2.7 it works like this :)

>>> data = urlopen('http://devcave.ru/json.php').read()
>>> data.decode('utf-8')
u'{"key":"\\u0440\\u0443\\u0441\\u0441\\u043a\\u0438\\u0439 \\u044f\\u0437\\u044b\\u043a"}'
>>> data.decode('cp1251')
u'{"key":"\\u0440\\u0443\\u0441\\u0441\\u043a\\u0438\\u0439 \\u044f\\u0437\\u044b\\u043a"}'
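
A possible reading of the session above: the doubled backslashes in the repr suggest the body contained literal \uXXXX escapes, that is, only ASCII characters, and ASCII bytes decode identically under utf-8 and cp1251. The sketch below reproduces that situation locally, using Python's `json.dumps` as a stand-in for the server (an assumption, the real response is not fetched here).

```python
import json

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
# producing the same kind of \uXXXX escapes seen in the session above.
body = json.dumps({"key": "русский язык"}, separators=(",", ":")).encode("ascii")

# Every byte is ASCII, so both encodings give the same string.
as_utf8 = body.decode("utf-8")
as_cp1251 = body.decode("cp1251")

# The JSON parser turns the escapes back into the Cyrillic text.
parsed = json.loads(as_utf8)
```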

Maxim Vasiliev, 2015-05-06
@qmax

The encoding of the response from the server must be set on the server.

Andrey Kobyshev, 2015-05-07
@yokotoka

It looks like the server is misconfigured. Apache? It should not try to re-encode everything into that ridiculous cp1251. Alternatively, if the server is not yours, fix it on the script side: call .decode('cp1251') on the data you receive from the server, as you are already doing. This converts the bytes to a Unicode string that you can work with normally.
In short, the server is lying when it says it returns utf-8. It actually gives you cp1251, and Python told you so.
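
For the case where a server really does declare one charset and send another, here is a rough sketch of a script-side workaround. `decode_body` and its fallback list are hypothetical, not part of urllib. It tries the declared charset first and falls back only if that fails. Keep in mind that cp1251 accepts almost any byte sequence, so a successful fallback decode is a guess, not proof of the real encoding.

```python
def decode_body(raw, declared_charset=None, fallbacks=("utf-8", "cp1251")):
    """Hypothetical helper: decode raw bytes, trusting the declared charset first.

    Returns a (text, encoding_used) tuple.
    """
    candidates = [declared_charset] if declared_charset else []
    candidates += [enc for enc in fallbacks if enc not in candidates]
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding: try the next candidate.
            continue
    raise ValueError("none of the candidate encodings could decode the body")


# A body that claims to be utf-8 but is really cp1251 (built locally for the sketch).
mislabeled = '{"key":"русский язык"}'.encode("cp1251")
text, used = decode_body(mislabeled, declared_charset="utf-8")

# A correctly labeled body is decoded with the declared charset.
honest = '{"key":"русский язык"}'.encode("utf-8")
text_ok, used_ok = decode_body(honest, declared_charset="utf-8")
```

With urllib, the declared charset would come from `response.headers.get_content_charset()`, as in the question's script.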
