Python
Ivan Koryakin, 2020-11-11 08:20:49

Parsing in Python: what's wrong?

import requests
user_id = 12345
url = 'https://yandex.ru/'
r = requests.get(url)
with open('test.html', 'w') as output_file:
    output_file.write(r.text.encode('cp1251'))


<Error>
Traceback (most recent call last):
  File "C:\Users\Ivan\Desktop\python\parser.py", line 6, in <module>
    output_file.write(r.text.encode('cp1251'))
  File "C:\Python39\lib\encodings\cp1251.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character '\u2212' in position 88884: character maps to <undefined>
</Error>


2 answer(s)
Sergey Gornostaev, 2020-11-11
@valera228822

The root of the problem is that you are trying to encode text to cp1251 that contains characters this encoding cannot represent (here, '\u2212', the minus sign). Besides, even if the encoding succeeded, the encode method would return bytes, and since you opened the file in text mode, writing them would raise another error. You should probably write it like this:

with open('test.html', 'w', encoding='utf-8') as output_file:
    output_file.write(r.text)

PS Thank you Sergey Pankov for pointing out my inattention.
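
To illustrate both points above without a network request, here is a minimal sketch. The string `html` is only a stand-in for `r.text`, and the helper names `save_text`, `save_bytes` and `to_cp1251_lossy` are made up for this example. It shows why the cp1251 encode fails, the two consistent ways to write the file (str to a text-mode file, or bytes to a binary-mode file), and one possible fallback if the output really has to be cp1251.

```python
import os
import tempfile

MINUS = "\u2212"  # MINUS SIGN, the character from the traceback
html = "<p>Температура: " + MINUS + "5</p>"  # stand-in for r.text


def save_text(path, text):
    """Write a str to a text-mode file; Python encodes it as UTF-8."""
    with open(path, "w", encoding="utf-8") as output_file:
        output_file.write(text)


def save_bytes(path, text):
    """Encode the str ourselves and write the bytes in binary mode."""
    with open(path, "wb") as output_file:
        output_file.write(text.encode("utf-8"))


def to_cp1251_lossy(text):
    """Encode to cp1251, turning unsupported characters into HTML
    numeric references such as &#8722; (a fallback for HTML output only)."""
    return text.encode("cp1251", errors="xmlcharrefreplace")


if __name__ == "__main__":
    # cp1251 has no slot for U+2212, so a strict encode raises.
    try:
        html.encode("cp1251")
    except UnicodeEncodeError as exc:
        print("cp1251 failed:", exc.reason)

    folder = tempfile.mkdtemp()
    save_text(os.path.join(folder, "text.html"), html)
    save_bytes(os.path.join(folder, "bytes.html"), html)
    print(to_cp1251_lossy(html))
```

Both `save_text` and `save_bytes` produce the same UTF-8 file; the point is only that the mode ('w' or 'wb') has to match what you pass to `write` (str or bytes).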

Lone Ice, 2020-11-11
@daemonhk

1. Learn to google
2. Use utf-8
