How do I deal with the "'utf-8' codec can't decode byte 0xc0 in position 199: invalid start byte" error?

Python

Taya, 2019-06-10 18:53:04

There is a site whose data comes back in this form:

<h4>Àäðåñà ìàãàçèíîâ</h4><br>\n      <h4>ã. Ðûáèíñê</h4><br>\n      <ol>\n        <li>Óë. Ëüâà Îøàíèíà, ä.5 òåë. (4855)26-57-64</li>\n

I'm trying to decode it with .content.decode('utf-8'), but I get this error: 'utf-8' codec can't decode byte 0xc0 in position 199: invalid start byte
If I use .content.decode('utf-8', errors='ignore') instead, I get this string:

<h4> </h4><br>\n      <h4>. </h4><br>\n      <ol>\n        <li>.  , .5 . (4855)26-57-64</li>\n        <li>. , . , .10 . (4855)27-38-77</li>\n

What should I do?

4 answer(s)

angernicky, 2020-08-24
@angernicky

I ran into this error too and solved it. In general, you need to decode with either CP866 or Windows-1251.
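
A quick way to choose between the two candidates is to decode the same bytes with each and see which result reads as Russian. This is only a sketch: `raw` is a stand-in for the bytes in `.content`, built here from the first heading in the question on the assumption that the site sends Windows-1251.

```python
# Stand-in for response.content: the first heading from the question,
# encoded the way the site appears to send it (an assumption for this demo).
raw = "Адреса магазинов".encode("cp1251")

candidates = ["cp1251", "cp866"]

# Decode the same bytes with every candidate encoding.
decoded = {name: raw.decode(name) for name in candidates}

for name, text in decoded.items():
    print(f"{name}: {text}")
```

Neither decode raises an exception for this sample, so the absence of an error does not prove the encoding is right; the output has to be checked by eye.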

Taya, 2019-06-11
@Taya93

Anyway, I found a solution.
The page is in Windows-1251 encoding.
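
For reference, a minimal sketch of that fix. Here `raw` stands in for `.content` and is built from the heading in the question. The last lines show how a string that has already been mis-decoded into the `Àäðåñà` form could be repaired, assuming it was read as Latin-1.

```python
raw = "Адреса магазинов".encode("cp1251")  # stand-in for response.content

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # the same kind of error as in the question

# errors="ignore" silently drops every byte that is not valid UTF-8,
# which here means all of the Cyrillic letters.
ignored = raw.decode("utf-8", errors="ignore")

# Decoding with the encoding the bytes were written in keeps the text.
text = raw.decode("cp1251")
print(repr(ignored), repr(text))

# Repairing a string that was already mis-decoded as Latin-1:
garbled = "ã. Ðûáèíñê"
repaired = garbled.encode("latin-1").decode("cp1251")
print(repaired)
```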

Roman Kitaev, 2019-06-10
@deliro

Decode with the correct encoding.
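
One way to find the correct encoding is to look at what the server or the page itself declares. The helper below, `declared_charset`, is hypothetical and not part of any library. It checks a `Content-Type` header value first and then a `<meta>` tag near the top of the body. The sample header and body are made up for the demonstration.

```python
import re

# Matches charset=windows-1251, charset="utf-8", charset = 'cp866', etc.
CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([\w-]+)", re.IGNORECASE)


def declared_charset(content_type: str, body: bytes, default: str = "utf-8") -> str:
    """Return the charset named in the header or in an early <meta> tag."""
    match = CHARSET_RE.search(content_type or "")
    if match:
        return match.group(1).lower()
    # The declaration itself is ASCII, so a lossy ASCII decode of the
    # first couple of kilobytes is enough to find it.
    head = body[:2048].decode("ascii", errors="ignore")
    match = CHARSET_RE.search(head)
    return match.group(1).lower() if match else default


# Made-up page: no charset in the header, but one in the markup.
body = '<meta charset="windows-1251"><h4>Адреса магазинов</h4>'.encode("cp1251")
encoding = declared_charset("text/html", body)
print(encoding, body.decode(encoding))
```

If the page is fetched with `requests`, assigning the detected name to `response.encoding` before reading `response.text` should give the same result as decoding `.content` by hand.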

Alexey Guest007, 2019-06-11
@Guest007

.content.encode('utf-8')?