Python
helixly, 2015-06-08 11:45:53

How do I read an HTML file in Python?

f = open (filename , 'r')
result = f.read()
print(result)

Result
Traceback (most recent call last):
  File "file.py", line 10, in <module>
   result = f.read()
  File "C:\Python34\lib\encodings\cp1251.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 10419: character maps to <undefined>

How do I read the file correctly? I've been racking my brain over this.


3 answer(s)
Vadim Shandrinov, 2015-06-08
@suguby

The traceback shows Python trying to decode the file with your favorite cp1251 :) while the file itself appears to be in UTF-8.
Try opening it in 'rb' mode and decoding it line by line:

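# Open in binary mode so Python does not apply the default codec
with open(filename, 'rb') as f:
    for line in f:
        # Decode each raw line explicitly as UTF-8
        print(line.decode('utf8'))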

In general, though, it helps to know which encoding the file is actually in.
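If the file really is UTF-8, the same thing can probably be done in text mode by passing the encoding to open(). Below is a minimal sketch, not a definitive fix: the helper names read_text and read_text_with_fallback are hypothetical, and the list of candidate encodings (UTF-8 first, then cp1251) is only an assumption based on the traceback above.

```python
def read_text(path, encoding="utf-8"):
    """Read a whole text file, decoding it with an explicit encoding."""
    # Passing encoding= stops Python from using the locale default (cp1251 here)
    with open(path, "r", encoding=encoding) as f:
        return f.read()


def read_text_with_fallback(path, encodings=("utf-8", "cp1251")):
    """Try each candidate encoding in order; return (text, encoding_used).

    The candidate list is an assumption, adjust it for your files.
    """
    for enc in encodings:
        try:
            return read_text(path, enc), enc
        except UnicodeDecodeError:
            continue  # this encoding does not fit, try the next one
    raise ValueError("none of the candidate encodings could decode " + path)
```

Usage would be something like result = read_text(filename). The fallback variant is only a guess at the encoding: a file can decode without errors under the wrong codec and still come out as gibberish, so it does not replace knowing the real encoding.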

Alex_Korj, 2016-08-13
@Alex_Korj

I solved the problem by converting the files to UTF-8 -- ANSI.
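For anyone who wants to do such a conversion from Python rather than in an editor, here is a rough sketch. The function name convert_encoding is hypothetical, and which encodings to pass is an assumption: "ANSI" on a Russian Windows machine usually means cp1251, but check your own files first.

```python
def convert_encoding(src_path, dst_path, src_encoding, dst_encoding):
    """Re-encode a text file from src_encoding to dst_encoding."""
    # newline="" keeps the original line endings untouched on both sides
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)
```

For example, convert_encoding("page.html", "page_cp1251.html", "utf-8", "cp1251") (the file names are made up). Note that going from UTF-8 to cp1251 raises UnicodeEncodeError if the text contains characters cp1251 cannot represent.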

Arthur, 2015-06-08
@ArthurG

Try adding this at the beginning of the *.py file:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
