On Windows (including Win7):
>>> import sys
>>> print sys.stdin.encoding
cp866
>>> print sys.stdout.encoding
cp866
print link_text.encode('cp866','replace')
will give Russian text in the cp866 console, replacing Unicode characters that are not in this encoding with a question mark ("?").
>>> t = link_text.encode('cp866', 'replace').decode('cp866')
>>> for i in xrange(len(t)):
...     if link_text[i:i+1] != t[i:i+1]: link_text[i:i+1]
...
u'\xea'
u'\xab'
u'\xbb'
u'\xea'
u'\xea'
u'\xea'
u'\xea'
u'\xea'
u'\xea'
u'\xea'
u'\xea'
u'\xea'
u'\u2014'
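For readers on Python 3, the same check can be sketched without index arithmetic. This is only a rough sketch, not the code from the answer: the helper names `console_safe` and `unencodable_chars` and the sample string are made up for illustration, and `cp866` is passed explicitly instead of being read from `sys.stdout.encoding`.

```python
def console_safe(text, encoding="cp866"):
    """Round-trip text through encoding, turning unsupported characters into '?'."""
    return text.encode(encoding, "replace").decode(encoding)


def unencodable_chars(text, encoding="cp866"):
    """Return, in order, the characters of text that encoding cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# Hypothetical sample: guillemets, e-circumflex, an em dash and a Cyrillic word.
sample = "\u00abR\u00eave\u00bb \u2014 \u043c\u0435\u0431\u0435\u043b\u044c"
safe = console_safe(sample)
missing = unencodable_chars(sample)
print(safe)
print([hex(ord(ch)) for ch in missing])
```

Encoding one character at a time is slower than a single bulk `encode`, but for a page-sized string that should rarely matter.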
>>> import htmlentitydefs
>>> for i in xrange(len(t)):
...     if link_text[i:i+1] != t[i:i+1]: htmlentitydefs.codepoint2name[ord(link_text[i:i+1])]
...
'ecirc'
'laquo'
'raquo'
'ecirc'
'ecirc'
'ecirc'
'ecirc'
'ecirc'
'ecirc'
'ecirc'
'ecirc'
'ecirc'
'mdash'
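In Python 3 the `htmlentitydefs` module is named `html.entities`. Below is a minimal sketch of the same lookup, assuming a hypothetical helper `entity_names` and a made-up sample string. Characters that have no named entity fall back to a `U+XXXX` label; that fallback is a choice made here, not something from the answer above. The built-in `xmlcharrefreplace` error handler is shown as an alternative that produces numeric references instead of names.

```python
from html.entities import codepoint2name


def entity_names(text, encoding="cp866"):
    """Name every character of text that encoding cannot represent."""
    names = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # Fall back to a code point label when HTML has no named entity.
            names.append(codepoint2name.get(ord(ch), "U+%04X" % ord(ch)))
    return names


# Hypothetical sample: guillemets, e-circumflex and an em dash.
sample = "\u00abR\u00eave\u00bb \u2014"
names = entity_names(sample)

# Alternative: let the codec substitute numeric character references.
numeric = sample.encode("cp866", "xmlcharrefreplace").decode("cp866")

print(names)
print(numeric)
```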
Did you forget to import urllib?
import urllib
link='http://www.barcelona-nsk.ru/catalog/mebel/jacob-delafone/reve/mebel-pod-rakovinu-117x43,5x37sm-reve'
link_text = unicode(''.join(urllib.urlopen(link).readlines()), 'utf-8')
print link_text
What wonderful code)))
Why join the output of readlines() when you can just do read().replace('\n', '')?
I would write something like this:
import urllib
link='http://www.barcelona-nsk.ru/catalog/mebel/jacob-delafone/reve/mebel-pod-rakovinu-117x43,5x37sm-reve'
body=urllib.urlopen(link).read().replace('\n', '').decode('utf8')
Although perhaps it's a matter of taste...
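One detail worth checking before picking either version: `readlines()` keeps the line endings, so joining its output gives the same bytes as `read()`, while the extra `replace('\n', '')` additionally strips the newlines. A small Python 3 sketch of the difference, using `io.BytesIO` as a stand-in for the `urlopen` response (an assumption, so that it runs without network access) and a made-up two-line payload:

```python
import io

# Hypothetical payload standing in for the downloaded page body.
raw = "\u043f\u0435\u0440\u0432\u0430\u044f\n\u0432\u0442\u043e\u0440\u0430\u044f\n".encode("utf-8")

# Variant 1: join the lines; the newlines are still there.
joined = b"".join(io.BytesIO(raw).readlines())

# Variant 2: read everything at once; same bytes as variant 1.
whole = io.BytesIO(raw).read()

# Variant 3: read and strip newlines, then decode; the text is now one line.
stripped = io.BytesIO(raw).read().replace(b"\n", b"").decode("utf-8")

print(joined == whole)
print("\n" in stripped)
```

If the line breaks in the page matter, plain `read().decode('utf-8')` is probably the closer replacement for the joined `readlines()` call.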