Python
ParnishkaSPB, 2020-06-07 11:48:12

(Python) How to parse a site if the URL is in Russian?

I need to parse a site with Python, but its URL is in Russian (Cyrillic), which keeps me from getting through to the site itself. How can I do this?


1 answer(s)
Alexander, 2020-06-07
@ParnishkaSPB

The hostname has to be encoded with IDNA (Punycode):

>>> u'сайт.рф'.encode('idna')
b'xn--80aswg.xn--p1ai'
>>> b'xn--80aswg.xn--p1ai'.decode('idna')
'сайт.рф'

from urllib.parse import urlparse, ParseResult, quote

u = urlparse("http://сайт.рф/тест/тест2")
# IDNA-encode the host and percent-encode the path; the other parts pass through unchanged.
encoded_url = ParseResult(u.scheme, u.netloc.encode('idna').decode('ascii'), quote(u.path),
                          u.params, u.query, u.fragment).geturl()
print(encoded_url)
# http://xn--80aswg.xn--p1ai/%D1%82%D0%B5%D1%81%D1%82/%D1%82%D0%B5%D1%81%D1%822
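
If the URL can also carry a port, credentials, or a Cyrillic query string, encoding the whole netloc with IDNA is risky, since the codec is meant for host names only. Below is a minimal sketch of a slightly more defensive variant. The helper name to_ascii_url is hypothetical (it is not part of urllib), it assumes the host is a domain name rather than an IPv6 literal, and it leaves existing %XX escapes untouched so they are not encoded twice.

```python
from urllib.parse import urlsplit, urlunsplit, quote


def to_ascii_url(url: str) -> str:
    """Hypothetical helper: return an ASCII-only form of `url`."""
    parts = urlsplit(url)

    # IDNA-encode only the host name, not the port or the credentials.
    host = parts.hostname or ""
    netloc = host.encode("idna").decode("ascii")
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"

    # Put back "user:password@" if the original netloc had it.
    userinfo, at, _ = parts.netloc.rpartition("@")
    if at:
        netloc = f"{userinfo}@{netloc}"

    # Percent-encode non-ASCII characters; "%" stays safe (assumption:
    # any "%" already in the URL starts a valid escape).
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%+")
    fragment = quote(parts.fragment, safe="%")

    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(to_ascii_url("http://сайт.рф:8080/тест?q=тест"))
```

The resulting string is plain ASCII, so it can be handed to urllib.request.urlopen or any other HTTP client as is.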
