Python
TheAM, 2020-09-24 17:53:07

How to write results in Russian to xls or csv file in Python?

I wrote the following script:

from requests_html import HTMLSession
from time import sleep
import random

session = HTMLSession()

# Create the output file and write the header row
xls_name = f'example-{random.randint(1, 100)}.xls'
with open(xls_name, 'w', encoding='cp1251') as itog:
    itog.write('URL\tH1\tTitle\tDescription\n')

# Open the file with the page URLs and fetch the data for each page
with open('list-url.txt', 'r') as url_file:
    for line in url_file:
        url_site = line.strip('\n')

        # Request the URL
        response = session.get(url_site)

        h1 = response.html.xpath('//h1/text()')[0]
        title = response.html.xpath('//title/text()')[0]
        description = response.html.xpath('//meta[@name="description"]/@content')[0]

        # Append the URL, H1, title and description to the file
        with open(xls_name, 'a', encoding='cp1251') as itog:
            itog.write(f'{url_site}\t{h1}\t{title}\t{description}\n')

        print(f'Готово для страницы – {url_site}')
        sleep(2)


The file list-url.txt contains the URLs of the pages whose H1, title and description I want to collect. For some reason, one URL is parsed correctly, and then an error occurs:
[screenshot: Ssb3G.png]
Is this an encoding problem when writing, or something else? The operating system is Windows 10. One more detail: if I change the encoding to 'utf-8' here:
[screenshot: 5f6cb61070368888953140.png]
then the pages are parsed, but the file looks like this:
[screenshot: 5f6cb630771ad623653387.png]
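
A possible direction, offered as a sketch rather than a confirmed fix, since the error text in the screenshot is not available here. Garbled Cyrillic after switching to 'utf-8' is typical of Excel opening a UTF-8 text file that has no byte order mark, so the sketch below writes a real CSV with the 'utf-8-sig' encoding, which adds that mark. The names write_rows, first_or_empty and sample_rows are hypothetical, the ';' delimiter assumes a Russian-locale Excel, and the sample row stands in for the scraped values.

```python
import csv


def write_rows(path, rows):
    """Write a header plus rows to a CSV file encoded as UTF-8 with a BOM."""
    # 'utf-8-sig' prepends a byte order mark, which Excel uses to detect UTF-8.
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', encoding='utf-8-sig', newline='') as f:
        # Assumption: ';' is the list separator Excel expects on this machine.
        writer = csv.writer(f, delimiter=';')
        writer.writerow(['URL', 'H1', 'Title', 'Description'])
        writer.writerows(rows)


def first_or_empty(values):
    """Return the first XPath result stripped of whitespace, or '' if none."""
    # Guards against an IndexError on pages that lack the requested element.
    return values[0].strip() if values else ''


# Hypothetical sample data in place of the values scraped from each page.
sample_rows = [
    ['https://example.com/', 'Привет, мир', 'Пример заголовка', 'Описание страницы'],
]
write_rows('example.csv', sample_rows)
```

In the original loop, each lookup could then be wrapped as first_or_empty(response.html.xpath('//h1/text()')), and the rows collected in a list and written once at the end instead of reopening the file for every URL. If the file has to stay in cp1251, passing errors='replace' to open() is another option; it substitutes '?' for characters that encoding cannot represent.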
