How to write the parsing result into two xls columns, how to solve the encoding problem?

D

dzhazagalieva2020-06-19 19:36:48

Python

dzhazagalieva, 2020-06-19 19:36:48

Hello, two problems at once.
1. In theory, the final table should have 2 columns (caption, url), but contrary to expectations, the entry goes into one.
caption and url are written as a dream, separated by commas. Every second line is skipped.

2. Apparently, the parsing goes up to the heading "Astérix & Obélix XXL 3: The Crystal Menhir: Overview" and produces:

Traceback (most recent call last):
File "parser_stopgame_v2.py", line 56, in
main()
File "parser_stopgame_v2.py", line 52, in main
get_page_data(html)
File "parser_stopgame_v2.py", line 39, in get_page_data
write_csv(data)
File "parser_stopgame_v2.py", line 20, in write_csv
data['url']) )
File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\encodings\cp1251.py ", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xe9' in position 3: character maps to

As I understand it, it's the "é" character. How to fix it?

from bs4 import BeautifulSoup as BS
import requests
import csv

def get_html(url):
    r = requests.get(url)
    return(r.text)

def get_total_pages(html):
    soup = BS(html, 'lxml')
    pages = soup.find('div', class_='pagination').find_all('a', class_="item")[-1].find('span').text
    return int(pages)

def write_csv(data):
    with open('stopgame.csv', 'a') as f:
        writer = csv.writer(f)

        writer.writerow( (data['caption'],
                          data['url']) )

def get_page_data(html):
    soup = BS(html, 'lxml')
    games = soup.find('div', class_='tiles').find_all('div', class_='caption')



    for game in games:
        try:
            caption = game.find('a').text
        except:
            caption = ''
        try:
            url = 'https://stopgame.ru' + game.find('a').get('href')
        except:
            url = ''
        data = {'caption': caption,
                'url': url}
        write_csv(data)
        
def main():
    url = 'https://stopgame.ru/review/new/p'
    total_pages = get_total_pages(get_html(url))

    for i in range(1, total_pages + 1):
        url_gen = url + str(i)
        #print(url_gen)
        html = get_html(url_gen)
        #print(get_page_data(html))
        get_page_data(html)

if __name__ == '__main__':
    main()

Reply

Answer the question

In order to leave comments, you need to log in

3 answer(s)

G

galaxy, 2020-06-19
@dzhazagalieva

1. Recording is going well, Excel just does not recognize the comma separator. Use wizard Data -> From Text/CSV
2. Try

with open('stopgame.csv', 'a', encoding='utf-8') as f:

S

soremix, 2020-06-19
@SoreMix

one.

with open('stopgame.csv', 'a', encoding='utf-8') as f:

2. CSV should not have columns. It's just comma separated data. If you want columns, read how to do it in your program into which the document is loaded.

D

dzhazagalieva, 2020-06-19
@dzhazagalieva

Line skip problem solved Other issues are relevant
with open('stopgame.csv', 'a', newline='') as f: