How do I store each new parser result in the next cell of an Excel column?


Python

Simple Ian, 2020-07-09 09:45:37

I have a Python 3.7 parser that requests a page and saves the data to Excel. I need the result of each new request to be saved in the next cell of the Excel column. I assume I need to write a loop? I am using openpyxl. At the moment, the parser writes new data to the same cell, i.e. it overwrites the old data.
Here is the code:

import requests
from bs4 import BeautifulSoup
import openpyxl

# Parser

url = 'https://pythonworld.ru/osnovy/sintaksis-yazyka-python.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')

Parsed = soup.find('h2').text

# Excel writer
# create a new Excel file

xlsfile = openpyxl.Workbook()
xlsfile.sheetnames
sheet = xlsfile['Sheet']

#add data
data = [('Price'), Parsed]
for row, (data) in enumerate(data, start=1):
    sheet['A{}'.format(row)].value = data

#save
xlsfile.save('save.xlsx')

print(Parsed)


2 answers

Grigory Boev, 2020-07-09
@Simple_Ian

Before writing, find out how many rows are already filled and start from the next one:

for row, (data) in enumerate(data, start=sheet.max_row+1):
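
One caveat with this approach: as far as I know, openpyxl reports max_row as 1 even for a completely empty sheet, so start=sheet.max_row+1 would leave row 1 blank on the very first run. Below is a minimal sketch of a helper that accounts for this. The name next_free_row and the sample values are made up for illustration, and the helper assumes all data lives in column A (max_row is sheet-wide, not per column).

```python
import openpyxl


def next_free_row(sheet, column=1):
    """Return the 1-based index of the first unused row (hypothetical helper)."""
    last = sheet.max_row
    # max_row is 1 for an empty sheet too, so check whether row 1 really holds data
    if last == 1 and sheet.cell(row=1, column=column).value is None:
        return 1
    return last + 1


workbook = openpyxl.Workbook()
sheet = workbook.active

# Each write lands one row below the previous one instead of overwriting it
for value in ["Price", "first result", "second result"]:
    sheet.cell(row=next_free_row(sheet), column=1, value=value)

column_a = [cell.value for cell in sheet["A"]]
```

Here column_a should end up holding the three values in the order they were written.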

ScriptKiddo, 2020-07-09
@ScriptKiddo

First, you create a new file on every run, so the earlier data is lost. Check whether the file already exists and load it if it does:

import os

filename = 'save.xlsx'
if os.path.exists(filename):
    xlsfile = openpyxl.load_workbook(filename)
else:
    xlsfile = openpyxl.Workbook()

Second, you need to write with an offset. The offset in this case is the index of the last filled row on the sheet:

max_rows = sheet.max_row
...
sheet['A{}'.format(row + max_rows)].value = data
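
Putting both steps together, here is a rough sketch rather than a definitive implementation. The function name append_result and its header parameter are my own invention, and I use sheet.append() instead of a manual offset, since it writes to the row after the last one openpyxl knows about.

```python
import os

import openpyxl


def append_result(filename, value, header="Price"):
    """Append value below the existing rows in column A (hypothetical helper).

    Creates the workbook with a header row if the file does not exist yet.
    Returns the index of the row that was written.
    """
    if os.path.exists(filename):
        # Reuse the existing file so earlier results are preserved
        workbook = openpyxl.load_workbook(filename)
        sheet = workbook.active
    else:
        # First run: create the file and write the header once
        workbook = openpyxl.Workbook()
        sheet = workbook.active
        sheet.append([header])

    # append() adds a new row after the last row that contains cells
    sheet.append([value])
    workbook.save(filename)
    return sheet.max_row
```

In the original script this would replace the whole "Excel writer" part, e.g. append_result('save.xlsx', Parsed). Each run of the parser then adds one more cell to column A.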