Why does the parser write the data of one cycle to different rows in the database?
The parser collects data from the site, and everything would be fine, except that it writes each tag to a new row of the database (sqlite3), while the other columns are simply duplicated and only the id changes. What could this be connected with?
import requests
from bs4 import BeautifulSoup

def get_page_date(html):
    soup = BeautifulSoup(html, 'lxml')
    news = soup.find('div', class_='article-list').find_all('h3', class_='article-list__item-title')
    for new in news:
        try:
            title = new.find('a', class_='link_nodecor').text.strip()
            print(title)
        except:
            title = ''
        try:
            url = 'https://example.ru' + new.find('a', class_='link_nodecor').get('href')
            print(url)
            post = requests.get(url).text
            soup = BeautifulSoup(post, 'lxml')
            articles = soup.find('div', class_='article').find_all('p')
            for article in articles:
                try:
                    post_text = article.text
                    cursor.execute("INSERT INTO news VALUES (?, ?, ?)", (title, post_text, url))
                    cursor.commit()
                    print(post_text)
                except:
                    post_text = ''
        except:
            url = ''
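For what it's worth, the behavior described in the question is exactly what an INSERT inside the loop over an article's paragraphs produces: each paragraph becomes its own row, with title and url duplicated. A minimal sketch against an in-memory database (the table name comes from the code above; the column names and sample values are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE news (title TEXT, post TEXT, url TEXT)")

# Hypothetical sample data standing in for one parsed article
title = "Some headline"
url = "https://example.ru/some-article"
paragraphs = ["First paragraph.", "Second paragraph.", "Third paragraph."]

# INSERT inside the loop: one row per paragraph, title/url duplicated
for post_text in paragraphs:
    cursor.execute("INSERT INTO news VALUES (?, ?, ?)", (title, post_text, url))
conn.commit()  # in sqlite3, commit() is a method of the Connection, not the Cursor

rows = cursor.execute("SELECT * FROM news").fetchall()
print(len(rows))  # 3 rows for a single article
```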
If you look at the indentation of the code, the write to the database is not performed inside the loop; I mean these two lines:

cursor.execute("INSERT INTO news VALUES (?, ?, ?)", (title, post_text, url))
cursor.commit()

With that layout, only the last values of the title, post_text, and url variables at the moment the loop exits end up in the database. You need to move the line

cursor.execute("INSERT INTO news VALUES (?, ?, ?)", (title, post_text, url))

four positions to the right, and leave

cursor.commit()

as it is.
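If the goal is instead one row per news item rather than one per paragraph, the paragraph texts can be collected first and the INSERT executed once per article. A sketch under that assumption (sample values are made up; note that in Python's sqlite3 module commit() belongs to the Connection, so it is written as conn.commit() here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE news (title TEXT, post TEXT, url TEXT)")

# Hypothetical data for one parsed article
title = "Some headline"
url = "https://example.ru/some-article"
paragraphs = ["First paragraph.", "Second paragraph."]

# Gather all <p> texts for the article, then insert a single row
post_text = "\n".join(paragraphs)
cursor.execute("INSERT INTO news VALUES (?, ?, ?)", (title, post_text, url))
conn.commit()

rows = cursor.execute("SELECT * FROM news").fetchall()
print(len(rows))  # one row per article
```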