Python
Kirill Petrov, 2018-08-02 17:00:00

Why does the parser write the data from one loop iteration to different rows in the database?

The parser collects data from the site and almost everything works, except that it writes each tag to a new row of the database (sqlite3), while the other columns are simply duplicated and only the id changes. What could be causing this?

def get_page_date(html):
    soup = BeautifulSoup(html, 'lxml')
    news = soup.find('div', class_='article-list').find_all('h3', class_='article-list__item-title')
    for new in news:
        try:
            title = new.find('a',class_= 'link_nodecor').text.strip()
            print(title)
        except:
            title = ''
        try:
            url = 'https://example.ru' + new.find('a',class_= 'link_nodecor').get('href')
            print(url)
            post = requests.get(url).text
            soup = BeautifulSoup(post,'lxml')
            articles = soup.find('div',class_='article').find_all('p')
            for article in articles:
                try:
                    post_text = article.text
                    cursor.execute("INSERT INTO news VALUES (?, ?, ?)", (title, post_text, url))
                    cursor.commit()
                    print(post_text)
                except:
                    post_text = ''
        except:
            url = ''

How can this be fixed?


1 answer(s)
Ruslan., 2018-08-02
@LaRN

Judging by the indentation of the code, the write to the database is not performed inside the loop. I mean these two commands:
cursor.execute("INSERT INTO news VALUES (?, ?, ?)", (title, post_text, url))
cursor.commit()
In that case only the last values of the title, post_text, and url variables, the ones they held when the loop exited, get into the database.
You need to move the command
cursor.execute("INSERT INTO news VALUES (?, ?, ?)", (title, post_text, url))
4 positions to the right and leave
cursor.commit()
as it is.
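
A further point, going only by the code as posted: the INSERT appears to sit inside the `for article in articles` loop, so every <p> of an article would get its own row with the same title and url, which would match the duplicated columns described in the question. Also, in sqlite3 the commit() method belongs to the connection object, not to the cursor. Below is a minimal sketch of one possible fix, which collects the paragraphs and inserts once per article. The helper name save_article, the conn variable, and the column names in the demo table are assumptions made for this example and do not come from the question.

```python
import sqlite3


def save_article(conn, title, paragraphs, url):
    """Insert one row per article instead of one row per <p> tag.

    Hypothetical helper: `paragraphs` is a list of paragraph strings.
    """
    # Join all non-empty paragraph texts into a single article body.
    post_text = "\n".join(p.strip() for p in paragraphs if p.strip())
    conn.execute(
        "INSERT INTO news VALUES (?, ?, ?)",
        (title, post_text, url),
    )
    # commit() is a method of the connection; sqlite3 cursors do not have it.
    conn.commit()


# Demo with an in-memory database; the column names are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE news (title TEXT, post_text TEXT, url TEXT)")

# In the parser this list would be built as [p.text for p in articles].
paragraphs = ["First paragraph.", "Second paragraph.", "  "]
save_article(conn, "Some title", paragraphs, "https://example.ru/news/1")

rows = conn.execute("SELECT title, post_text, url FROM news").fetchall()
print(len(rows))
```

In the parser, this would be called once per article inside the `for new in news` loop, after the article page has been fetched and its paragraphs collected.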
