Parsing
r4khic, 2019-08-23 14:14:53

What is the best way to make a parser skip news it has already processed?

I have a parser that scrapes 10 sources. It is driven by a database: it reads the content-extraction rules from a table. For each news item I then need to extract the link, title, date, and content, and I wrote a separate function for each of these. The extracted links, titles, dates, and content are then saved to a separate news table. How do I check for news that has already been parsed, so that the parser does not process the same items again and does not insert duplicates into the database?


1 answer(s)
FeNUMe, 2019-08-23
@r4khic

One simple option: at the start of parsing, read the date of the most recently added news item from the database, then skip any parsed item that is not newer than that date.
A slightly more involved option: for each site, remember the URL at which parsing stopped on the previous run, and break out of the loop as soon as you reach that URL again.
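
Both options can be sketched in a few lines of Python. This is only an illustration, not your actual parser: `NewsItem`, `filter_by_date` and `take_until_seen` are made-up names, and the second function assumes each site lists its news newest-first and that the remembered URL is the newest one saved on the previous run.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional


class NewsItem(NamedTuple):
    # Hypothetical record for one parsed news item.
    url: str
    title: str
    published: datetime
    content: str


def filter_by_date(items: Iterable[NewsItem],
                   last_saved: Optional[datetime]) -> List[NewsItem]:
    """Option 1: keep only items newer than the newest date already in the DB."""
    if last_saved is None:  # empty news table, so everything is new
        return list(items)
    return [item for item in items if item.published > last_saved]


def take_until_seen(items: Iterable[NewsItem],
                    last_url: Optional[str]) -> List[NewsItem]:
    """Option 2: walk the newest-first listing and stop at the remembered URL."""
    fresh = []
    for item in items:
        if item.url == last_url:
            break  # everything from here on was handled by the previous run
        fresh.append(item)
    return fresh
```

With the second option you would store `fresh[0].url` per site after each run (when `fresh` is not empty) and pass it as `last_url` next time. Note that the date check uses a strict `>`, so two items with exactly the same timestamp as the last saved one would be skipped; if your sources only give dates without times, the URL check is probably the safer of the two.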
