Answer the question
In order to leave comments, you need to log in
How to design a "real-time" multi-site parser in Python?
Dear colleagues! Please push me in the right direction.
Let's say the task is:
a Python site contains a search bar into which the user can type arbitrary text.
When the user submits a request, the site should parse the latest headlines from several local news sites, and if there is a match, return the text of the matching headlines and links to relevant news to the user.
Answer the question
In order to leave comments, you need to log in
Real-time parsing is an evil that will slow down your entire system. Isn't it easier to index sites into a database and return results from there? You can run parsing tasks for example once every 10 minutes, which will be enough.
Concurrency is usually done by threads. You can read here https://habrahabr.ru/post/229767/ , https://habrahabr.ru/post/78267/
.
To organize tasks, you can use some kind of target
Didn't find what you were looking for?
Ask your questionAsk a Question
731 491 924 answers to any question