What technologies are used in real-time web scraping?

Parsing

malvinfch, 2019-02-11 12:48:42

Let's say there is an aggregator site with a search box that returns results for the user's query from 100 other sites, and each query takes 3-4 seconds to process. How does this work?
One option I see is to scrape the sites daily and save the results to a database, so that user queries run directly against the DB.
If you instead run a script for every query and parse the sites in real time, it clearly won't finish in that time.
What other options are there?

1 answer(s)

Evgen, 2019-02-11
@Verz1Lka

It seems to me that if the data came from your own database, 3-4 seconds would be far too long.
But if you send requests to the APIs of the different sites in parallel, that is about how long it would take.
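To illustrate the parallel approach, here is a minimal sketch using Python's `asyncio`. The source names, their latencies, and the `fetch_results` function are made up for the example: `fetch_results` only simulates a network call with `asyncio.sleep`, and in a real aggregator it would wrap an HTTP request to each site's search API. The point is that the total time is bounded by the slowest source you are willing to wait for, not by the sum of all sources.

```python
import asyncio
import time

# Hypothetical sources with simulated latencies in seconds.
SOURCES = {"site-a": 0.05, "site-b": 0.1, "site-c": 2.0}


async def fetch_results(source, query):
    """Stand-in for a real HTTP call to one source's search API."""
    await asyncio.sleep(SOURCES[source])  # simulate network latency
    return [f"{source}: result for {query!r}"]


async def search_all(query, timeout=0.5):
    """Query every source concurrently; drop sources that miss the deadline."""

    async def guarded(source):
        try:
            items = await asyncio.wait_for(fetch_results(source, query), timeout)
            return source, items
        except asyncio.TimeoutError:
            return source, None  # too slow, leave it out of the response

    pairs = await asyncio.gather(*(guarded(s) for s in SOURCES))
    return {source: items for source, items in pairs if items is not None}


if __name__ == "__main__":
    started = time.perf_counter()
    results = asyncio.run(search_all("laptop"))
    elapsed = time.perf_counter() - started
    print(sorted(results), f"{elapsed:.2f}s")
```

With a per-source timeout, one slow site cannot hold up the whole response; its results are simply missing from that answer.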
If you are using Scrapy, you can use ScrapyRT for this.
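As a rough sketch of how that could look: ScrapyRT runs inside a Scrapy project and exposes an HTTP endpoint, `crawl.json`, that runs a spider on demand and returns the scraped items as JSON. The example below assumes ScrapyRT is listening on its default port 9080 on localhost; the spider name `products` and the target URL are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_crawl_url(spider_name, start_url, base="http://localhost:9080"):
    """Build the ScrapyRT crawl.json URL for a single spider run."""
    params = urlencode({"spider_name": spider_name, "url": start_url})
    return f"{base}/crawl.json?{params}"


def crawl(spider_name, start_url, timeout=10.0):
    """Run the spider through ScrapyRT and return the scraped items."""
    crawl_url = build_crawl_url(spider_name, start_url)
    with urlopen(crawl_url, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload.get("items", [])


if __name__ == "__main__":
    # "products" is a hypothetical spider; ScrapyRT must already be running.
    for item in crawl("products", "https://example.com/search?q=laptop"):
        print(item)
```

The search backend can then call `crawl` for each source when a user query arrives, instead of reading pre-scraped rows from a database.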