How do I properly distribute the load in my program?
Hello, I have no idea how to proceed.
I have an application that parses data from two websites. Everything runs synchronously, step by step, and the whole run takes 20-25 seconds. For one of the sites I use the grequests library (built on gevent). I had an idea for how to speed this up and have read a lot about it. In the end I found 3 options for Python: threading, multiprocessing, and asynchronous requests.
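To illustrate why concurrency helps here: the two site fetches are independent I/O waits, so running them at the same time cuts the total roughly to the slowest one. A toy sketch with no real network, where `fake_fetch` is a placeholder for a slow HTTP request:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(site):
    # Placeholder for a slow HTTP request to one of the two sites
    time.sleep(1)
    return f"data from {site}"

start = time.monotonic()
# Two 1-second "fetches" run concurrently, so the total is ~1 s, not ~2 s
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(fake_fetch, ["site1", "site2"]))
elapsed = time.monotonic() - start
```

Run sequentially, the same two calls would take about 2 seconds; the pool overlaps the waits.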
What is the best architecture for my parser? I see it this way:
There is a main thread that launches the parser for the first site and the parser for the second site.
Each parser gets its own process, so in total there are 3 processes (main, parser1, parser2).
Asynchronous requests are used inside the parser1 and parser2 processes.
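The process layout described above can be sketched like this: a main process spawns one worker process per site. `parse_site` is a placeholder standing in for the real per-site parser (which, per the plan, would itself issue asynchronous requests):

```python
from multiprocessing import Pool

def parse_site(url):
    # Placeholder: the real parser would fetch and extract data here
    return f"parsed {url}"

def main():
    # Hypothetical URLs for the two sites
    urls = ["https://site1.example", "https://site2.example"]
    # Two worker processes, one per site; main() is the third process
    with Pool(processes=2) as pool:
        return pool.map(parse_site, urls)

if __name__ == "__main__":
    print(main())
```

Note that for purely I/O-bound fetching, processes add overhead without much benefit; they mainly pay off if the parsing itself is CPU-heavy.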
Am I thinking right? Or should I be beaten with a shovel?)
And one more small question. Is the difference between asynchronous requests and threads that async means a socket that is not closed after each request, while a thread is just parallelization to use all of the system's resources? Is that right?
Your take on asynchronous requests is wrong; too lazy right now to dig up diagrams.
In general, look at multicurl if you want something cheap and cheerful.
If it's serious, there are Scrapy and Grablib, for beginners and graduates alike (a single Scrapy job on UpWork with a $10,000 budget speaks for itself).
Even more serious scrapers look like this; you can check out the downloader there, or rework it right away to fit your needs.
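To expand on why the async summary in the question is off: async is not just a kept-open socket. An event loop multiplexes many pending requests on a single thread, switching between them at each `await`, whereas threads let the OS interleave blocking calls. A minimal asyncio sketch, with `fake_fetch` as a stand-in for a real HTTP request:

```python
import asyncio

async def fake_fetch(site):
    # Placeholder for an HTTP request; awaiting hands control
    # back to the event loop so other tasks can run
    await asyncio.sleep(1)
    return f"data from {site}"

async def run_both():
    # Both "requests" wait concurrently on one thread, one event loop
    return await asyncio.gather(fake_fetch("site1"), fake_fetch("site2"))

results = asyncio.run(run_both())
```

Both simulated fetches complete in about 1 second total, on a single thread, with no processes involved.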