Dmitry Matveev, 2016-11-10 22:00:28
Python

How to properly distribute the load in the program?

Hello, I have no idea how to proceed.
I have an application that parses data from two websites. Everything runs synchronously, step by step, and the whole run takes 20-25 seconds. One of the sites is fetched with the grequests library (based on gevent). I had an idea of how to speed this up and have read a lot about the topic. As a result, I found three options in Python: threading, multiprocessing, and asynchronous requests.
How can I best structure the architecture of my parser? I see it this way:
There is a main thread that starts the parser for the first site and the parser for the second site.
Each parser gets its own process, so in total there are 3 processes (main, parser1, parser2).
Asynchronous requests are used inside the parser1 and parser2 processes.
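For two I/O-bound parsers, the last step alone may already be enough: one event loop can run both parsers concurrently without extra processes. A minimal sketch, assuming each parser can be written as a coroutine; the HTTP requests and parsing are simulated with asyncio.sleep, so the names and return values here are placeholders, not your real parsers:

```python
import asyncio
import time

# Hypothetical parser coroutines for the two sites. asyncio.sleep stands in
# for the real asynchronous HTTP requests + parsing work.
async def parse_site1():
    await asyncio.sleep(1.0)
    return "site1 data"

async def parse_site2():
    await asyncio.sleep(1.0)
    return "site2 data"

async def main():
    # Run both parsers concurrently in one event loop: total time is close
    # to the slower parser, not the sum of the two.
    return await asyncio.gather(parse_site1(), parse_site2())

start = time.monotonic()
results = asyncio.run(main())
elapsed = time.monotonic() - start
print(results, round(elapsed, 1))
```

If each parser is purely waiting on the network, the combined run takes roughly as long as the slower of the two, which is why separate processes (which cost memory and IPC) are usually only worth it when parsing itself is CPU-heavy.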
Am I thinking right? Or should I be beaten with a shovel?)
And one more small question: is the difference between asynchronous requests and threads that asynchronous means a socket that is not closed after each request, while a thread is just parallelization to use all of the system's resources? Is that correct?
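For comparison with the asynchronous approach: in CPython, threads do overlap while waiting on the network, because blocking I/O releases the GIL even though the GIL prevents parallel bytecode execution. A minimal sketch, with the blocking fetch simulated by time.sleep (real code would call something like requests.get; the site names are placeholders):

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Simulated blocking fetch: time.sleep releases the GIL the same way a
# blocking socket read does, so the two worker threads overlap.
def fetch(site):
    time.sleep(1.0)  # stands in for a blocking HTTP request + parsing
    return f"{site} data"

start = time.monotonic()
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(fetch, ["site1", "site2"]))
elapsed = time.monotonic() - start
print(results, round(elapsed, 1))
```

Both threads spend their second waiting at the same time, so the total is about one second, not two; the difference from async is mainly in programming model (callbacks/coroutines vs. ordinary blocking code), not in whether sockets stay open.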


1 answer
Dimonchik, 2016-11-11
@DmMatveev

You're wrong about asynchronous; too lazy to dig up diagrams right now.
If you want cheap and cheerful, look at multicurl.
If it's serious, there are Scrapy and Grab, for beginners and graduates alike (a single Scrapy topic on UpWork with a $10,000 budget says it all). There are even more serious scrapers out there; you can borrow the downloader from one of those, or remake it to fit your needs.
