Have I implemented the parser correctly?
Hello, I wrote a parser for an official website. It can list anywhere from 0 to 350 matches, and to load the data for a match you have to request that match's page separately. Here is what I do: I load the main page, read which matches are listed, and remember their ids. Then I create the corresponding number of threads and start them. Each thread runs a function with essentially a single instruction: it loads the page and appends it to an array.
Then I wait for all the threads to finish and parse the collected data. With about 100 matches, all threads complete in roughly 0.7 seconds on average.
But sometimes, with the same number of matches, the time jumps to 1.5–3 seconds. Why might that be? Is it because of the GIL? Am I doing this the right way, and does this count as asynchronous requests? I still can't figure out what an asynchronous request actually is. Thanks for the help.
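For reference, here is a rough sketch of the scheme described above as I understand it (one thread per match); the URL pattern, the list of ids, and the helper names are placeholders, not actual code from the project.

import threading
import requests

MATCH_URL = 'https://example.com/match/{}'   # placeholder URL pattern

pages = []                      # each thread appends a downloaded page here
lock = threading.Lock()

def fetch(match_id):
    # the thread's single job: download the match page and store it
    html = requests.get(MATCH_URL.format(match_id)).text
    with lock:
        pages.append(html)

match_ids = [101, 102, 103]     # ids read from the main page (placeholder)
threads = [threading.Thread(target=fetch, args=(i,)) for i in match_ids]
for t in threads:
    t.start()
for t in threads:
    t.join()                    # wait for every thread, then parse `pages`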
No, it's not because of the GIL.
Creating one thread per page is wrong: spawning a thread is a relatively expensive operation for the OS.
So your approach is not entirely correct; it's better to reuse what already exists.
If scrapy doesn't suit you, you can use a thread pool instead of managing threads manually through the low-level API. More or less like this:
from concurrent.futures import ThreadPoolExecutor
from requests import Session

session = Session()
urls = [
    'https://toster.ru/q/372757',
    'https://toster.ru/',
]

with ThreadPoolExecutor(7) as pool:  # 7 is the number of worker threads
    # pool.map returns responses in the same order as urls
    for response in pool.map(session.get, urls):
        do_something_with_response(response)  # your parsing goes here
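A fixed pool like this reuses the same handful of threads for all the URLs, so you don't pay the thread-creation cost for every page, and the pool size caps how many requests are in flight at once.

As for the last part of the question: threads give you concurrent but still blocking requests; truly asynchronous requests run on an event loop that switches between tasks while each one waits on the network. Below is a minimal sketch of the same downloads done asynchronously with aiohttp (a third-party library); the URLs and do_something_with_page are placeholders, not code from the question.

import asyncio
import aiohttp

urls = [
    'https://toster.ru/q/372757',
    'https://toster.ru/',
]

async def fetch(session, url):
    # one non-blocking request; the event loop runs other tasks while this waits
    async with session.get(url) as response:
        return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        # start all requests at once and wait for them all to finish
        pages = await asyncio.gather(*(fetch(session, url) for url in urls))
        for page in pages:
            do_something_with_page(page)  # placeholder for your parsing

asyncio.run(main())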