How do I run threads in parallel rather than one after another?
I wrote a small scraper using the threading module. The problem is that, in my case, multithreading does not reduce the script's running time.
As you can see, after starting each thread, the script waits for it to finish and only then launches the next one. How can I fix this?
import requests
from bs4 import BeautifulSoup
import threading

personal_pages_paths = []
domain = 'https://vk.com'
search_host = 'https://vk.com/people/'
lastnames = [
    'Иванов',
    'Петров',
    'Сидоров',
    'Козлов',
    'Смирнов',
    'Михайлов',
    'Соколов',
    'Кузнецов',
    'Попов',
    'Лебедев',
    'Волков',
    'Морозов',
    'Новиков',
]


def get_personal_page_paths(html_text):
    paths = []
    soup = BeautifulSoup(html_text, 'lxml')
    link_obj = soup.find('div', {'class': 'results'}).find_all('a', {'class': 'search_item'})
    for path in link_obj:
        paths.append(path['href'])
    return paths


def recieve_page_html(lastname_page):
    with requests.Session() as session:
        html = session.get(lastname_page)
        lastname_paths = get_personal_page_paths(html.text)
        personal_pages_paths.extend(lastname_paths)


def main():
    for lastname in lastnames:
        lastname_page = search_host + lastname
        lastname_paths = []
        paths = threading.Thread(target=recieve_page_html, args=(lastname_page,))
        paths.start()
        paths.join()
    print('PATHS:', personal_pages_paths)
    print('\n LENGTH: ', len(personal_pages_paths))


if __name__ == "__main__":
    main()
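The slowdown can be reproduced without any network access. The sketch below is a minimal, offline reproduction, assuming a hypothetical `fake_fetch` that stands in for `session.get` by sleeping for `DELAY` seconds. Because `join()` is called right after `start()` inside the loop, the total time grows linearly with the number of threads, exactly as in `main()` above.

```python
import threading
import time

DELAY = 0.2  # hypothetical stand-in for one network request, in seconds


def fake_fetch(results, name):
    """Simulate a blocking I/O call (like session.get) and record the name."""
    time.sleep(DELAY)
    results.append(name)


def run_sequentially(names):
    """Mirror the loop in main(): each thread is joined right after start()."""
    results = []
    started = time.perf_counter()
    for name in names:
        t = threading.Thread(target=fake_fetch, args=(results, name))
        t.start()
        t.join()  # blocks here, so the next thread cannot start yet
    return results, time.perf_counter() - started


if __name__ == "__main__":
    results, elapsed = run_sequentially(["a", "b", "c", "d", "e"])
    # elapsed is roughly 5 * DELAY, no faster than a plain loop
    print(results, f"{elapsed:.2f}s")
```

With five names this takes about one second, the same as calling `fake_fetch` five times in an ordinary loop, so the threads add nothing.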
You need to start all the threads first and only then join them in a separate loop. To do that, collect the threads in a list:
workers = []
for lastname in lastnames:
    lastname_page = search_host + lastname
    lastname_paths = []
    paths = threading.Thread(target=recieve_page_html, args=(lastname_page,))
    paths.start()  # start every thread without waiting for it
    workers.append(paths)
for w in workers:
    w.join()  # now wait until all of them have finished
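A common alternative (not part of the answer above) is `concurrent.futures.ThreadPoolExecutor` from the standard library. It starts and joins the threads for you, and each worker returns its result instead of extending a shared global list. Threads help here because the work is waiting on network I/O. The sketch below assumes a hypothetical `fetch_paths` worker that only sleeps and returns a fake path so that it runs offline; in the real script it would call `requests` and `get_personal_page_paths`. The `max_workers=8` value is also an assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.2  # hypothetical stand-in for one HTTP request, in seconds


def fetch_paths(url):
    """Hypothetical worker: the real one would download url and return
    get_personal_page_paths(response.text). This one just sleeps and
    returns a fake path so the example runs without network access."""
    time.sleep(DELAY)
    return [url + "/profile"]


def collect_paths(urls, max_workers=8):
    """Run fetch_paths for every URL concurrently and flatten the results."""
    all_paths = []
    # Leaving the with-block waits for all submitted calls to finish
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() submits every call up front; results come back in input order
        for paths in pool.map(fetch_paths, urls):
            all_paths.extend(paths)
    return all_paths


if __name__ == "__main__":
    urls = ["https://vk.com/people/" + name for name in ["Иванов", "Петров", "Сидоров"]]
    started = time.perf_counter()
    print(collect_paths(urls))
    # close to DELAY rather than 3 * DELAY, since the calls overlap
    print(f"{time.perf_counter() - started:.2f}s")
```

Returning values from the worker also avoids having several threads write to one module-level list, which keeps the data flow easier to follow.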