Python
Kirill Gorelov, 2018-11-20 11:35:18

Python: how do I synchronize threading threads?

Guys, I'm writing a Python script to check that the links on my site exist.
For the test I give it a pool of links; the script seems to check them, but it returns errors and incorrect answers.
For example, a link exists, but the script reports that it doesn't.
It also throws an error saying the variable doesn't exist yet:

UnboundLocalError: local variable 'code' referenced before assignment

I looked for a solution and found that I need to synchronize the threads, but I can't get it to work: it either spits out a bunch of errors or doesn't start at all. The manuals didn't help much.
Does anyone know how to synchronize the threads?
I'm attaching the original code.
The domain reactone-loc.ru is my local domain.
With a small number of links it works fine, but if I give it more than 100 links, it slows down badly (
import time
import requests
import threading
import urllib, socket, time
site_pages = [
    'http://reactone-loc.ru',
    'http://reactone-loc.ru/hththt/mainscript.js',
    'http://reactone-loc.ru',
    'http://reactone-loc.ru/main/d36/d36af6d8a98e74738b4cb822f4a7e692/KdNC0w_7_zU2.jpg',
    'http://example.com',
    'http://example.com/page1',
    'http://example.com/page5',
    'http://example.com/page6',
    'http://example.com/page7',
    'http://example.com/page9',
    'http://example.com/page10',
    'http://ya.ru',
    'http://yandex.ru',
    'http://yanff.ru',
    'http://ya1.ru',
    'http://ya.com',
    'http://ya.ru',
    'http://ya.ru',
    'http://ya.ru',
    'http://ya.ru',
]

failed_pages = [];

def generate_message ():
  n = len(failed_pages)
  list = ""
  if (n > 0):
    list = "404 errors: \r\n"
    for failed_link in failed_pages:
      list = "\r\n".join((list, failed_link))
  else:
    list = "All links are correct"
  return list

def check_pages (pages):
  try:
    # start_time_site = time.time()
    try:
      # print(pages)
      code = urllib.urlopen(pages).getcode()
      # print("--- %s seconds ---" % (time.time() - start_time_site))
    except IOError:
      failed_pages.append(pages)
      # print "Not open url: ", pages
      # print("--- %s seconds ---" % (time.time() - start_time_site))
    print (code)
    print "{0} - {1}".format(pages, code)
    if (code not in [200, 301]):
      failed_pages.append(pages)
  except socket.error, e:
    print "Ping Error: ", e

global_start_time = time.time()

threads = []
for pair in site_pages:
    threads.append(threading.Thread(target=check_pages, args=(pair,)))

for thread in threads:
    thread.start()

for thread in threads:
    thread.join()

print(generate_message())
print("--- %s seconds ---" % (time.time() - global_start_time))


1 answer
bbkmzzzz, 2018-11-20

Use queues: queue.Queue()
A Queue is synchronized internally. The principle: put the links into a job queue, start the threads (passing each of them the job queue and the result queue), and have the threads take jobs from the queue, process them, and put an object with the result into the other queue.
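
A minimal sketch of that scheme, assuming Python 3 and only the standard library; the worker function, NUM_WORKERS, the 10-second timeout, and the None sentinel are illustrative choices, not part of the answer.

import queue
import threading
import urllib.error
import urllib.request

NUM_WORKERS = 10  # hypothetical pool size, tune as needed

def worker(jobs, results):
    # Pull URLs from the job queue until a None sentinel arrives,
    # check each one, and push (url, status_code) into the result queue.
    while True:
        url = jobs.get()
        if url is None:
            break
        try:
            code = urllib.request.urlopen(url, timeout=10).getcode()
        except (urllib.error.URLError, OSError):
            code = None  # any network error counts as a failed link
        results.put((url, code))

site_pages = ['http://example.com', 'http://example.com/page1', 'http://ya.ru']

jobs = queue.Queue()
results = queue.Queue()
for url in site_pages:
    jobs.put(url)
for _ in range(NUM_WORKERS):
    jobs.put(None)  # one sentinel per worker so every thread can exit

threads = [threading.Thread(target=worker, args=(jobs, results))
           for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

failed_pages = []
while not results.empty():
    url, code = results.get()
    if code not in (200, 301):
        failed_pages.append(url)

if failed_pages:
    print("404 errors:\r\n" + "\r\n".join(failed_pages))
else:
    print("All links are correct")

Because only the main thread reads the result queue after join(), no extra locking is needed, and the fixed-size worker pool avoids starting one thread per link, which is likely why the original script slows down past ~100 links.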
