Python
avonar, 2011-09-12 10:29:31

Threads in Python?

I have some (admittedly sloppy) code that takes proxies from a file and, if a proxy works, writes it to another file. With a couple of thousand records the script works; with many more it fails with:

Exception in thread Thread-8068:
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 530, in __bootstrap_inner
    self.run()
  File "C:\Python27\lib\threading.py", line 483, in run
    self.__target(*self.__args, **self.__kwargs)
  File "E:\python\proxy\proxy_def.py", line 17, in run
    g = Grab()
  File "build\bdist.win32\egg\grab\grab.py", line 138, in __init__
    self.trigger_extensions('init')
  File "build\bdist.win32\egg\grab\grab.py", line 353, in trigger_extensions
    getattr(ext, 'extra_%s' % event)(self)
  File "build\bdist.win32\egg\grab\ext\pycurl.py", line 47, in extra_init
    grab.curl = pycurl.Curl()
error: initializing curl failed

The code itself is here. I run all of this on Windows and use the grab library, which in turn uses pycurl (as you can see for yourself).

2 answer(s)
Yngvie, 2011-09-12
@Yngvie

I would venture a guess that the problem is related to this:
curl.haxx.se/libcurl/features.html#thread
Although libcurl itself is designed for multithreaded use, it relies on a number of functions (e.g. gethostby*) that may not be thread-safe.
With a large number of threads, the chance of such collisions increases. If that is what is happening, you can catch these errors with try..except and restart the parser after a while.
It would also be a good idea to limit the number of concurrently running threads. You have your own workaround for this, but Python has a library for working with a task queue:
docs.python.org/library/queue.html
Below is a simple example.
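
As a stand-in for the idea described above (this is not the answerer's own code), here is a minimal sketch of a fixed-size worker pool fed from a task queue. The names `run_pool`, `check` and `num_workers` are hypothetical, and `check` is assumed to be whatever function tests a single proxy. The sketch is written for Python 3, where the module is called `queue`; in Python 2.7, which the traceback shows, the same module is named `Queue`.

```python
import queue
import threading


def run_pool(proxies, check, num_workers=20):
    """Check every proxy using a fixed number of worker threads.

    `check` is a hypothetical callable: check(proxy) -> bool.
    """
    tasks = queue.Queue()
    for proxy in proxies:
        tasks.put(proxy)

    good = []
    good_lock = threading.Lock()

    def worker():
        while True:
            try:
                proxy = tasks.get_nowait()
            except queue.Empty:
                return  # no work left, let this thread finish
            try:
                if check(proxy):
                    with good_lock:
                        good.append(proxy)
            except Exception:
                pass  # a failed check only means this proxy is skipped
            finally:
                tasks.task_done()

    # The thread count stays fixed no matter how many proxies there are.
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return good
```

Because the number of threads no longer grows with the size of the input file, a list of tens of thousands of proxies is processed by the same handful of workers rather than by thousands of threads at once.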

Yngvie, 2011-09-12
@Yngvie

Hmm, I don't know the specifics of curl on Windows, so I can't give an exact number.
Your thread count fluctuates between 10 and 90; try different values in that range.
That said, tuning the number of threads for performance is still not the main fix.
Add an error handler that restarts the task with the current proxy.
Since the error occurs on the g = Grab() line, it appears to happen randomly, regardless of any parameters, so retrying should not loop forever. But just in case, you can limit the number of restarts.
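
The restart-with-a-limit idea could look roughly like the sketch below. The helper name `call_with_retries` and its parameters are hypothetical, and `task` is assumed to be a zero-argument callable that does the work for one proxy (creating the Grab object included). It catches `Exception` broadly for illustration; narrowing it to the specific error seen in the traceback would be stricter.

```python
import time


def call_with_retries(task, max_attempts=3, delay=1.0, sleep=time.sleep):
    """Call task() until it succeeds or max_attempts is exhausted.

    Re-raises the last error if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts:
                sleep(delay)  # wait a little before restarting the task
    raise last_error
```

Inside a worker thread this would be used as, for example, `call_with_retries(lambda: check(proxy))`, where `check` is again a hypothetical per-proxy function. The cap on attempts keeps a persistently failing task from looping forever.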
