Python
esemi, 2014-05-26 18:38:45

Tornado crawler (AsyncHTTPClient): can it be done more simply?

Good evening, everyone.
I am rewriting a fast crawler from [curl multi & c-ares] to tornado.httpclient.AsyncHTTPClient. After digging through the documentation, I wrote a simple script:

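import tornado.gen
import tornado.httpclient
import tornado.ioloop

@tornado.gen.coroutine
def test():
    def handle_response(response):
        # Called with the HTTPResponse of each request
        print 'handle %s' % response.code
    num_of_try, num_of_conn = 10000, 500
    # Use the libcurl-based client (requires pycurl) with up to num_of_conn parallel connections
    tornado.httpclient.AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient", max_clients=num_of_conn)
    http_client = tornado.httpclient.AsyncHTTPClient()
    # Yielding a list of futures waits for all of them;
    # if any of them failed, its exception is re-raised here
    responses = yield [http_client.fetch("http://ya.ru/", callback=handle_response) for i in xrange(num_of_try)]

if __name__ == '__main__':
    tornado.ioloop.IOLoop.current().run_sync(test)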

At first glance everything works, but if any request fails with an HTTPError, processing the futures throws that exception in the main coroutine (the code that yields the list), whereas I would like to receive it in the handler. If you dig into the Tornado source, you will see that this is intended behavior: when yield waits on a future that raised an exception, Tornado re-raises it.
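The re-raise behavior described above can be reproduced without a network. This is a minimal sketch that uses the standard library's asyncio only as a stand-in for Tornado's coroutines: `asyncio.gather` plays the role of yielding a list of futures, and `fake_fetch` and `HTTPError` are hypothetical replacements for `AsyncHTTPClient.fetch` and Tornado's exception class. One failed request is enough for the whole batch to end in an exception at the waiting site, and none of the successful results reach the code after the `await`:

```python
import asyncio


class HTTPError(Exception):
    """Hypothetical stand-in for tornado.httpclient.HTTPError."""

    def __init__(self, code):
        super().__init__("HTTP %d" % code)
        self.code = code


async def fake_fetch(i):
    """Hypothetical fetch: request number 3 fails, the others succeed."""
    await asyncio.sleep(0)
    if i == 3:
        raise HTTPError(500)
    return 200


async def crawl_batch(n):
    handled = []
    try:
        # Like `yield [fetch(...) for ...]` in a Tornado coroutine: waits for
        # the whole batch and re-raises the first failure right here.
        results = await asyncio.gather(*(fake_fetch(i) for i in range(n)))
        handled.extend(results)
    except HTTPError as e:
        return handled, e.code
    return handled, None


handled, error = asyncio.run(crawl_batch(10))
print(handled, error)  # [] 500
```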
I spent a long time trying to work around this behavior, and eventually the code took the following form:

import tornado.gen
import tornado.httpclient
import tornado.ioloop
import yieldpoints  # third-party Tornado add-on that provides WaitAny

@tornado.gen.coroutine
def test():
    def handle_response(response):
        print 'handle %s' % response.code
    num_of_try, num_of_conn = 10000, 500
    tornado.httpclient.AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient", max_clients=num_of_conn)
    http_client = tornado.httpclient.AsyncHTTPClient()
    keys = set(range(num_of_try))
    for i in keys:
        # gen.Callback(i) creates a callback registered under the unique key i
        http_client.fetch("http://ya.ru/", callback=(yield tornado.gen.Callback(i)))
    while keys:
        # Resume as soon as any of the keyed callbacks has fired
        key, res = yield yieldpoints.WaitAny(keys)
        handle_response(res)
        keys.remove(key)

if __name__ == '__main__':
    tornado.ioloop.IOLoop.current().run_sync(test)

Instead of waiting on the futures directly, we wait for each request's uniquely keyed callback to fire and call the handler ourselves.
So my question is: can the same behavior be achieved without such ugly workarounds as a set of unique keys with a callback for each of them?
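For comparison, the pattern in the second script (wait for whichever request finishes next, then call the handler yourself) can be sketched with the standard library's asyncio. This is only a stand-in for Tornado and `yieldpoints.WaitAny`, chosen so the sketch runs without a network: `asyncio.wait(..., return_when=FIRST_COMPLETED)` plays the role of WaitAny, a dict from task to request number plays the role of the key set, and `fake_fetch` and `handle_response` are hypothetical.

```python
import asyncio


async def fake_fetch(i):
    """Hypothetical stand-in for a fetch: every fifth request fails."""
    await asyncio.sleep(0.001 * (i % 3))  # vary completion order a little
    if i % 5 == 0:
        raise RuntimeError("HTTP 500 for request %d" % i)
    return 200


async def crawl(n, handle_response):
    # One task per request, remembered by its number (like gen.Callback(i)).
    pending = {asyncio.create_task(fake_fetch(i)): i for i in range(n)}
    while pending:
        # Like `yield yieldpoints.WaitAny(keys)`: resume when any task finishes.
        done, _ = await asyncio.wait(list(pending), return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            key = pending.pop(task)
            # Fetch the outcome ourselves, so a failure is passed to the
            # handler instead of being re-raised in crawl().
            exc = task.exception()
            handle_response(key, exc if exc is not None else task.result())


results = {}


def handle_response(key, response):
    # Hypothetical handler: failed requests arrive as exception objects.
    results[key] = response


asyncio.run(crawl(10, handle_response))
errors = sorted(k for k, r in results.items() if isinstance(r, Exception))
print(errors)  # [0, 5]
```

In this sketch the task objects themselves serve as keys, so there is no separate set of callback keys to maintain; whether Tornado offers an equivalent that fits the original code depends on the Tornado version in use.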
