Python
rodion-dev, 2015-03-29 14:29:49

How to write an asynchronous download script with the Tornado httpclient?

How do I write a script that downloads content using 10,000+ asynchronous requests running at the same time?
Tool: www.tornadoweb.org/en/branch2.2/httpclient.html
Input: a file with 200+ million hosts


3 answers
sakuradaj, 2015-03-30
@sakuradaj

You can try Twisted: https://twistedmatrix.com/trac/
It offers a richer toolkit for asynchronous work.
Also, make the application scalable so that you can run several instances of it, and hand download tasks to them through Celery, for example.
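
To make the "hand tasks to several instances" idea concrete, here is a minimal sketch of cutting the host list into batches that can be sent to workers one at a time. The helper name iter_batches and the batch size are my own assumptions, and the actual dispatch (Celery or anything else) is deliberately left out.

```python
from itertools import islice


def iter_batches(lines, batch_size):
    """Yield lists of up to batch_size host names, read lazily.

    `lines` can be an open file object, so a 200+ million line file
    is never loaded into memory as a whole.
    """
    stripped = (line.strip() for line in lines)
    hosts = (host for host in stripped if host)  # skip blank lines
    while True:
        batch = list(islice(hosts, batch_size))
        if not batch:
            return
        yield batch
```

Typical use would be to open the hosts file, loop over iter_batches(file_object, 10000), and enqueue each batch as one task for whichever worker is free.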

lega, 2015-03-30
@lega

Working example.

m0ody, 2015-04-03
@m0ody

Since you have 200+ million hosts, be sure to build cURL with an asynchronous host resolver (c-ares). Otherwise host name lookups can block, and there will be no real asynchrony.
Here's how to do it on Debian:

apt-get install -y build-essential python-dev python-pip

wget http://c-ares.haxx.se/download/c-ares-1.10.0.tar.gz
tar zxvf c-ares-1.10.0.tar.gz
cd c-ares-1.10.0
./configure
make
make install

wget http://curl.haxx.se/download/curl-7.40.0.tar.gz
tar zxvf curl-7.40.0.tar.gz 
cd curl-7.40.0
./configure --enable-ares --with-ssl --with-zlib --enable-ipv6 --with-libidn
make
make install

pip install pycurl --upgrade

rm -rf /usr/lib/libcurl*
ln -s /usr/local/lib/libcurl.so.4 /usr/lib/libcurl.so.4
ln -s /usr/local/lib/libcurl.so.4 /usr/lib/libcurl.so
ldconfig

python -c "import pycurl; print(pycurl.version)"
# A line with the version should appear, and it should contain c-ares
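
Once the version string shows c-ares, the download loop itself could be sketched roughly like this. It is only a sketch under assumptions: Python 3 with asyncio, and a fetch callable that is injected from outside so the pool logic can be tried without any HTTP library. The names has_c_ares and download_all are hypothetical. The Tornado wiring in the trailing comment assumes Tornado 5+ with pycurl installed and is not executed here.

```python
import asyncio


def has_c_ares(version_string):
    """Return True if a pycurl version string mentions c-ares."""
    return "c-ares" in version_string


async def download_all(hosts, fetch, concurrency):
    """Fetch every host with at most `concurrency` requests in flight.

    `fetch` is any callable that takes a URL and returns an awaitable.
    Returns a dict mapping host -> result, or host -> exception.
    """
    queue = asyncio.Queue()
    for host in hosts:
        queue.put_nowait(host)
    results = {}

    async def worker():
        # Each worker pulls hosts until the queue is empty.
        while True:
            try:
                host = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            try:
                results[host] = await fetch("http://" + host + "/")
            except Exception as exc:  # one bad host must not stop the rest
                results[host] = exc

    await asyncio.gather(*(worker() for _ in range(concurrency)))
    return results


# Assumed Tornado wiring (not run here):
#   from tornado.httpclient import AsyncHTTPClient
#   async def main(hosts):
#       AsyncHTTPClient.configure(
#           "tornado.curl_httpclient.CurlAsyncHTTPClient", max_clients=10000)
#       client = AsyncHTTPClient()
#       return await download_all(hosts, client.fetch, 10000)
```

The number of workers is what caps the simultaneous requests. With 200+ million hosts it is probably better to call download_all once per batch of hosts than to queue the whole file at once.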
