How do I write an asynchronous download script with Tornado's httpclient?
How do I write a script that downloads content with 10,000+ asynchronous requests running at the same time?
Tool:
www.tornadoweb.org/en/branch2.2/httpclient.html
Input: a file with 200+ million hosts.
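A minimal sketch of such a script follows, under several assumptions. It assumes Python 3.7+ and Tornado 6 running on asyncio (the branch2.2 docs linked above describe the older callback API). It also assumes pycurl is installed for `CurlAsyncHTTPClient`, and that each line of the input file is a bare hostname fetched over plain HTTP at `/`. The names `download_all`, `make_tornado_fetch` and `on_result`, plus the 10,000 and 20-second values, are hypothetical and for illustration only. The key point is that 200 million tasks are never created at once. Instead, a fixed pool of workers pulls hosts from a bounded queue while the file is read lazily.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, Optional, Tuple

# (host, HTTP status or None on failure, body size in bytes)
Result = Tuple[str, Optional[int], int]
# Any coroutine function mapping a host to (status, body); Tornado is one option.
Fetch = Callable[[str], Awaitable[Tuple[int, bytes]]]


async def download_all(
    hosts: Iterable[str],
    fetch: Fetch,
    concurrency: int = 10_000,
    on_result: Callable[[Result], None] = print,
) -> int:
    """Fetch every host with at most `concurrency` requests in flight.

    A fixed pool of workers pulls from a bounded queue, so memory stays flat
    even when `hosts` is a file object with 200+ million lines.
    Returns the number of hosts processed (successes and failures).
    """
    queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue(maxsize=concurrency * 2)
    processed = 0

    async def worker() -> None:
        nonlocal processed
        while True:
            host = await queue.get()
            if host is None:  # sentinel: the producer has finished
                return
            try:
                status, body = await fetch(host)
                result: Result = (host, status, len(body))
            except Exception:  # DNS failure, timeout, refused connection, ...
                result = (host, None, 0)
            on_result(result)
            processed += 1

    workers = [asyncio.create_task(worker()) for _ in range(concurrency)]
    for line in hosts:  # iterating a file object reads it line by line
        host = line.strip()
        if host:
            await queue.put(host)  # waits while the queue is full (backpressure)
    for _ in workers:
        await queue.put(None)  # one sentinel per worker
    await asyncio.gather(*workers)
    return processed


def make_tornado_fetch(max_clients: int) -> Fetch:
    """Build a fetch() on Tornado's curl-based client (Tornado 6 + pycurl assumed)."""
    # Imported here so download_all() itself does not require Tornado.
    from tornado.httpclient import AsyncHTTPClient

    AsyncHTTPClient.configure(
        "tornado.curl_httpclient.CurlAsyncHTTPClient", max_clients=max_clients
    )
    client = AsyncHTTPClient()

    async def fetch(host: str) -> Tuple[int, bytes]:
        # raise_error=False returns non-2xx responses instead of raising;
        # network errors and timeouts still raise and are caught by the worker.
        response = await client.fetch(
            "http://" + host + "/", raise_error=False, request_timeout=20
        )
        return response.code, response.body or b""

    return fetch


async def main(path: str) -> None:
    fetch = make_tornado_fetch(max_clients=10_000)  # created inside the running loop
    with open(path) as f:
        await download_all(f, fetch, concurrency=10_000)


# asyncio.run(main("hosts.txt"))
```

Keeping `fetch` pluggable means the same loop works with another client, and it can be tested without the network. With 10,000 sockets open at once, you will most likely also need to raise the process's open-file limit (`ulimit -n`).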
You could try Twisted (https://twistedmatrix.com/trac/), which has a richer toolset for asynchronous work.
Also, design the application to scale out, so that you can run it on several instances and distribute the download tasks among them, for example through Celery.
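Distributing the work mostly comes down to cutting the host list into pieces. The sketch below makes a few assumptions: `iter_batches` and `shard_for` are hypothetical helpers, and `download_batch` is a hypothetical Celery task you would define yourself. Each batch could be handed to that task with Celery's `.delay()`. Without a broker, each instance can instead keep only the hosts whose stable hash maps to its own index. `zlib.crc32` is used rather than `hash()` because Python randomizes string hashes per process, so `hash()` would give different shards on different machines.

```python
import zlib
from itertools import islice
from typing import Iterable, Iterator, List


def iter_batches(lines: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield lists of up to batch_size stripped, non-empty hosts, reading lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    hosts = (h for h in (line.strip() for line in lines) if h)
    while True:
        batch = list(islice(hosts, batch_size))
        if not batch:
            return
        yield batch


def shard_for(host: str, num_instances: int) -> int:
    """Stable instance index for a host (same answer in every process)."""
    return zlib.crc32(host.encode("utf-8")) % num_instances


# Hypothetical usage with a Celery task you would define yourself:
#   with open("hosts.txt") as f:
#       for batch in iter_batches(f, 10_000):
#           download_batch.delay(batch)
#
# Or, without a broker, instance k of n processes only its own hosts:
#   mine = (l for l in f if shard_for(l.strip(), n) == k)
```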
Since you have 200+ million hosts, be sure to build cURL with an asynchronous DNS resolver (c-ares). Otherwise host-name lookups can block, and the requests will no longer be truly asynchronous.
Here's how to do it on Debian:
apt-get install -y build-essential python-dev python-pip
wget http://c-ares.haxx.se/download/c-ares-1.10.0.tar.gz
tar zxvf c-ares-1.10.0.tar.gz
cd c-ares-1.10.0
./configure
make
make install
wget http://curl.haxx.se/download/curl-7.40.0.tar.gz
tar zxvf curl-7.40.0.tar.gz
cd curl-7.40.0
./configure --enable-ares --with-ssl --with-zlib --enable-ipv6 --with-libidn
make
make install
pip install pycurl --upgrade
rm -rf /usr/lib/libcurl*
ln -s /usr/local/lib/libcurl.so.4 /usr/lib/libcurl.so.4
ln -s /usr/local/lib/libcurl.so.4 /usr/lib/libcurl.so
ldconfig
python -c "import pycurl; print(pycurl.version)"
# The output should be a version string that includes c-ares
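To make this check scriptable rather than reading the output by eye, a small helper can test the version string. This sketch assumes libcurl reports the resolver as a `c-ares/<version>` token in that string. The names `has_c_ares` and `check_curl_backend` are hypothetical. Note also that the c-ares build only matters once Tornado is switched to its curl-based client via `AsyncHTTPClient.configure`. The default `SimpleAsyncHTTPClient` does not use libcurl at all.

```python
def has_c_ares(version: str) -> bool:
    """Return True if a libcurl/pycurl version string lists the c-ares resolver."""
    # pycurl.version looks like "PycURL/... libcurl/7.40.0 OpenSSL/... c-ares/1.10.0 ..."
    return any(token.lower().startswith("c-ares/") for token in version.split())


def check_curl_backend() -> None:
    """Fail fast if pycurl lacks c-ares, then route Tornado through libcurl."""
    import pycurl  # must be the build linked against the new /usr/local libcurl
    from tornado.httpclient import AsyncHTTPClient

    if not has_c_ares(pycurl.version):
        raise RuntimeError("libcurl has no c-ares resolver: " + pycurl.version)
    # Without this, Tornado uses SimpleAsyncHTTPClient and never touches libcurl.
    AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient")
```

Calling `check_curl_backend()` once at the start of the download script surfaces a wrong libcurl immediately, instead of after millions of slow lookups.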