Django
Friend, 2017-01-31 18:28:05

How to parse information from 511756 API pages as quickly as possible?

I need to build a database; from each page I pull out at most about 50 characters in total.
Example:

import json
import threading
import time
from urllib.request import urlopen

def parsep(ar1, ar2):
    for i in range(10000000 + ar1, 10000000 + ar2):
        try:
            url = "http://api.kakoeto?type=user&id=" + str(i)
            data_json = urlopen(url)
        except OSError:
            # connection error: wait a second and retry once
            time.sleep(1)
            url = "http://api.kakoeto?type=user&id=" + str(i)
            data_json = urlopen(url)
        d = json.loads(data_json.read().decode("utf-8"))
        if d['status'] == 'ok':
            dbCharacter(name=d['name'], id=d['id']).save()

o1 = threading.Thread(target=parsep, args=(0, 3600), name="o1")
o1.start()

There are constant connection errors, which is why I wait a second before retrying. I also tried running it in several threads, but as a result everything seems to have become even slower.
Please help: how can this be done?
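
For reference, here is a sketch of how the same loop might be spread over a worker pool with concurrent.futures instead of hand-built threading.Thread objects. The URL pattern, JSON fields and the dbCharacter model are taken from the code above; the worker count and retry count are arbitrary illustrative choices.

import json
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def fetch_one(i):
    # Fetch one record, retrying a couple of times on connection errors.
    url = "http://api.kakoeto?type=user&id=" + str(i)
    for attempt in range(3):
        try:
            with urlopen(url) as resp:
                return json.loads(resp.read().decode("utf-8"))
        except OSError:
            time.sleep(1)
    return None

with ThreadPoolExecutor(max_workers=20) as pool:
    for d in pool.map(fetch_one, range(10000000, 10000000 + 511756)):
        # dbCharacter is the Django model from the question.
        if d and d['status'] == 'ok':
            dbCharacter(name=d['name'], id=d['id']).save()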


2 answers
Roman Mindlin, 2017-01-31
@kgbplus

Write a two-line script for Scrapy and don't worry about how many threads it will download in (it can work that out itself).
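
Not literally two lines, but a minimal sketch of what such a Scrapy spider might look like; only the URL pattern and the status/name/id fields come from the question, while the spider name and concurrency setting are illustrative assumptions.

import json
import scrapy

class UserSpider(scrapy.Spider):
    name = "users"
    # Scrapy schedules requests itself; this just caps how many run at once.
    custom_settings = {"CONCURRENT_REQUESTS": 32}

    def start_requests(self):
        for i in range(10000000, 10000000 + 511756):
            url = "http://api.kakoeto?type=user&id=" + str(i)
            yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        d = json.loads(response.text)
        if d.get('status') == 'ok':
            # Yield an item; an item pipeline can then write it to the database.
            yield {'id': d['id'], 'name': d['name']}

Run with scrapy runspider, it handles retries, throttling and concurrency without any manual thread management.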

Dimonchik, 2017-02-01
@dimonchik2013

pyCurl and MultiCurl will expand the boundaries of your consciousness,
or, failing that, Scrapy.
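
A rough sketch of the CurlMulti idea, i.e. keeping a whole batch of transfers in flight on a single thread: the URL pattern and JSON fields come from the question, while the batching and error handling are illustrative assumptions.

import io
import json
import pycurl

def fetch_batch(ids):
    # One easy handle per id, all driven by a single CurlMulti object.
    multi = pycurl.CurlMulti()
    handles = []
    for i in ids:
        c = pycurl.Curl()
        c.buf = io.BytesIO()
        c.setopt(pycurl.URL, "http://api.kakoeto?type=user&id=" + str(i))
        c.setopt(pycurl.WRITEFUNCTION, c.buf.write)
        multi.add_handle(c)
        handles.append(c)

    # Drive all transfers until none are left running.
    while True:
        while True:
            ret, num_active = multi.perform()
            if ret != pycurl.E_CALL_MULTI_PERFORM:
                break
        if num_active == 0:
            break
        multi.select(1.0)

    results = []
    for c in handles:
        body = c.buf.getvalue()
        multi.remove_handle(c)
        c.close()
        if not body:
            continue  # failed transfer; a real script would retry it
        d = json.loads(body.decode("utf-8"))
        if d.get('status') == 'ok':
            results.append((d['id'], d['name']))
    return results

Called with batches of a few hundred ids at a time, this keeps the requests overlapping instead of waiting for each one in turn.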
