Optimizing objects.all() for a huge database. How to get everything and not freeze for N minutes?
Good day, friends.
I have a PostgreSQL database with two tables. The first contains about 3 million records (links to pages on the Internet). To build the second table, I have to go through every row of the first one, take its link, do some work over the network, and write the result back (each record in the first table produces 200+ records in the second).
The essence of the problem:
The most obvious approach:

for i in Item.objects.all():
    doSomething(i)

I also tried forcing the queryset into a list first:

items = list(Item.objects.all())
for i in items:
    doSomething(i)

Both variants freeze for many minutes.
When iterating, the entire queryset is loaded into memory, hence the problem. The solution proposed by Alexander Vtyurin, although somewhat clumsy, will work: the idea behind it is correct. A few years ago this problem was very acute, which is why Snippet #1949, widely known in narrow circles, appeared; it is built on exactly this principle.
But starting with Django 1.4, if I'm not mistaken, a regular tool appeared for exactly this purpose: the queryset's iterator() method.
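A minimal sketch of how iterator() might be applied here, assuming the Item model and doSomething() from the question:

# iterator() streams rows as they come from the database instead of
# building the queryset's internal result cache, so memory stays bounded.
for item in Item.objects.all().iterator():
    doSomething(item)

In newer Django versions the PostgreSQL backend additionally runs this through a server-side cursor, so the rows are streamed from the database rather than fetched all at once.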
There is a suspicion that the problem is not .all() but doSomething (going out to the Internet for every row and saving something to the database). You can run this code to check whether plain iteration is really the bottleneck:

for i, item in enumerate(Item.objects.all()):
    x = i + i  # dummy work: just iterate, no network calls and no writes
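If a concrete number is more convincing than a feel, here is a small sketch of the same check wrapped in a timer (only the standard library time module is added):

import time

start = time.monotonic()
for i, item in enumerate(Item.objects.all()):
    x = i + i  # same dummy body: no network calls, no writes
print('plain iteration took %.1f seconds' % (time.monotonic() - start))

If this finishes quickly, the minutes are being spent in doSomething, not in the ORM.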
I googled this solution, which fetches rows in batches of 1000 by primary-key range:

from django.db.models import Max

# filter() never raises Item.DoesNotExist; an empty pk range just returns
# an empty queryset, so the original "except ... break" would loop forever.
# Stop once the batch start passes the largest pk instead.
max_pk = Item.objects.aggregate(max_pk=Max('pk'))['max_pk'] or 0
i = 0
while i * 1000 <= max_pk:
    items = Item.objects.filter(pk__gte=i * 1000, pk__lt=(i + 1) * 1000)
    for j in items:
        doSomething(j)
    i += 1