Django
SkiBY, 2017-09-06 22:30:30

How to optimize the generation of a large xlsx file on the server using Python?

There is a Django + MySQL system from which a list of orders has to be exported regularly.
The orders are loaded together with their related objects: select_related spans about 15 fields.
A year's worth of data comes to roughly 300,000 rows.
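For context, the queryset looks roughly like this. This is a hypothetical reconstruction based only on the field accesses in the loop below; the actual filter and the exact related-field names may differ:

# hypothetical sketch of the queryset; relation names taken from the loop below
orders = Order.objects.filter(entry_date__year=2016).select_related(
    'canvas', 'color', 'r_canvas', 'r_color', 'part', 'rest',
    'dealer', 'dealer__city', 'dealer__region', 'warehouse', 'owner',
)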
The situation is as follows: the database returns everything quickly enough, in a single query, and at that point memory usage is fine.
But as soon as generation of the CSV file (csv library) or xlsx file (xlsxwriter) starts, the system grinds to a halt: in the cloud from around 15,000 rows, locally from around 150,000, beyond that the OS kills the process. Using constant_memory=True helps (enabled as in the sketch below): the job runs long and tediously but does eventually produce a result. That only works locally, though; the server cannot tolerate such a long request.
The xlsx file for 150,000 rows comes out at only about 32 MB, yet iterating over the rows eats all the memory in the system.
Everything is done in the standard way, for o in orders and so on, because the data has to be collected row by row, and the generated file is then served to the client.
There are no additional database queries inside the loop; everything comes from memory.
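For reference, constant_memory mode is a standard xlsxwriter option passed when the workbook is created; in this mode each finished row is flushed to a temporary file instead of being kept in memory, so rows must be written in order. A minimal sketch (the filename and number format are illustrative):

import xlsxwriter

# constant_memory flushes each completed row to disk instead of
# holding the whole worksheet in memory; rows must be written in order
workbook = xlsxwriter.Workbook('/tmp/orders.xlsx', {'constant_memory': True})
sheet = workbook.add_worksheet()
format1 = workbook.add_format({'num_format': '0.00'})  # hypothetical format
# ... write rows here ...
workbook.close()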

The loop code:

# `orders`, `sheet`, `format1` and the row counter `i` are defined
# earlier in the view; this is just the row-writing loop.
for o in orders:
    sheet.write(i, 0, o.id)
    sheet.write(i, 1, u'%s' % o.get_o_type_display())
    sheet.write(i, 2, u'%s' % o.o_number)
    if o.o_type == '1':
        name = u'%s, Цвет: %s' % (o.canvas.f_name(), o.color.code)
        sheet.write(i, 3, name)
        name_2 = u'%s, Цвет: %s' % (o.r_canvas.f_name(), o.r_color.code)
        sheet.write(i, 4, name_2)
    else:
        sheet.write(i, 3, u'%s, %s' % (o.part.name, o.part.get_units_display()))
        sheet.write(i, 4, u'%s, %s' % (o.part.name, o.part.get_units_display()))

    sheet.write(i, 5, o.rest.id)
    sheet.write(i, 6, u'%s' % o.dealer)
    sheet.write(i, 7, u'%s' % o.dealer.city.name)
    sheet.write(i, 8, o.dealer.id)
    sheet.write(i, 9, u'%s' % o.warehouse)
    sheet.write(i, 10, o.amount, format1)
    # optional fields: write the value, or a dash placeholder if missing
    if o.perimetr:
        sheet.write(i, 11, o.perimetr, format1)
    else:
        sheet.write(i, 11, u'-')
    sheet.write(i, 12, o.fix_price, format1)
    sheet.write(i, 13, o.full_price, format1)
    sheet.write(i, 14, o.cost, format1)
    sheet.write(i, 15, u'%s' % o.calc_increase)
    sheet.write(i, 16, o.mat_cost, format1)
    sheet.write(i, 17, u'%s' % o.owner)
    sheet.write(i, 18, o.entry_date.strftime("%Y-%m-%d"))
    sheet.write(i, 19, o.accept_date.strftime("%Y-%m-%d"))
    if o.in_prod_date:
        sheet.write(i, 20, o.in_prod_date.strftime("%Y-%m-%d"))
    else:
        sheet.write(i, 20, u'-')
    if o.shipping_date:
        sheet.write(i, 21, o.shipping_date.strftime("%Y-%m-%d"))
    else:
        sheet.write(i, 21, u'-')
    sheet.write(i, 22, o.defect)
    sheet.write(i, 23, o.defect_return)
    sheet.write(i, 24, u'%s' % o.manager_memo)
    sheet.write(i, 25, u'%s' % o.dealer_memo)
    sheet.write(i, 26, u'%s' % o.warehouse_memo)
    sheet.write(i, 27, u'%s' % o.final_memo)
    sheet.write(i, 28, u'%s' % o.final_d_memo)

    if o.o_type == '1':
        sheet.write(i, 29, u'%s' % o.canvas.id)
        sheet.write(i, 30, u'%s' % o.canvas.c_width)
        sheet.write(i, 31, u'%s' % o.color.code)
    else:
        # columns 30-31 are left empty for non-canvas orders
        sheet.write(i, 29, u'%s' % o.part.id)
    sheet.write(i, 32, u'%s' % o.dealer.region.name)
    sheet.write(i, 33, u'%s' % o.get_o_status_display())

    i += 1


How can this process be optimized? 300,000 rows is not that many, yet it is already a disaster.
UPD
The memory problem turned out to be the easiest to solve. The chunked_queryset snippet from djangosnippets helped; the idea is to split the queryset into chunks and iterate with yield (roughly the sketch below). But the processing speed, unfortunately, stayed the same: still too long :(
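The exact djangosnippets code is not reproduced here, but a typical chunked queryset iterator looks roughly like this (a sketch; the chunk size and pk ordering are assumptions):

def chunked_queryset(qs, chunk_size=1000):
    # iterate over a queryset in pk-ordered chunks so that only
    # chunk_size model instances are held in memory at a time
    last_pk = 0
    qs = qs.order_by('pk')
    while True:
        chunk = list(qs.filter(pk__gt=last_pk)[:chunk_size])
        if not chunk:
            break
        for obj in chunk:
            yield obj
        last_pk = chunk[-1].pk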


2 answer(s)
Tash1moto, 2017-11-29
@Tash1moto

I once solved a similar problem with threads:
split the orders list into about 20 parts,
start a separate thread for each part that writes to its own file,
and when all the threads finish, merge the files into one final file. A rough sketch below.
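A sketch of this approach (Python 3; the helper names and the CSV columns are illustrative, and with CPython's GIL the gain is mostly for I/O-bound writing):

import concurrent.futures
import csv
import shutil

def write_part(part, path):
    # each thread writes its own part file
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        for o in part:
            writer.writerow([o.id, o.o_number, o.amount])  # subset of the columns
    return path

orders = list(orders)  # evaluate the queryset once
n = 20
size = len(orders) // n + 1
parts = [orders[k * size:(k + 1) * size] for k in range(n)]
paths = ['part_%02d.csv' % k for k in range(n)]

with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
    list(pool.map(write_part, parts, paths))

# merge the part files into the final file, preserving row order
with open('orders.csv', 'w', encoding='utf-8') as out:
    for path in paths:
        with open(path, encoding='utf-8') as f:
            shutil.copyfileobj(f, out)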

foxyrus, 2017-09-06
@foxyrus

CSV is a plain-text format, so try writing it directly. As an option, split the export into several CSV files and then combine them into one, or stream the rows as they are generated, as in the sketch below.
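Streaming the CSV row by row is the standard pattern from the Django documentation (StreamingHttpResponse with a pseudo-buffer); a minimal sketch, with the model and columns assumed from the question:

import csv
from django.http import StreamingHttpResponse

class Echo:
    """Pseudo-buffer: csv.writer calls write(), which just returns the value."""
    def write(self, value):
        return value

def export_orders(request):
    qs = Order.objects.all()  # hypothetical queryset
    writer = csv.writer(Echo())
    rows = ([o.id, o.o_number, o.amount] for o in qs.iterator())
    response = StreamingHttpResponse(
        (writer.writerow(row) for row in rows),
        content_type='text/csv',
    )
    response['Content-Disposition'] = 'attachment; filename="orders.csv"'
    return response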
