T
T
ti_zh_vrach2021-08-16 21:16:01
Python
ti_zh_vrach, 2021-08-16 21:16:01

How to speed up openpyxl + xlsxwriter? Or is there a solution with other libraries?

Good afternoon!
The script is running slowly while writing data. It takes ~22 minutes for 800 lines. And the first 500 are processed in 7 minutes.

from datetime import datetime

from openpyxl import load_workbook
import xlsxwriter as xl


# Here other functions and variables


def my_script():
    wb_input = load_workbook('objects.xlsx', read_only=True)
    ws_input = wb_input.active

    wb_output = xl.Workbook(f'try_{datetime.now().strftime("%Y-%m-%d")}.xlsx', {'constant_memory': True})
    ws_output = wb_output.add_worksheet('Sheet1')
    make_headers_and_format(wb_output, ws_output, list_of_columns)

    j = 1
    for i in range(1, ws_input.max_row + 1):
        message = f'{datetime.now().strftime("%Y.%m.%d %H:%M:%S")} watching row {i}'
        if is_valid(ws_input.cell(i, 21).value):
            for col in range(len(list_of_columns)):
                ws_output.write(j, col, ws_input.cell(i, col + 1).value)
            j += 1
            message = message + ' catched'
        print(message)

    wb_input.close()
    wb_output.close()

How I localized the problem:
1. First, the entry was on openpyxl with write_only. Replacing with xlsxwriter shortened the process by 2 minutes.
2. I have a parser with xlsxwriter that writes data at a speed of ~1000 lines in 5 minutes. But there it is known for sure that the server is slowly rendering pages.
3. Delay in script execution always comes with lines marked "catched". Up to 13 seconds.
4. On a more powerful computer, the script completes in ~17 minutes.

Log example
2021.08.14 20:30:58 watching row 471
2021.08.14 20:30:58 watching row 472
2021.08.14 20:30:58 watching row 473
2021.08.14 20:30:58 watching row 474
2021.08.14 20:30:58 watching row 475
2021.08.14 20:30:59 watching row 476
2021.08.14 20:30:59 watching row 477
2021.08.14 20:30:59 watching row 478 catched
2021.08.14 20:31:06 watching row 479
2021.08.14 20:31:06 watching row 480 catched
2021.08.14 20:31:10 watching row 481 catched
2021.08.14 20:31:15 watching row 482
2021.08.14 20:31:15 watching row 483 catched
2021.08.14 20:31:19 watching row 484
2021.08.14 20:31:19 watching row 485
2021.08.14 20:31:20 watching row 486 catched
2021.08.14 20:31:24 watching row 487 catched
2021.08.14 20:31:28 watching row 488 catched
2021.08.14 20:31:35 watching row 489


How to speed up the data recording process? Maybe there are better libraries?

Answer the question

In order to leave comments, you need to log in

1 answer(s)
M
MAKSIM KRASOVSKIY, 2021-08-17
@ti_zh_vrach

I would remove "read_only=True" and xlsxwriter, because openpyxl itself can write data to xlsx.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question