Why does the program take so long to process data?
I cannot understand why the program spends so long processing the data.
I have a docx file containing a table with 2554 entries (rows). I need to process them, so I want to put each entry into a list as an element. But somehow it takes too long, and I wonder what causes this and whether it can be sped up.
# -*- coding: utf-8 -*-
from docx import Document
doc1 = Document('123.docx')
student = []
print(len(doc1.tables))
print(len(doc1.tables[0].rows))
for i in range(len(doc1.tables[0].rows)):
    student.append(doc1.tables[0].rows[i].cells[0].paragraphs[-1].text)
print(student[1])
The problem is not the loop; it is the python-docx module itself that is slow. The way you access the cells ends up creating a heap of temporary objects.
This will be much faster:
student = [cell.paragraphs[-1].text for cell in doc1.tables[0].column_cells(0)]
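The effect can be seen without python-docx at all. Below is a minimal sketch with a toy `Table` class; the assumption (matching python-docx's behavior) is that the `.rows` property builds a fresh list of proxy objects on every access, so touching it inside the loop body makes the original pattern do thousands of rebuilds instead of one:

```python
class Table:
    """Toy stand-in for a python-docx table: each access to .rows
    rebuilds the whole list of row objects (as python-docx does)."""
    def __init__(self, n):
        self._n = n
        self.builds = 0          # how many times .rows was rebuilt

    @property
    def rows(self):
        self.builds += 1
        return [object() for _ in range(self._n)]

table = Table(2554)

# The original pattern touches .rows on every iteration ...
for i in range(len(table.rows)):
    _ = table.rows[i]
print(table.builds)              # thousands of rebuilds

# ... while iterating once over the returned list touches it once.
table.builds = 0
for row in table.rows:
    _ = row
print(table.builds)              # 1
```

This is why hoisting the table/rows access out of the loop, or using `column_cells(0)` as above, makes such a large difference.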
First, what is this horror?

for i in range(len(doc1.tables[0].rows)):
    student.append(doc1.tables[0].rows[i].cells[0].paragraphs[-1].text)

Write it instead as:

table = doc1.tables[0]
for row in table.rows:
    student.append(row.cells[0].paragraphs[-1].text)
python-docx is slow. You can unpack the docx manually (it is a zip archive) and work with the XML inside it directly; it is less convenient, but it will be fast.
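A minimal sketch of that approach, assuming the table you want is the first `w:tbl` element in `word/document.xml` and that the text you need is simply the concatenation of the `w:t` runs in each row's first cell:

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
NS = {'w': W}

def first_column_texts(path):
    """Read the first table of a .docx and return the text of the
    first cell in every row, without going through python-docx."""
    with zipfile.ZipFile(path) as z:
        xml_bytes = z.read('word/document.xml')
    root = ET.fromstring(xml_bytes)
    table = root.find('.//w:tbl', NS)        # first table in the document
    texts = []
    for row in table.findall('w:tr', NS):    # rows are direct w:tr children
        cell = row.find('w:tc', NS)          # first cell of the row
        # join all text runs (w:t) inside the cell
        texts.append(''.join(t.text or '' for t in cell.iter(f'{{{W}}}t')))
    return texts
```

For 2554 rows this does a single parse of the XML instead of thousands of proxy-object constructions, at the cost of handling the WordprocessingML structure yourself.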
It seems to me it would be faster with a list comprehension: student = [row.cells[0].paragraphs[-1].text for row in doc1.tables[0].rows]
Do you really need to parse the docx at all?
Wouldn't it be faster to save the table to CSV and read it with Python's csv parser?
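If the table can be exported to CSV once (e.g. via Save As in Word), reading it back is near-instant. A sketch, with a hypothetical helper name and file path:

```python
import csv

def read_first_column(path):
    """Return the first column of a CSV file as a list of strings."""
    with open(path, newline='', encoding='utf-8') as f:
        return [row[0] for row in csv.reader(f) if row]

# Hypothetical usage, assuming the table was exported to students.csv:
# student = read_first_column('students.csv')
```

The trade-off is the extra manual export step, so this only helps if the docx is a one-off source rather than a file you receive repeatedly.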