Why does the program take so long to process data?
I cannot understand why the program spends so long processing the data.
I have a docx file containing a table with 2554 entries (rows). I need to process them, so I want to put each entry into a list as an element. But somehow it takes too long, and I wonder what causes this and whether it can be sped up.
# -*- coding: utf-8 -*-
from docx import Document
doc1 = Document('123.docx')
student = []
print(len(doc1.tables))
print(len(doc1.tables[0].rows))
for i in range(len(doc1.tables[0].rows)):
    student.append(doc1.tables[0].rows[i].cells[0].paragraphs[-1].text)
print(student[1])
The problem is not the loop; it is the python-docx module itself that is slow. The way you access the cells ends up creating a heap of temporary objects.
This will be much faster:
student = [cell.paragraphs[-1].text for cell in doc1.tables[0].column_cells(0)]
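The effect can be seen without python-docx at all. Below is a minimal sketch with a toy `Table` class; the assumption (matching python-docx's behavior) is that the `.rows` property builds a fresh list of proxy objects on every access, so touching it inside the loop body makes the original pattern do thousands of rebuilds instead of one:

```python
class Table:
    """Toy stand-in for a python-docx table: each access to .rows
    rebuilds the whole list of row objects (as python-docx does)."""
    def __init__(self, n):
        self._n = n
        self.builds = 0          # how many times .rows was rebuilt

    @property
    def rows(self):
        self.builds += 1
        return [object() for _ in range(self._n)]

table = Table(2554)

# The original pattern touches .rows on every iteration ...
for i in range(len(table.rows)):
    _ = table.rows[i]
print(table.builds)              # thousands of rebuilds

# ... while iterating once over the returned list touches it once.
table.builds = 0
for row in table.rows:
    _ = row
print(table.builds)              # 1
```

This is why hoisting the table/rows access out of the loop, or using `column_cells(0)` as above, makes such a large difference.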
First, what is this horror?

for i in range(len(doc1.tables[0].rows)):
    student.append(doc1.tables[0].rows[i].cells[0].paragraphs[-1].text)

Write it instead as:

table = doc1.tables[0]
for row in table.rows:
    student.append(row.cells[0].paragraphs[-1].text)
python-docx is slow. You can unpack the docx manually (it is a zip archive) and work with the XML inside it directly; it is less convenient, but it will be fast.
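A minimal sketch of that approach, assuming the table you want is the first `w:tbl` element in `word/document.xml` and that the text you need is simply the concatenation of the `w:t` runs in each row's first cell:

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
NS = {'w': W}

def first_column_texts(path):
    """Read the first table of a .docx and return the text of the
    first cell in every row, without going through python-docx."""
    with zipfile.ZipFile(path) as z:
        xml_bytes = z.read('word/document.xml')
    root = ET.fromstring(xml_bytes)
    table = root.find('.//w:tbl', NS)        # first table in the document
    texts = []
    for row in table.findall('w:tr', NS):    # rows are direct w:tr children
        cell = row.find('w:tc', NS)          # first cell of the row
        # join all text runs (w:t) inside the cell
        texts.append(''.join(t.text or '' for t in cell.iter(f'{{{W}}}t')))
    return texts
```

For 2554 rows this does a single parse of the XML instead of thousands of proxy-object constructions, at the cost of handling the WordprocessingML structure yourself.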
It seems to me it would be faster with a list comprehension: student = [row.cells[0].paragraphs[-1].text for row in doc1.tables[0].rows]
Do you really need to parse the docx at all?
Wouldn't it be faster to save the table to CSV and read it with Python's csv parser?
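If the table can be exported to CSV once (e.g. via Save As in Word), reading it back is near-instant. A sketch, with a hypothetical helper name and file path:

```python
import csv

def read_first_column(path):
    """Return the first column of a CSV file as a list of strings."""
    with open(path, newline='', encoding='utf-8') as f:
        return [row[0] for row in csv.reader(f) if row]

# Hypothetical usage, assuming the table was exported to students.csv:
# student = read_first_column('students.csv')
```

The trade-off is the extra manual export step, so this only helps if the docx is a one-off source rather than a file you receive repeatedly.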