Python
Sergey Ozeransky, 2013-05-10 12:00:25

Python + MS Word?

Good afternoon.
Can you recommend a library for parsing .doc and .docx files in Python 2.7? It must not depend on OLE, since it will run on Linux. Ideally I would also get the images in the output; HTML output would be even better.


3 answers
Pavel Tyslyatsky, 2013-05-10
@tbicr

I think it's better to look at LibreOffice + Python (python-uno). It should definitely be able to export to PDF.

NetBUG, 2013-05-10
@NetBUG

A .docx file is a ZIP container; inside it is a document.xml that parses without problems. Surely there is an XSLT stylesheet somewhere for tidying it up automatically.
For .doc there is unoconv, which uses LibreOffice in batch mode to convert between formats. It seems more logical to me to convert the files than to look for a dedicated Python library.
As far as I know, unoconv/python-uno can produce more than just PDF. :)
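
To illustrate the ZIP container point, here is a minimal sketch that uses only the standard library. It assumes the usual .docx layout (main text in word/document.xml, pictures under word/media/); the function name read_docx is made up for this example. It pulls out plain paragraph text and raw image bytes only, with no styles, tables, or HTML.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_docx(source):
    """Return (paragraphs, images) for a .docx path or file-like object.

    Hypothetical helper for illustration: paragraphs is a list of plain
    strings, images maps archive member names to their raw bytes.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

        paragraphs = []
        for paragraph in root.iter("{%s}p" % W_NS):
            # A paragraph's text is split across runs; join every w:t node.
            text = "".join(
                node.text or "" for node in paragraph.iter("{%s}t" % W_NS)
            )
            paragraphs.append(text)

        # Embedded pictures are assumed to live under word/media/.
        images = {}
        for name in archive.namelist():
            if name.startswith("word/media/"):
                images[name] = archive.read(name)

    return paragraphs, images
```

Calling read_docx("report.docx") (the file name is just a placeholder) returns the paragraph strings and a dict of image bytes that can be written to disk as they are. This does not help with the old binary .doc format, which is where conversion through LibreOffice comes in.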

rvller, 2013-05-11
@rvller

I settled on the OpenOffice + unoconv (Python) option. It supports a lot of formats (depending on what is installed alongside OpenOffice), including doc/docx -> html/xhtml.
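
For reference, a rough sketch of driving unoconv from Python. It assumes the unoconv executable is on PATH with a working OpenOffice/LibreOffice behind it; the helper names unoconv_command and convert are invented for this example. Only the -f (format) and -o (output) options are used.

```python
import subprocess


def unoconv_command(source_path, output_format="html", output_dir=None):
    """Build the unoconv argument list (hypothetical helper)."""
    command = ["unoconv", "-f", output_format]
    if output_dir is not None:
        # -o sets where the converted file is written.
        command += ["-o", output_dir]
    command.append(source_path)
    return command


def convert(source_path, output_format="html", output_dir=None):
    """Run unoconv and raise CalledProcessError if it exits non-zero."""
    subprocess.check_call(
        unoconv_command(source_path, output_format, output_dir)
    )
```

Something like convert("report.doc", "html", "out") would then ask unoconv to write the HTML version into out; the file and directory names are placeholders. Which output formats are actually available depends on the office installation.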
