How do I parse a .doc file in Python?
The file consists of several consecutive records of the form:
paragraph 1: Name
paragraph 2: <picture>
paragraph 3: Description
I need to load all of this into a database, but the problem is that the fonts and line breaks are inconsistent: in one place the description and the picture are adjacent, in another there is an empty line between them, in a third the last line of a description runs straight into the next name, and so on. I also struggle with regexes (I haven't really studied them yet).
Which modules should I use (links to documentation are welcome), and which regexes?
Studying regexes is mandatory. Also, any complex parsing has a non-zero chance of error.
Parse a doc file with regular expressions? Doubtful (in such cases it is customary to link to stackoverflow.com/a/1732454/2402125).
There are dedicated libraries for parsing doc files (docx, actually), for example https://github.com/mikemaccana/python-docx/
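If installing a library is not an option, here is a rough sketch of the same idea using only the standard library. It relies on the fact that a .docx file is a zip archive whose main text lives in word/document.xml; an old binary .doc would first have to be re-saved as .docx. The function names `read_paragraphs` and `group_records` are made up for this example, and the grouping rule is an assumption about the file described in the question: every record has a picture, and the non-empty paragraph right before a picture is the name. Everything after the picture, up to the next name, is treated as the description, so empty lines no longer matter.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the main WordprocessingML elements (w:p, w:t, w:drawing, ...).
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
PICTURE_TAGS = (W + "drawing", W + "pict")  # DrawingML and legacy VML pictures


def read_paragraphs(docx_file):
    """Return a list of (text, has_picture) tuples, one per paragraph.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W + "p"):
        # A paragraph's text is split across runs; join all w:t pieces.
        text = "".join(t.text or "" for t in p.iter(W + "t")).strip()
        has_picture = any(el.tag in PICTURE_TAGS for el in p.iter())
        paragraphs.append((text, has_picture))
    return paragraphs


def group_records(paragraphs):
    """Group paragraphs into name/description records, anchored on pictures.

    Assumption: the non-empty paragraph directly before a picture is a name.
    """
    # Drop empty paragraphs so stray blank lines cannot break the grouping.
    items = [(text, pic) for text, pic in paragraphs if text or pic]
    records = []
    for i, (text, pic) in enumerate(items):
        if pic:
            continue  # the picture itself only serves as a marker
        next_is_picture = i + 1 < len(items) and items[i + 1][1]
        if next_is_picture:
            records.append({"name": text, "description": []})
        elif records:
            records[-1]["description"].append(text)
        # Text before the first name is ignored.
    return [
        {"name": r["name"], "description": " ".join(r["description"])}
        for r in records
    ]
```

Calling `group_records(read_paragraphs("catalog.docx"))` (the file name is a placeholder) would give a list of dicts ready to insert into a database. Records that come back with an empty description are worth checking by hand, since that usually means the layout broke the assumption above.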