R
R
Rudolf Nemov2020-07-31 18:42:32
Python
Rudolf Nemov, 2020-07-31 18:42:32

How to copy part of one docx document to another from Python?

Hello, I came across a tricky but very interesting problem.

I need to write a python function that is passed 5 values:

source_docx # path - docx файл, из которого нужно забрать данные
outputdocx # path - docx, в который они будут вставлены
trimStart # str - фраза, с которой нужно обрезать данные (к примеру, "глава 1")
trimEnd # str - фраза, по которую данные будут обрезаны (к примеру, "глава 2")
relpace # str - строчка в outputdocx, которую нужно заменить ( к примеру {{replaceme}})


The essence of it all - you need to preserve the formatting, as well as transfer all tables, links, images and god knows what else, what is between trimStart and trimEnd

I would like to ask experienced people what approach to take in order to shed a minimum of tears :)

For time studying the issue, I already realized that the python-docx library does not allow you to save styles and transfer images, since the structure of the donor document is not known in advance.

This action is very simple for a person to perform - the clipboard in Windows saves both the formatting and all the elements of the document. I'm leaning towards recording activities with someone like win32com.

There is also a library.There is also a C# Syncfusion that already implements such a find and replace, but... python =(

There are also macros, but I still don’t understand at all how to bind them to a function in python.

In general, if you have encountered such a problem or have any thoughts, please help me.

Answer the question

In order to leave comments, you need to log in

2 answer(s)
P
PavelMos, 2020-07-31
@rudieduddie

If it is from a file, then, most likely, not directly. Just unzip the .DOCX and manually search the appropriate files for content describing the text at the beginning and end of the phrase. by words and tags. But it is not a fact that the copied text can be saved as a valid DOCX, because DOCX has a very complex structure. That is, it will be necessary to create a lot of additional things before packing it into DOCX.
I think it's better to do it with built-in Word tools.

S
Sergey Tikhonov, 2020-07-31
@tumbler

Take something like python-docx and iterate over paragraphs and other blocks. Until you find trimStart - delete everything to hell :) you do not touch the necessary, there is a possibility that the layout will not work.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question