Programming
Monster83, 2020-11-24 20:21:17

Can you help me write a program to optimize work with a large Word document?

Good afternoon, fellow programmers. Let me say right away that I am new to this forum, so please bear with me (and, if possible, forgive me) if I do something wrong, such as asking stupid questions.
I am a student whose field has nothing to do with programming or the exact sciences, but I need to process a large Word document. Specifically, I need to write a program (possibly a macro) that detects repetitions in this document. A manual search is not practical because there is a lot of data: more than one and a half thousand items. To make this clearer, here is an illustration.
The document consists of paragraphs and looks schematically like this:
A
A
B
C
B
D
F
D
etc.
I need to find all repetitions in as few steps as possible and remove them, keeping only one copy of each, i.e. bring the data to the form:
A
B
C
D
and so on ...
Perhaps the question is rather trivial, but I searched the Internet and found nothing suitable except the usual search (Ctrl + F). Checking every item that way would take a lot of time, so I consider it a last resort. Is it possible to speed up this process using Word itself, or do I need third-party software? If so, which?
Thank you in advance for your help.


3 answer(s)
Mike, 2020-11-29
@Monster83

You need to write a function (macro) in the programming language built into Word: Microsoft Visual Basic for Applications, or VBA for short.
The task looks pretty standard; a quick search for "VBA sort remove duplicates +word -excel" turns up similar ready-made solutions:
https://mozgotron.livejournal.com/74002.html

Adamos, 2020-11-24
@Adamos

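# Convert the Word document to plain text (writes file.txt)
libreoffice --headless --convert-to txt file.docx
# Keep only the first occurrence of each line, in the original order
awk '!seen[$0]++' file.txt > file1.txt
# Convert the de-duplicated text back to .docx
libreoffice --headless --convert-to docx file1.txt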
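
To check the de-duplication step on its own, here is a minimal sketch that runs it on the sample from the question. It assumes each paragraph ends up as one line of the text file; `dedupe_lines` is a made-up helper name used only for illustration. Note that plain `uniq` compares adjacent lines only, so it would miss repeats such as the second B or D in the sample, whereas the awk idiom below remembers every line it has already printed.

```shell
#!/bin/sh
# dedupe_lines is a hypothetical helper name, not part of any tool.
# It reads lines on stdin and prints each distinct line once,
# keeping the first occurrence and preserving the original order.
dedupe_lines() {
    # seen[$0] is 0 (false) the first time a line appears, so the
    # negated test is true and awk prints the line; the ++ then marks
    # the line as seen so later copies are skipped.
    awk '!seen[$0]++'
}

# Sample from the question: A A B C B D F D
printf '%s\n' A A B C B D F D | dedupe_lines
# prints A, B, C, D, F (one per line)
```

Lines are compared exactly, so two paragraphs that differ only in trailing spaces count as different. Also keep in mind that a round trip through plain text discards the document's formatting.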

Maxim K, 2020-11-25
@mkvmaks

The standard Office suite has a built-in function to search for duplicates, and you can delete them there as well.
