Windows
Damian Lewis, 2018-11-10 13:20:48

Finding duplicates in text via regular expressions?

Hello! I'm interested in finding and displaying duplicates in a text. Not a plain Ctrl+F word search, but a more sophisticated automated mechanism.
For example: a list of programs contains several entries that differ only in version. They are mixed in with many other programs, and I want to display the entries with similar names, say those whose names match by 50-70%. I heard this can be done with regular expressions, but I don't understand them at all. How can I do it?


3 answer(s)
xmoonlight, 2018-11-10
@xmoonlight

It can't be done with regular expressions. See here.

Example
Мария
Анна
Виктория
Полина
Елизавета
Екатерина
Ксения
Валерия
Варвара Free 1.4
Варвара Pro 2.0
Александра
Вероника
Надежда
Светлана
Злата
Олеся 3.3
Олеся Free_Lite 4.8
Наталья
Эвелина
For simple clustering, you can use PHP and its similar_text() function.
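
The same idea can be roughly sketched in Python with the standard library's difflib.SequenceMatcher, which returns a similarity ratio between two strings. It is not the same algorithm as PHP's similar_text(), so the percentages will differ. The function name similar_pairs and the 50% threshold are assumptions made for this sketch.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(names, threshold=0.5):
    """Return (a, b, percent) for every pair whose similarity ratio meets the threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        # Compare case-insensitively; ratio() is a float between 0.0 and 1.0.
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio * 100)))
    # Most similar pairs first.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# A few entries taken from the example list above.
names = [
    "Варвара Free 1.4",
    "Варвара Pro 2.0",
    "Александра",
    "Олеся 3.3",
    "Олеся Free_Lite 4.8",
    "Наталья",
]

for a, b, pct in similar_pairs(names):
    print(f"{pct}%  {a}  <->  {b}")
```

Every pair is compared, so the cost grows quadratically with the number of lines. That is fine for a few thousand names but slow for very large lists.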

Sergey, 2018-11-10
@LiS-31

I think your approach to the problem is incomplete.
Given:
The set of program names is an array of strings.
Task:
Find repeated expressions in the array values.
Solution (prototype):
Store the program names as an array of strings A.
Split the strings on a delimiter character (for example, a space or _) to segment the names, and put the resulting segments into another array B.
Loop over B and look for each of its values in the values of array A (for example, with a RegExp such as /B[i].*?/usix).
Sort, check the degree of match, and so on to taste, then display the result.
Implementation tools are a matter of taste: Python, PowerShell, Bash, Perl.
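
The prototype above can be sketched in Python roughly as follows. This is one possible reading of the steps, not a finished tool: the function name shared_tokens, the minimum segment length of 3 characters, and the sample names are assumptions made for illustration.

```python
import re


def shared_tokens(names, min_len=3):
    """Map each name segment to the names that contain it, keeping only shared segments."""
    # Array B: unique segments obtained by splitting every name on spaces or underscores.
    tokens = set()
    for name in names:
        for segment in re.split(r"[ _]+", name):
            if len(segment) >= min_len:
                tokens.add(segment.lower())

    groups = {}
    for token in sorted(tokens):
        # Escape the segment so characters like "." are matched literally.
        pattern = re.compile(re.escape(token), re.IGNORECASE)
        hits = [name for name in names if pattern.search(name)]
        if len(hits) > 1:  # the segment occurs in more than one name
            groups[token] = hits
    return groups


# Array A: the program names (sample data taken from the example earlier in the thread).
names = [
    "Варвара Free 1.4",
    "Варвара Pro 2.0",
    "Олеся 3.3",
    "Олеся Free_Lite 4.8",
    "Наталья",
]

for token, hits in shared_tokens(names).items():
    print(token, "->", hits)
```

Note that this is a substring search, so a short segment can also match inside a longer unrelated word. Sorting and scoring the degree of match is left out here.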

Damian Lewis, 2018-11-11
@DamianLewis

I found a solution to the problem. It's a crude workaround, of course, but the result is exactly what I need.
Solution:
The following software is needed: NonCompressibleFiles (free), Advanced Renamer (free), dupeGuru (free, GPLv3).
1. With NonCompressibleFiles, I created as many files as there are lines. For, say, 100 lines, I created 100 files. I made each file 2 KB. It all takes one click.
2. With Advanced Renamer, I renamed all the files to the text of the lines. This also takes one click.
3. With dupeGuru, I searched for duplicates by file name with a match percentage. The end result is exactly what I wanted!
Screen 1
Screen 2
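
For comparison, a similar "match by name with a percentage cutoff" result can probably be obtained directly on the lines, without creating files. The sketch below uses difflib.get_close_matches from the Python standard library. It is not dupeGuru's matching algorithm, and the function name near_duplicates, the 0.6 cutoff, and the sample lines are assumptions.

```python
from difflib import get_close_matches


def near_duplicates(lines, cutoff=0.6):
    """Map each line to the other lines whose similarity ratio is at least cutoff."""
    result = {}
    for i, line in enumerate(lines):
        # Compare the line against every other line, not against itself.
        others = lines[:i] + lines[i + 1:]
        # n must be positive, so guard against a single-line input.
        matches = get_close_matches(line, others, n=max(1, len(others)), cutoff=cutoff)
        if matches:
            result[line] = matches
    return result


# Hypothetical sample lines; in practice they would be read from the text file.
lines = ["app 1.0", "app 2.0", "zzz"]

for line, matches in near_duplicates(lines).items():
    print(line, "->", matches)
```

The comparison is case-sensitive. Raising the cutoff towards 1.0 keeps only near-identical names, while lowering it towards 0.5 gives looser groups.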
