A
A
Anastasia Sila2014-09-22 12:31:46
Perl
Anastasia Sila, 2014-09-22 12:31:46

Algorithm for finding identical sections in N lines?

I warn you right away, I don’t need to solve this for study or anything accountable, I just woke up a wild interest.
Let's say there are several lines (5, 10, 15 ..) of absolutely arbitrary type and arbitrary length, for simplicity, let's take this:
ABCBDCEACA,
BCDCEEAEDAD,
CAEDCECDAE ... etc.
How can you find a sequence of characters (the longest possible, of course) that is in all strings, if it is not known in advance?
The substring in the example turns out to be DCE, but how can I determine it using the program, I haven’t found any similar algorithms yet ... or I didn’t search well?

Answer the question

In order to leave comments, you need to log in

4 answer(s)
�
âš¡ Kotobotov âš¡, 2014-09-23
@tregnas

heheh, I don’t know why they don’t like to use the Aho-Karasik algorithm, in my opinion the most convenient for such purposes
will be very easy to find the longest branch in the tree.

G
gleb_kudr, 2014-09-22
@gleb_kudr

Such algorithms are extremely widely used in bioinformatics. Here is the direction to start looking en.wikipedia.org/wiki/BLAST

D
DnAp, 2014-09-22
@DnAp

Take a look at the "How it works" section of this article .
It describes the direction in which I would move to write such an algorithm.

G
GavriKos, 2014-09-22
@GavriKos

Do you need some optimal algorithm? Not optimally solved by iterating over all substrings from the first string and looking for them in the rest.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question