How to make a selection of words from a dictionary to get a given phrase (anagrams)?

A

Andrey Ivchenkov2015-08-26 03:30:26

PHP

Andrey Ivchenkov, 2015-08-26 03:30:26

I want to make a service similar to this:
www.wordplays.com/anagrammer
The bottom line is this: the user enters a certain phrase. In response, he receives sentences from words, the letters of which are all present in the original phrase. The words are taken from the database.
I came to the conclusion that each word must be assigned a key, which is all the characters of the word ordered alphabetically. For example, the word "button". The keyword for this word will be "bnootu".
The algorithm boils down to the following:

We take the phrase that the user entered and compose a key for it (we remove everything except letters from the phrase).
Using a simple regular expression, we select the keys from the database that are "included" in the original key, i.e. contain only those letters that are in the original key. Not all letters may be present in the desired key, the main thing is that the number of repetitions does not exceed the number of repetitions of letters in the original key. For example, if you take the key "bnootu" as initial, then the key "botu" may be suitable for it.
From the found keys, we compose combinations so that in total these keys represent the original key.
From the compiled combinations, we generate ready-made phrases.

Actually plugging happened with point 3. The problem is that the number of keys in the combination can vary. I was able to solve this problem through recursion and enumeration, but ran into limited resources.
Therefore, I wonder if there are any options for solving point 3 without recursion? And even if with recursion, how to avoid the presence of duplicate combinations (for example, combinations of " ro ot " and " ot ro ")?

Reply

Answer the question

In order to leave comments, you need to log in

1 answer(s)

M

mrrl, 2015-08-26
@Groonya

The multidimensional knapsack problem.
Let's say we have the phrase "aaabbc" and a dictionary

aab
abb
abc
bbcc
aac

For a phrase, we count how many times each letter occurs in it. Letter 'a' occurs na=3 times, letter 'b' - nb=2 times, letter 'c' - nc=1 times.
We start a bit array B with dimensions 0..na, 0..nb. 0..nc (in our example it is 24 bits). We put 1 in the element (0,0,0), and 0 in the rest.

1000 0000
0000 0000
0000 0000

Now we sort through the words from the dictionary. For each word, we count the number of each letter in it (for aab this is ma=2,mb=1,mc=0), and build an array B1, in which B1[a,b,c] = B[a,b,c] | B[a-ma,b-mb,c-mc] (for negative indices, we assume values are zero). In our case, it will

1000 0000
0010 0000
0000 0000

We continue for the rest of the words. After adding each word, we check the element B[na,nb,nc]. If it is non-zero, we have found a variant (however, we remember only the last word from it - we will restore the rest on the next passes). Let's remember the variant, set the element B[na,nb,nc] to zero.
We will be able to:

abb=(1,2,0)

1000 0000
0010 0000
0100 0000

abc=(1,1,1)

1000 0000
0010 0100
0100 0001

The first option is - it ends with "abc". Set B[3,2,1] to zero and continue.
The word bbcc=(0,2,2) can be ignored, it has mc > nc.

aac=(2,0,1)

1000 0010
0010 0100
0100 0001

Found the second option - ends with "aac". The dictionary has run out of words.
Now we need to build the phrase "aab" from the dictionary

aab
abb

and the phrase "abb" from the dictionary

aab
abb
abc
bbcc

Theoretically, this can be done at the same time - but you have to keep track of several indexes. And it is no longer possible to reset them, you need to look to see if they are trying to write 1 into them (even if there already was a value of 1).
If you have 2 GB of memory for the array, and no more than 26 different letters, then this is enough for a phrase of 39 letters (the worst case is when 13 letters occur 1 time, and 13 letters 2 times).