Text Processing Automation
StrangeAttractor, 2014-08-11 23:09:02

How to algorithmically generate the correct figurine dictionary?

This is, so to speak, an "entertaining puzzle" in morphological language processing.
Since swearing is prohibited here, I replaced the well-known word with a more neutral one; the original idea is here.
So:
Dictionary - figurine, swans - figebedi, eggs - figaytsy, book - figigat, pathologist - figologofigatom, user - figozovatel ... something like that.
At first glance it seems simple, but on a second look not so much: you need to formalize how the root is identified, replace the root with another one (sometimes not all of it but only a part, as in the "user" example), ensure "elegant" phonetic joining and rhyme, and so on.
I'm curious how far this can be taken in general.


1 answer(s)
Maxim Vasiliev, 2014-08-12
@qmax

Upd. In general, the algorithm should be as follows:
1. First, select the part to be replaced (the stem or the root):
"identical" -> 'single' + 'eggs' + 'ev' + 'th'
'overpaid' -> 'over' + "weep" + 'enn' + 'th'
"hydroelectric" -> 'hydro' + 'electro' + "station" + 'iya'
A stemming algorithm for separating endings is available: snowball.tartarus.org
prefixes.
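The split in step 1 could be sketched as greedy affix stripping. This is only a toy illustration, not the Snowball algorithm: the affix lists, the Latin transliteration of the example words, and the name `split_word` are all assumptions made for the sketch.

```python
# Hypothetical, hand-picked affix lists (transliterated Russian).
# A real system would need a proper morphological dictionary or stemmer.
PREFIXES = ("hydro", "electro", "pere")
SUFFIXES = ("iya", "enn", "ev", "yy")


def split_word(word):
    """Split a word into (prefixes, root, suffixes) by greedy affix stripping."""
    prefixes, suffixes = [], []

    # Strip known prefixes from the left, one at a time.
    stripped = True
    while stripped:
        stripped = False
        for prefix in PREFIXES:
            # Always leave at least one character for the root.
            if word.startswith(prefix) and len(word) > len(prefix):
                prefixes.append(prefix)
                word = word[len(prefix):]
                stripped = True
                break

    # Strip known suffixes from the right, keeping them in reading order.
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) > len(suffix):
                suffixes.insert(0, suffix)
                word = word[:-len(suffix)]
                stripped = True
                break

    return prefixes, word, suffixes


print(split_word("hydroelectrostantsiya"))
```

Whatever remains after stripping is treated as the part to be replaced; anything the lists do not cover simply stays in the root.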
When the implied word is a single syllable, there is little point in bothering with affixes.
The reading rules of Russian come to about 100 pattern rules; they can be found by searching for a table.
Specifically, for the implied word, most likely only the expansion of the iotated vowels (ye, yo, yu, ya) will be needed:
"eggs" -> "yayts".
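The expansion of iotated vowels can be sketched directly on the Cyrillic spelling. This is a simplified assumption rather than a full set of reading rules: it expands е, ё, ю, я only at the start of a word, after a vowel, or after ь/ъ, and leaves them untouched after consonants. The name `expand_iotated` is made up for the example.

```python
# Iotated vowel -> "й" + plain vowel (simplified phonetic spelling).
IOTATED = {"е": "йэ", "ё": "йо", "ю": "йу", "я": "йа"}
VOWELS = set("аеёиоуыэюя")
SIGNS = set("ьъ")


def expand_iotated(word):
    """Expand е/ё/ю/я into й + vowel where they are pronounced as two sounds."""
    out = []
    prev = ""
    for ch in word.lower():
        at_start = prev == ""
        if ch in IOTATED and (at_start or prev in VOWELS or prev in SIGNS):
            out.append(IOTATED[ch])
        else:
            out.append(ch)
        prev = ch
    return "".join(out)


print(expand_iotated("яйц"))
```

After this step the leading "й" is explicit, so a pattern that looks for the first "й" has something to match.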
It might make sense to distinguish sibilant and sonorant consonants. But I think it will be enough to single out clusters such as "br", "pr", "pl", "ph", which behave as a single unit during syllable division or hyphenation.
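Treating such clusters as indivisible units might look like the following minimal sketch. It works on Latin transliteration, and the cluster list contains only the four combinations mentioned above; the name `split_units` is hypothetical.

```python
import re

# Clusters that should stay together, tried before single characters.
CLUSTERS = ("br", "pr", "pl", "ph")
UNIT = re.compile("|".join(CLUSTERS) + "|.")


def split_units(word):
    """Split a word into units, keeping the listed consonant clusters whole."""
    return UNIT.findall(word.lower())


print(split_units("pravda"))
```

Later steps can then operate on units instead of characters, so a replacement never cuts a cluster in half.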
4. The beginning of the part to be replaced that matches some pattern should be replaced with the substitute root.
For the word "goy" (S=consonant, A=vowel, Y=y): /^S+/, /^.*?Y/ (all leading consonants, or everything up to the first "y")
"y" + "ayts" ->
"st" + "ants" -> "goyants"
"ay" + "pad" -> "goypad"
"rai" + "on" -> "goyon"
"goyants" -> "goyants"
6. Reassemble the detached components.
There are no unambiguous spelling rules for Russian phonetics but, again, it will probably be enough to fold "ye", "yo", "yu" at the junctions.
'one' + "eggs" + 'ev' + 'th' -> "identical"
're' + "weep" + 'enn' + 'th' -> "reheated"
'hydro' + 'electro' + "station" + 'iya' -> "hydroelectrogoyance"
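The two patterns from step 4 and the reassembly from step 6 can be sketched with regular expressions. Several things here are assumptions: the roots are written in Latin transliteration ("stants", "aypad", "rayon"), "y" stands for the Russian "й", the rule for choosing between the two patterns (prefer the "y" pattern whenever the root contains a "y") is only one possible reading, and the function names are made up. Folding at the junctions is left out.

```python
import re

SUBSTITUTE = "goy"

# /^.*?Y/: everything up to and including the first "y" (lazy match).
Y_PREFIX = re.compile(r"^.*?y")
# /^S+/: all leading consonants ("y" is handled by the pattern above).
CONSONANT_PREFIX = re.compile(r"^[^aeiouy]+")


def substitute_root(root, substitute=SUBSTITUTE):
    """Replace the beginning of the root with the substitute word."""
    pattern = Y_PREFIX if "y" in root else CONSONANT_PREFIX
    return pattern.sub(substitute, root, count=1)


def reassemble(prefixes, root, suffixes):
    """Glue the detached components back around the modified root."""
    return "".join(prefixes) + substitute_root(root) + "".join(suffixes)


for root in ("stants", "aypad", "rayon"):
    print(root, "->", substitute_root(root))
print(reassemble(["hydro", "electro"], "stants", ["iya"]))
```

A root that starts with a vowel and contains no "y" is returned unchanged by this sketch; how to handle that case is a matter of taste and euphony.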
The main point, the fourth one, of course depends heavily on the word itself and on your idea of euphony, but I think the general meaning and the room for experimentation are clear.
If you want to preserve the poetic meter, you need to preserve the number of vowels (replace only the first syllable or part of it).
If you want to preserve the poetic rhythm, you need to preserve the stress. And here Russian sets such a fierce ambush that it is easier to give up than to even try to figure it out (about 100 printed pages of scholarly language), especially since most of the resulting words will not exist anyway.
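Checking whether a replacement keeps the meter comes down to counting vowels before and after. A minimal sketch on Latin transliteration, assuming one syllable per vowel letter and treating "y" as a consonant; the function names are hypothetical.

```python
VOWELS = set("aeiou")


def syllable_count(word):
    """Approximate the number of syllables as the number of vowel letters."""
    return sum(ch in VOWELS for ch in word.lower())


def keeps_meter(original, replaced):
    """True if the replacement has the same number of syllables."""
    return syllable_count(original) == syllable_count(replaced)


print(keeps_meter("aypad", "goypad"))
print(keeps_meter("stants", "goyants"))
```

With the examples from the answer, "aypad" -> "goypad" keeps two syllables, while "stants" -> "goyants" grows from one to two, which is exactly the case where only part of the first syllable should be replaced.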
