Algorithms
babbert, 2019-05-11 13:31:28

Is there a library that can recognize that "answers" and "OTBETs" are the same thing?

Hello.
I need my program to understand that, for example, "drunk" = "@[email protected]".
I need this to fight ads. I have already come up with an algorithm, but it requires writing out a lot of letter aliases.
Perhaps someone has already done this before me and there is a ready-made library for recognizing such spellings, so that I don't have to reinvent the wheel. Can anyone point me to one?)


7 answer(s)
Boris Korobkov, 2019-05-11
@BorisKorobkov

This algorithm is over 100 years old: https://ru.wikipedia.org/wiki/Soundex
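For readers who have not met Soundex, here is a minimal sketch of the classic American variant in Go. The function names `Soundex` and `soundexCode` are chosen for illustration only. Note the assumption: Soundex is defined for the English alphabet, so this sketch simply skips anything outside A-Z, and Cyrillic text would need transliteration or a different letter table first.

```go
package main

import "strings"

// soundexCode maps an uppercase ASCII letter to its Soundex digit.
// It returns 0 for letters that are not coded (vowels, H, W, Y).
func soundexCode(c byte) byte {
	switch c {
	case 'B', 'F', 'P', 'V':
		return '1'
	case 'C', 'G', 'J', 'K', 'Q', 'S', 'X', 'Z':
		return '2'
	case 'D', 'T':
		return '3'
	case 'L':
		return '4'
	case 'M', 'N':
		return '5'
	case 'R':
		return '6'
	}
	return 0
}

// Soundex returns the four-character Soundex key of word
// (hypothetical helper name). Characters outside A-Z are ignored.
func Soundex(word string) string {
	w := strings.ToUpper(word)
	out := make([]byte, 0, 4)
	var prev byte // code of the previous significant letter
	for i := 0; i < len(w); i++ {
		c := w[i]
		if c < 'A' || c > 'Z' {
			continue // skip digits, punctuation, non-ASCII bytes
		}
		code := soundexCode(c)
		if len(out) == 0 {
			out = append(out, c) // the first letter is kept as-is
			prev = code
			continue
		}
		if c == 'H' || c == 'W' {
			continue // H and W do not separate equal codes
		}
		if code != 0 && code != prev {
			out = append(out, code)
			if len(out) == 4 {
				break
			}
		}
		prev = code // a vowel resets prev to 0, allowing a repeat
	}
	if len(out) == 0 {
		return ""
	}
	for len(out) < 4 {
		out = append(out, '0') // pad short keys with zeros
	}
	return string(out)
}
```

Two words are treated as sounding alike when their keys match, for example `Soundex("Robert") == Soundex("Rupert")`.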

Developer, 2019-05-11
@samodum

"the same" and "the same" are not the same.
Here is my old article on this topic:
https://m.habr.com/en/post/86303/

Maxim, 2019-05-12
@webmaxer

Aliases will not help to solve this problem. There are millions of ways to write the word drunk:
al-kash (from the point of view of the Russian language, everything is normal, it sounds like a name from some Warcraft)
alkash
alkash
a.l.k.a.sh
a1lkash (here the letter "l" actually consists of two characters, so what aliases could there be?)
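Several of the variants listed above differ only in separators. As a rough sketch (not something proposed in this thread), a pre-processing step can drop everything that is not a letter or digit before any comparison. `StripSeparators` is a hypothetical name, and the sketch deliberately does nothing about the multi-character case, which is the harder part of the objection.

```go
package main

import (
	"strings"
	"unicode"
)

// StripSeparators lowercases s and removes every rune that is neither
// a letter nor a digit, so "al-kash" and "a.l.k.a.sh" collapse to the
// same string. Digits are kept because they are often used as letters.
func StripSeparators(s string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(s) {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}
```

With this step `al-kash`, `alkash` and `a.l.k.a.sh` all become `alkash`, while `a1lkash` stays `a1lkash` and still needs something smarter than a one-to-one alias table.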

nrgian, 2019-05-11
@nrgian

I have already come up with an algorithm, but it requires writing out a lot of letter aliases.
Perhaps someone has already done this before me and there is a ready-made library for recognizing such spellings, so that I don't have to reinvent the wheel. Can anyone point me to one?)

Why would you need a library for that? It's just a single associative array.
Something like:
conv["t"] = "т"
conv["@"] = "а"

And a convenient wrapper for it, like:
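// Requires: import "strings", and conv declared as map[string]string.
func MyConv(symbol string) string {
    // Lowercase first so "T" and "t" share one table entry.
    s := strings.ToLower(symbol)

    // Return the alias target if there is one, otherwise the symbol itself.
    if v, ok := conv[s]; ok {
        return v
    }
    return s
}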

And that is all!
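To show how such a table could be applied to a whole string rather than one symbol at a time, here is a minimal sketch. It assumes the table is keyed by `rune` instead of `string`, and the name `Normalize` is made up for the example; only the two table entries come from the answer above, so a real table would be much larger.

```go
package main

import (
	"strings"
	"unicode"
)

// conv is an illustrative alias table. Only these two entries appear
// in the answer; everything else would have to be filled in by hand.
var conv = map[rune]rune{
	't': 'т', // Latin t -> Cyrillic т
	'@': 'а', // at sign -> Cyrillic а
}

// Normalize lowercases every rune of text and replaces it with its
// alias target when the table has one; other runes pass through.
func Normalize(text string) string {
	return strings.Map(func(r rune) rune {
		r = unicode.ToLower(r)
		if v, ok := conv[r]; ok {
			return v
		}
		return r
	}, text)
}
```

Two spellings can then be compared by normalizing both and checking for equality. This only covers one-character-to-one-character substitutions.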

Dimonchik, 2019-05-11
@dimonchik2013

In the general case, a library does not solve this (only through Google)), and here you are inventing your own. Still, the three-way method works: usually the person on the other side is the same kind of noob.

#, 2019-05-12
@mindtester

Aliases will not help to solve this problem. There are millions of ways to write a word...

This brings us to the AI version:
- you need a huge database of stolen passwords (which is realistic) to run through dictionaries and train the AI (a version that is NOT for the faint of heart .. or for corporations .. or special services)
- you can train the AI on the visual similarity of characters in national layouts (let's say [email protected] and $=s are universal, while h=4 is "in Russian" .. w=8 .. b=6 .. why not? ;))) .. o=0 .. well, that one is acceptable in all languages..
- you can train the AI on audio matches, but that is even more resource-intensive (not just the training but even the research; video likeness, see the points above, is I think used much more often)
ps classic password from MS, to bypass the old "strict" rules, for some kind of quick test - [email protected] ssword
improved version (known all over the world,
.. let's say you have a Panasonic monitor.. a few rules in your head, and a cheat sheet is always under your nose - [email protected]$0ni(
c => (... why not? ;)))
.. or - [email protected]$0ni ( .. good luck to compilers of alias dictionaries ;))
.. for the full picture, let's say [email protected]$0ni( .. and again - good luck! ;)))

Ruslan., 2019-05-13
@LaRN

There is such an option, but it is probably too powerful for you:
https://tech.yandex.ru/speller/
Here is an interesting article about the transformation of words:
https://habr.com/ru/post/270845/
Maybe "The function of evaluating the similarity of a pair of words" will suit you.
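As one common example of a word-pair similarity measure, here is a sketch of Levenshtein edit distance in Go. This is an assumption about what kind of function could be useful here, not a reproduction of the function from the linked article, which may work differently. The names `Levenshtein` and `min3` are illustrative.

```go
package main

// Levenshtein returns the minimum number of single-rune insertions,
// deletions and substitutions needed to turn a into b. It works on
// runes, so Cyrillic and Latin letters each count as one symbol.
func Levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1) // distances for the previous row
	curr := make([]int, len(rb)+1) // distances for the current row
	for j := range prev {
		prev[j] = j // turning "" into b[:j] takes j insertions
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i // turning a[:i] into "" takes i deletions
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}
```

A small distance relative to the word length suggests two spellings of the same word: `Levenshtein("alkash", "a1lkash")` is 1, because a single inserted character separates them. The threshold for "close enough" is a tuning decision.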
