PHP
Hint, 2015-10-06 13:13:14

How to normalize a unicode string (alphabets and special characters)?

For example, I need to determine whether a UTF-8 string contains the word "shovel" ("лопата" in Russian). It would be simple if not for two things:
1. Identical-looking letters from different alphabets. In this word, the letters "o" and "a" could be Latin rather than Cyrillic. And Cyrillic and Latin are surely not the only alphabets involved, are they?
2. Special Unicode characters: the zero-width space (\u200b), text direction (bidi) control characters, etc.
How do I deal with this? The normalize method from intl does not help. Some things can be replaced manually (some special characters stripped, lookalike letters mapped to one alphabet), but not everything can be foreseen. Is there a ready-made solution, or a combination of several?

2 answer(s)
shagguboy, 2015-10-06
@shagguboy

No such solution exists, because it would be completely useless.

Cat Anton, 2015-10-06
@27cm

"but not everything can be foreseen"

It can be. It is enough to remove everything except a list of allowed characters. The whole solution fits in one regular expression.
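
A minimal sketch of that idea, not a complete solution. The helper names `keepAllowed` and `containsWord` are hypothetical, and the allowed list (characters of the Cyrillic script, ASCII digits and the space) is only an assumption for a Russian-language search such as "лопата":

```php
<?php
// Hypothetical helper: keep only characters from an allowed list.
// Assumption: the allowed list is the Cyrillic script, ASCII digits and the space.
function keepAllowed(string $text): string
{
    // The /u modifier makes PCRE treat the pattern and the subject as UTF-8.
    // Every character outside the allowed list is removed, which includes
    // zero-width spaces (U+200B) and bidi control characters.
    $clean = preg_replace('/[^\p{Cyrillic}0-9 ]+/u', '', $text);

    // preg_replace() returns null on error, for example on invalid UTF-8 input.
    return $clean ?? '';
}

// Hypothetical helper: case-sensitive search for $word in the cleaned text.
function containsWord(string $text, string $word): bool
{
    return strpos(keepAllowed($text), $word) !== false;
}

// Zero-width space and a bidi override hidden inside the word.
var_dump(containsWord("большая ло\u{200B}па\u{202E}та", "лопата"));
```

Note that with this allowed list a Latin "o" or "a" inside the word is removed rather than converted, so lookalike letters would still need their own replacement step (for example a manual map applied with `strtr()`) before the filter.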
