How do I normalize a Unicode string (alphabets and special characters)?
For example, I need to determine whether a UTF-8 string contains the word "shovel". That would be simple, if not for two things:
1. Identical-looking letters from different alphabets. Here, the letters "o" and "a" could be Latin rather than Cyrillic. And Cyrillic and Latin are surely not the only alphabets with look-alikes, are they?
2. Special Unicode characters: the zero-width space \u200b, text direction (bidi) control characters, and so on.
How do I deal with this? The normalize method from intl does not help. I could strip some special characters by hand and substitute look-alike letters between alphabets, but I cannot anticipate every case. Is there a ready-made solution, or a combination of several solutions?
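To make the two problems concrete, here is a minimal sketch in Python (chosen only for illustration; the question itself refers to intl). It assumes that dropping every character in Unicode general category Cf (format characters, which include the zero-width space and the bidi controls) is acceptable for a search, and it uses a tiny hand-written look-alike table. The names `CYRILLIC_TO_LATIN`, `skeleton`, and `contains_word` are hypothetical, and the table is far from complete; a fuller mapping would have to come from a source such as the Unicode confusables data.

```python
import unicodedata

# Hypothetical, deliberately incomplete table of lowercase Cyrillic letters
# that look like Latin ones. An assumption for illustration only.
CYRILLIC_TO_LATIN = str.maketrans({
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "\u0458": "j",  # CYRILLIC SMALL LETTER JE
})


def skeleton(text: str) -> str:
    """Reduce text to a comparison form; not meant for display."""
    # 1. Drop format characters (category Cf): zero-width space,
    #    bidi marks and overrides, soft hyphen, and similar.
    visible = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # 2. NFKC folds compatibility forms such as fullwidth letters,
    #    then casefold removes case differences.
    folded = unicodedata.normalize("NFKC", visible).casefold()
    # 3. Map the listed Cyrillic look-alikes onto Latin letters.
    return folded.translate(CYRILLIC_TO_LATIN)


def contains_word(text: str, word: str) -> bool:
    # Both sides go through the same reduction, so the search word
    # may itself be written in Cyrillic, Latin, or a mix.
    return skeleton(word) in skeleton(text)


# "SH", zero-width space, Cyrillic "о", "V", right-to-left override, "EL"
sample = "Buy a SH\u200b\u043eV\u202eEL now"
print(contains_word(sample, "shovel"))
```

Because the text and the search word pass through the same `skeleton` function, the direction of the mapping (Cyrillic to Latin here) does not matter for the comparison. What the sketch does not cover is exactly the "everything cannot be foreseen" part: look-alikes from other scripts (Greek, for instance) and characters that are invisible but not in category Cf would need a larger table or a dedicated confusables library.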