Lua | Counting Russian characters?

Fudo Tsukiko, 2017-11-04 19:42:21

Unicode


I have run into a problem.
When I take the length of a Unicode string (for example, a four-letter Russian word, rendered here as "yalaya"), I get 8 instead of 4. I know that a Russian character takes 2 bytes, so I tried dividing by 2. But that creates another problem: if the text also contains English letters or spaces, this method breaks down.


3 answer(s)


Sergey Lerg, 2017-11-04
@Lerg

https://github.com/starwing/luautf8


Roman Mirilaczvili, 2017-11-04
@2ord

To count Cyrillic characters, you need to find the characters whose code points are in the range U+0410 - U+044F. Note that this range does not include ё (U+0451) or Ё (U+0401).
Also, the statement "I know that there are 2 bytes in a Russian character" is not always true, since it is a special case that depends on the choice of encoding: it holds for UTF-8, but in Windows-1251 or KOI8-R, for example, a Russian letter takes 1 byte.
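As a rough illustration of the code-point check described above, here is a minimal sketch in plain Lua. It assumes the string is valid UTF-8; the function name `count_chars` and the sample text are made up for this example and do not come from any library.

```lua
-- Returns two numbers: the total number of characters in a UTF-8
-- string, and how many of them fall in U+0410..U+044F.
-- Assumption: `s` is valid UTF-8 (no validation is done here).
local function count_chars(s)
  local total, cyrillic = 0, 0
  -- One character = one non-continuation byte followed by any number
  -- of continuation bytes (0x80..0xBF, i.e. decimal 128..191).
  for ch in s:gmatch("[^\128-\191][\128-\191]*") do
    total = total + 1
    if #ch == 2 then
      local b1, b2 = ch:byte(1, 2)
      -- Decode a two-byte sequence of the form 110xxxxx 10yyyyyy.
      local cp = (b1 - 0xC0) * 64 + (b2 - 0x80)
      if cp >= 0x0410 and cp <= 0x044F then
        cyrillic = cyrillic + 1
      end
    end
  end
  return total, cyrillic
end

-- Prints 11 and 6, provided this file is saved as UTF-8.
print(count_chars("Привет, Lua"))
```

Since ё and Ё lie outside the range, they are counted in the total but not as Cyrillic letters here; add explicit checks for U+0451 and U+0401 if you need them.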


dollar, 2018-08-07
@dollar

You can use a utf8 extension for Lua.
For example: https://github.com/starwing/luautf8
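A small sketch of what that might look like. It assumes luautf8 is loaded under the module name `lua-utf8`; if it is not installed, the code falls back to the `utf8` library built into Lua 5.3 and later, and then to a plain pattern-based count. The helper name `char_len` is made up for this example.

```lua
-- Try luautf8 first (module name "lua-utf8" is an assumption here),
-- then the built-in utf8 library, which is nil before Lua 5.3.
local ok, lib = pcall(require, "lua-utf8")
local u8 = ok and lib or utf8

-- Length of a UTF-8 string in characters rather than bytes.
local function char_len(s)
  if u8 then
    -- The built-in utf8.len returns nil (plus a position) when the
    -- string is not valid UTF-8.
    return u8.len(s)
  end
  -- Pure-Lua fallback: count every byte that is not a continuation byte.
  local _, n = s:gsub("[^\128-\191]", "")
  return n
end

local s = "Привет, Lua"
print(#s)           -- length in bytes
print(char_len(s))  -- length in characters
```

The `#` operator always counts bytes, which is why dividing by 2 stops working as soon as the text mixes Russian letters with ASCII.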