MySQL
Alexey, 2013-03-19 22:11:03

REGEXP does not work correctly in MySQL

I tried to filter the data with this query:
SELECT *
FROM posts
WHERE text REGEXP '[а-яА-Я]'

In theory, this should return all posts that contain Russian letters, but it also returned some extra ones. While investigating, I came across the following case:

SELECT '“' REGEXP '[М]';
returns 1

Character codes
ord('М') = 53404 ("М" is a Russian letter, U+041C)
ord('“') = 14844060

Can anyone tell me why this works?

MySQL version: 5.5.11
Character Set: utf8


2 answer(s)
Alexey Akulovich, 2013-03-19
@AterCattus

dev.mysql.com/doc/refman/5.5/en/regexp.html

Warning
The REGEXP and RLIKE operators work in byte-wise fashion, so they are not multi-byte safe and may produce unexpected results with multi-byte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.
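
Given that warning, one plausible reading of the example from the question is that the bracket expression '[М]' is treated as a set of single bytes rather than as one character. The Python sketch below is only a model of that assumption, not MySQL's actual implementation: `ord_to_bytes` and `bytewise_bracket_match` are hypothetical helper names, and the only inputs taken from the question are the two ord values.

```python
# Sketch under an assumption: a byte-wise regexp engine sees '[М]' as the
# set of that letter's UTF-8 bytes, not as a single character.

def ord_to_bytes(code: int) -> bytes:
    """Turn a multi-byte ord value from the question back into its bytes."""
    return code.to_bytes((code.bit_length() + 7) // 8, "big")

def bytewise_bracket_match(subject: str, members: str) -> bool:
    """Hypothetical model of '[members]': true if any byte of the subject
    is one of the bytes that make up the bracket members."""
    byte_set = set(members.encode("utf-8"))
    return any(b in byte_set for b in subject.encode("utf-8"))

letter = ord_to_bytes(53404)     # ord value of the Russian letter
quote = ord_to_bytes(14844060)   # ord value of the quotation mark

print(letter.hex(), letter.decode("utf-8"))  # d09c М  (U+041C)
print(quote.hex(), quote.decode("utf-8"))    # e2809c “  (U+201C)

# Both byte sequences end in 0x9c, so a byte-wise match succeeds.
print(bytewise_bracket_match("\u201c", "\u041c"))  # True
# Lowercase м (U+043C) is d0 bc and shares no byte with the quote.
print(bytewise_bracket_match("\u201c", "\u043c"))  # False
```

Under this model the match comes from the shared trailing byte 0x9c. The same reasoning would explain extra rows from a Cyrillic range, since the bytes of the range endpoints can also occur inside other multi-byte characters.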

vsespb, 2013-03-19
@vsespb

dev.mysql.com/doc/refman/5.5/en/regexp.html#operator_regexp

The REGEXP and RLIKE operators work in byte-wise fashion, so they are not multi-byte safe and may produce unexpected results with multi-byte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.

I don't know exactly why this regexp matches. In theory, you would need to know the exact internal representation of the data in MySQL and how the regexp engine is implemented.
The byte representations of these characters in the various Unicode encodings did not suggest anything to me.
www.fileformat.info/info/unicode/char/43c/index.htm
www.fileformat.info/info/unicode/char/201c/index.htm
