Why does preg_match determine the position incorrectly if there are diacritics in the string?

PHP

Eugene Ordinary, 2021-11-21 17:05:56

$str = 'ab'. mb_convert_encoding( '&#x301;', 'UTF-8', 'HTML-ENTITIES' ). 'cdef'; 
// $str = ab́cdef
preg_match( '#de#ui', $str, $matches, PREG_OFFSET_CAPTURE );
// $matches = Array ( [0] => Array ( [0] => de [1] => 5 ) ) 
$subs = mb_substr( $str, $matches[0][1], null, 'UTF-8' );
// $subs = ef

I expect preg_match to report the position of de as 4, not 5. Because of the wrong offset, mb_substr copies the substring from the wrong position. Why does this happen, and how can I make preg_match and mb_substr work together?

1 answer(s)

rPman, 2021-11-21
@rPman

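preg_match works with bytes in the string, not with multibyte characters like all the mb_... functions do, so the offset returned with PREG_OFFSET_CAPTURE is a byte offset even with the u modifier. The combining accent U+0301 occupies two bytes in UTF-8, which is why de is reported at 5 instead of 4. Either cut the string with the byte-based substr, or convert the byte offset to a character offset with mb_strlen( substr( $str, 0, $matches[0][1] ), 'UTF-8' ) before calling mb_substr. Note that mb_ereg_match is not a drop-in replacement here: it only returns true or false and reports no position.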
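
A minimal sketch of the offset conversion, assuming the mbstring extension is loaded and the string is valid UTF-8. The helper name byte_offset_to_char_offset is hypothetical (it is not a PHP built-in), and the string is built with a "\u{0301}" escape instead of mb_convert_encoding purely for brevity:

```php
<?php
// Hypothetical helper (not a PHP built-in): turns the byte offset that
// preg_match reports into a character offset that mb_substr understands.
function byte_offset_to_char_offset(string $str, int $byteOffset, string $encoding = 'UTF-8'): int
{
    // Count the characters contained in the bytes that precede the match.
    return mb_strlen(substr($str, 0, $byteOffset), $encoding);
}

// Same content as the string in the question: "ab", U+0301, "cdef".
$str = "ab\u{0301}cdef";

preg_match('#de#ui', $str, $matches, PREG_OFFSET_CAPTURE);
$byteOffset = $matches[0][1];                                 // offset in bytes
$charOffset = byte_offset_to_char_offset($str, $byteOffset);  // offset in characters

// Option 1: character offset with the multibyte function.
$viaMb = mb_substr($str, $charOffset, null, 'UTF-8');

// Option 2: keep the byte offset and use the byte-based function.
$viaBytes = substr($str, $byteOffset);
```

Both options yield the substring that starts at de. Option 2 is cheaper when the offset comes straight from preg_match; option 1 is useful when the rest of the code already works in character positions.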