How do I fix preg_replace encoding problems with UTF-8?

maximw, 2014-03-20 15:08:35

PHP

Tested with PHP 5.3.13 on Windows and PHP 5.3.10 on Ubuntu, viewing the output in Chrome 33.0.1750.154 and Firefox 26.0, both of which correctly detected the encoding as UTF-8.

<?php
header('Content-Type: text/html; charset=UTF-8');
$message = "вае№\n_п8bс!\n  ии";
$message = preg_replace('/[^a-z0-9а-я\!]*/i', '', $message);
echo $message;

The output is garbage:

vaep8b�!ii

Of course, the file is saved as UTF-8 without a BOM.
I also tried the u modifier, but it didn't help.
Using \w instead of a-z0-9 didn't help either.
I tried listing uppercase and lowercase letters separately so I could drop the i modifier, to no avail.
How can this be fixed, or what can I use instead?


3 answers

smoked, 2014-03-20
@maximw

This works: /[^ca-z0-9a-z!]*/
Update: /[^\p{L}0-9\!]/iu
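A minimal sketch of the \p{L} approach, assuming PHP's bundled PCRE (built with Unicode property support) and a source file that really is UTF-8. The u modifier makes PCRE read both the pattern and the subject as UTF-8 code points; without it, a range such as а-я is read byte by byte, which is what mangles the output. The helper name keep_letters_digits is hypothetical, and the i flag is dropped here because \p{L} already covers both cases.

```php
<?php
// Hypothetical helper: drop everything except letters (any script),
// ASCII digits 0-9 and "!". Assumes $text is valid UTF-8.
function keep_letters_digits($text)
{
    // u    : pattern and subject are treated as UTF-8 code points, not bytes.
    // \p{L}: any Unicode letter, upper- or lowercase, so /i is not needed.
    // +    : match runs of unwanted characters instead of empty strings.
    $result = preg_replace('/[^\p{L}0-9!]+/u', '', $text);

    // With /u, invalid UTF-8 input makes preg_replace return NULL,
    // which echo would silently print as an empty string.
    if ($result === null) {
        throw new RuntimeException('preg_replace failed, preg_last_error() = ' . preg_last_error());
    }
    return $result;
}

echo keep_letters_digits("вае№\n_п8bс!\n  ии"), "\n"; // prints ваеп8bс!ии
```

If the u version still prints nothing, the subject string is probably not valid UTF-8, which is the point raised in the next answer.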

Stepan, 2014-03-20
@L3n1n

Your document is UTF-8 and you send it as UTF-8, but what encoding is the $message string itself in?
Convert it to UTF-8 with iconv before calling preg_replace.
The fact that the browsers displayed your text does not mean it is UTF-8; they auto-detected the encoding and ignored your Content-Type: text/html; charset=UTF-8 header.
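A sketch of that suggestion: check whether the string is valid UTF-8 and, if not, convert it with iconv. The source encoding CP1251 (Windows-1251) is an assumption chosen because it is a common legacy encoding for Cyrillic text; substitute whatever your data actually uses. The helper name to_utf8 is hypothetical, and "valid UTF-8" is only a heuristic (plain ASCII is valid in both encodings). Dumping the raw bytes with bin2hex is a quick way to answer "what encoding is $message in?".

```php
<?php
// Hypothetical helper: return $text as UTF-8.
// ASSUMPTION: input that is not valid UTF-8 is Windows-1251 (CP1251);
// pass the real source encoding if your data uses something else.
function to_utf8($text, $fromEncoding = 'CP1251')
{
    // preg_match('//u', ...) returns 1 for valid UTF-8 and false otherwise.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }
    $converted = iconv($fromEncoding, 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException("iconv could not convert from $fromEncoding");
    }
    return $converted;
}

// Raw bytes reveal the encoding: in CP1251 each Cyrillic letter is one byte,
// in UTF-8 it is two bytes starting with d0 or d1.
$cp1251 = "\xE2\xE0\xE5"; // "вае" encoded as CP1251
echo bin2hex($cp1251), "\n";   // prints e2e0e5
$utf8 = to_utf8($cp1251);
echo bin2hex($utf8), "\n";     // prints d0b2d0b0d0b5
echo preg_replace('/[^\p{L}0-9!]+/u', '', $utf8), "\n"; // prints вае
```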

Timur Sergeevich, 2014-03-20
@MyAlesya

$message = "vae#\n_n8bc!\n ii"; what is that supposed to be?