linux
Danila Buyanov, 2012-09-26 22:19:14

Problem with encodings in PHP on different Linux distributions

Hello!
Today I was driven up the wall by the following encoding problem. There are thousands of threads about this on the Internet, and I have run into it a million times myself, but here it is again, and it is infuriating!

I needed to find all the words in a page's content and replace the relevant ones (keywords) with links. It seemed like a no-brainer, and it really was, as long as the plugin was running on Debian + PHP 5.2.

The search for words was carried out using a primitive regular expression:

'/\b'.$keyword.'\b/'

I installed the plugin on CentOS, and there it does not find Russian words. I worked out that this call was needed:

setlocale(LC_ALL, 'ru_RU.UTF-8');

Everything started working. Then I installed it on another CentOS server, and it does not work there! And it stopped working on Debian too!!!

So I tried this:

setlocale(LC_ALL, 'ru_RU.CP1251', 'rus_RUS.CP1251', 'Russian_Russia.1251', 'ru_RU.UTF-8');

Funny, isn't it? The result: it works on CentOS and stops working on Debian.
I was at my wits' end! I was really worried :)
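When several names are passed, setlocale() uses the first one that the system actually provides and returns that name, or false if none is installed. A minimal sketch for checking what was really selected, instead of guessing; the candidate list here is only an example, and 'C' is included because it exists everywhere. It sets only LC_CTYPE, on the assumption that character classification is the part the regular expression cares about:

```php
<?php
// Candidate locale names, most wanted first (an illustrative list).
$candidates = array('ru_RU.UTF-8', 'ru_RU.utf8', 'C');

// setlocale() returns the name it managed to set, or false on failure.
$selected = setlocale(LC_CTYPE, $candidates);

if ($selected === false) {
    echo "None of the requested locales is installed\n";
} else {
    echo "Active LC_CTYPE locale: $selected\n";
}
```

If this prints a CP1251 locale on one server and a UTF-8 one on another, the same pattern will treat the same bytes differently on each.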

Then I thought: what if I replace the single regular expression with two, using a sledgehammer to crack a nut, so to speak?
I removed the locale calls and used $regex = array('/\b'.$keyword.'\b/', '/\s'.$keyword.'\s/');

And yes, it worked! But I don't like it!

Question: how can I make this universal rather than clumsy?

Thank you all for your replies =)


4 answer(s)
Danila Buyanov, 2012-09-29
@DanyBoo

In the end, here is what I found.
This is the code that I tortured in every possible way:

preg_match_all('#\bпхп|php\b#', 'пхп php', $m);

First, without locales or modifiers (the system uses UTF-8 by default, and the script is also saved in UTF-8):
var_dump($m) -> array(1) { [0]=> array(1) { [0]=> string(3) "php" } }

Now add the u modifier:
var_dump($m) -> array(1) { [0]=> array(1) { [0]=> string(3) "php" } }

Then add
setlocale(LC_ALL, array('ru_RU.cp1251'));

and remove the u modifier.
As a result we get what we wanted:
var_dump($m) -> array(1) { [0]=> array(2) { [0]=> string(6) "пхп" [1]=> string(3) "php" } }

The only confusing part is string(6) "пхп", although that is how it should be in UTF-8.
Then I decided to test letter case, because that matters too: ПХП is not matched by пхп at all. More precisely, it is not that case is ignored; the regular expression simply stops working :(
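The string(6) is expected: var_dump() reports bytes, and each of these Cyrillic letters takes two bytes in UTF-8. A small sketch of the difference, plus a case-insensitive match using the i and u modifiers together. It assumes the file is saved as UTF-8 and that PHP's PCRE was built with Unicode support, which is the usual case:

```php
<?php
$word = 'пхп'; // assumes this file is saved as UTF-8

// strlen() counts bytes, not characters.
$bytes = strlen($word);

// With the u modifier "." matches one whole character, so the number of
// matches equals the number of characters.
$chars = preg_match_all('/./su', $word, $unused);

// i + u: case-insensitive matching that also covers Cyrillic letters.
$caseless = preg_match('/пхп/iu', 'ПХП');

var_dump($bytes, $chars, $caseless); // int(6) int(3) int(1)
```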

Danila Buyanov, 2012-09-26
@DanyBoo

It's sad when there are downvotes and no comments! That is a rotten way to treat people!

lionsimba, 2012-09-28
@lionsimba

setlocale(LC_ALL, "");

?

Vyacheslav Plisko, 2012-09-29
@AmdY

By the way, have you tried telling the regular expression that it is working with Unicode by specifying the "u" modifier?
To find out which locales the system supports, run "locale -a" rather than guessing.
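One way to act on the "u" suggestion without depending on the server locale is to replace \b with lookarounds on Unicode letter and digit classes. This is only a sketch: the function name linkKeyword and the URL are made up for illustration, it assumes UTF-8 content and PHP 5.3+ (for the closure), and it treats the content as plain text, so it does not skip keywords that sit inside existing HTML tags:

```php
<?php
// Hypothetical helper, not from the original plugin: wraps whole-word
// occurrences of $keyword in a link to $url.
function linkKeyword($content, $keyword, $url)
{
    // Escape regex metacharacters in the keyword, including the delimiter.
    $quoted = preg_quote($keyword, '/');

    // The lookarounds stand in for \b: the match must not be preceded or
    // followed by a letter, digit or underscore from any script.
    $pattern = '/(?<![\p{L}\p{N}_])' . $quoted . '(?![\p{L}\p{N}_])/iu';

    $href = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

    // A callback keeps the matched text as written, including its case.
    return preg_replace_callback($pattern, function ($m) use ($href) {
        return '<a href="' . $href . '">' . $m[0] . '</a>';
    }, $content);
}

echo linkKeyword('пхп, php и пхпшник', 'пхп', '/tag/php'), "\n";
// <a href="/tag/php">пхп</a>, php и пхпшник
```

Because the pattern never consults the locale, no setlocale() call is needed and the result should be the same on Debian and CentOS.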
