How do I convert a PHP engine from Windows-1251 to UTF-8?
Good afternoon!
I have a rather big, old engine. The code alone is about 3 MB. About half a year ago I successfully migrated it from PHP 5.3 to 7.1; there is plenty of documentation on that. I studied the recommendations from 4-5 good articles, ran into a couple of points they did not cover, and on the whole everything went quite smoothly.
Now there is another task. The engine is so old that it works in the Windows-1251 encoding, and everything needs to be converted to UTF-8. No matter how much I searched, I found no sensible advice. Most of the good information comes from abroad, and Win-1251 is not relevant there. :)
In fact, the only advice I keep seeing is to replace all string functions with their "mb_" counterparts. As I understand it, that means this list of equivalents: php.net/manual/ru/ref.mbstring.php ?
As far as I understand, on the PHP side I will at least need to set the desired encoding in the config and re-encode the PHP files themselves to UTF-8. The database is a separate story. Another nuance: the site uses AJAX and re-encodes the results (because such requests, if I'm not mistaken, are sent only in UTF-8).
Is that all, or is more needed? What else might come up, for those who have experience with this? And most importantly: is it possible to simply state in the PHP settings that "we work with UTF-8" and be done with it, without changing all the functions?
The functions themselves are confusing, too. For example, there is preg_replace but no mb_preg_replace. At the same time there are both ereg_replace and mb_ereg_replace, even though ereg_replace is already deprecated or removed (depending on the PHP version). What does that mean?
By this point I am thoroughly tangled up.
2019.03.07 update: I have learned a lot of new and interesting things these past days. :) I think many people understand encodings only superficially. I don't know whether this topic is still relevant in 2019, but if it is, after all this I can put together a big checklist of what to do and how. This is the third day I have been reworking the engine for UTF-8. The code really does need changing in only a few places, but there are a lot of points to check.
And now a small clarifying question for those in the know. Everything written about the preg_* functions says something like "use the u modifier to work with Unicode". When is it really needed? From what I have observed, the functions do not care which encoding the strings are in: PHP treats strings as sequences of bytes, not characters. So the u modifier should only be needed when the regular expression itself contains characters outside US-ASCII; otherwise any UTF-8 string is processed correctly without it. What do you say?
"As I understand it, we are talking about this list of analogues: php.net/manual/ru/ref.mbstring.php ?"
Yes. In addition, you may have to find hand-written equivalents for functions that PHP itself does not provide yet. When I rewrote my engine, I only had to find replacements for 3 functions.
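To illustrate why the byte-oriented functions need mb_ replacements, here is a minimal sketch. It assumes the mbstring extension is loaded and that the source file itself is saved as UTF-8; the sample word is only an illustration.

```php
<?php
mb_internal_encoding('UTF-8');

$word = 'Привет'; // 6 Cyrillic letters, 2 bytes each in UTF-8

echo strlen($word), "\n";          // counts bytes: 12
echo mb_strlen($word), "\n";       // counts characters: 6

// substr() cuts by bytes and can split a character in half.
$byteCut = substr($word, 0, 3);
var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)

// mb_substr() cuts by characters.
echo mb_substr($word, 0, 3), "\n"; // При

// Case conversion of non-ASCII letters also needs the mb_ variant.
echo mb_strtoupper($word), "\n";   // ПРИВЕТ
```

Functions that only look for ASCII delimiters (explode on a comma, str_replace of an ASCII substring) work the same on UTF-8 input, so not every call has to be replaced.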
"As far as I understand, in terms of PHP, you will need to at least set the desired encoding in the config."
I have this in .htaccess:
# Encodings
php_value mbstring.language "Russian"
php_value mbstring.internal_encoding "UTF-8"
php_value default_charset utf-8
AddDefaultCharset utf-8
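The php_value directives only take effect when PHP runs as an Apache module. Where that is not the case, roughly the same settings can be applied from PHP at the start of each request. This is a minimal sketch, assuming mbstring is loaded. Note that the mbstring.internal_encoding ini setting has been deprecated since PHP 5.6 in favor of default_charset, so the sketch sets the encoding through the function instead.

```php
<?php
// Charset that PHP puts into the default Content-Type header and uses
// as the default for functions such as htmlspecialchars().
ini_set('default_charset', 'UTF-8');

// Default encoding for the mb_* functions when none is passed explicitly.
mb_internal_encoding('UTF-8');

echo ini_get('default_charset'), ' ', mb_internal_encoding(), "\n"; // UTF-8 UTF-8
```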
"DB - that is already a separate story."
We convert the data, and when connecting to the database we call mysqli->set_charset('utf8').
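A connection helper along these lines could look as follows. This is only a sketch: the function name and its parameters are hypothetical, and it uses 'utf8mb4' instead of 'utf8' on the assumption that the server is MySQL 5.5.3 or later, because MySQL's 'utf8' stores at most 3 bytes per character.

```php
<?php
// Hypothetical helper: open a mysqli connection that exchanges data
// with the server in UTF-8.
function connectUtf8(string $host, string $user, string $pass, string $db): mysqli
{
    // Make mysqli throw exceptions instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $link = new mysqli($host, $user, $pass, $db);

    // Sets the connection charset; unlike a plain "SET NAMES" query this
    // also informs mysqli itself, which matters for real_escape_string().
    $link->set_charset('utf8mb4');

    return $link;
}
```

The tables and columns themselves still have to be converted separately; the connection charset only controls how data travels between PHP and the server.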
"Another nuance - the site has AJAX, with recoding of the results."
Yes, remove the re-encoding, keep only the header 'application/json; charset=utf-8' and just call json_encode.
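A sketch of what such an AJAX response could look like once the data is already UTF-8. The function name is hypothetical. json_encode expects UTF-8 strings and returns false when given anything else, so the check also catches data that was missed during conversion.

```php
<?php
// Hypothetical helper: encode a response payload that is already UTF-8.
function jsonBody(array $payload): string
{
    // JSON_UNESCAPED_UNICODE is optional; it keeps Cyrillic readable
    // instead of emitting \uXXXX escapes.
    $json = json_encode($payload, JSON_UNESCAPED_UNICODE);
    if ($json === false) {
        // Typically "Malformed UTF-8 characters": leftover Windows-1251 data.
        throw new RuntimeException('json_encode failed: ' . json_last_error_msg());
    }
    return $json;
}

// In a real handler, send this before the body:
// header('Content-Type: application/json; charset=utf-8');
echo jsonBody(['message' => 'Привет']), "\n"; // {"message":"Привет"}
```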
"For example, there is preg_replace but no mb_preg_replace."
There is the /u modifier.
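The modifier matters even when the pattern itself is pure ASCII, as soon as the pattern counts or splits characters. A small sketch, assuming mbstring is available for the validity check and the file is saved as UTF-8:

```php
<?php
$word = 'Привет'; // 6 characters, 12 bytes in UTF-8

// Without /u the dot matches one byte; with /u it matches one code point.
$byteCount = preg_match_all('/./', $word);
$charCount = preg_match_all('/./u', $word);
echo $byteCount, ' ', $charCount, "\n"; // 12 6

// Wrapping "the first character" without /u cuts a character in half.
$broken = preg_replace('/^(.)/', '[$1]', $word);
$intact = preg_replace('/^(.)/u', '[$1]', $word);
var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)
echo $intact, "\n"; // [П]ривет

// With /u, a subject that is not valid UTF-8 is rejected rather than matched.
$cp1251 = "\xCF\xF0\xE8"; // "При" in Windows-1251
var_dump(preg_match('/./u', $cp1251)); // bool(false)
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // bool(true)
```

Patterns that only match ASCII literals, such as a search for a tag name or a fixed delimiter, behave the same with and without /u on valid UTF-8 input.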
If the PHP files contain Russian characters, they need to be converted to UTF-8. The same goes for templates, data in the database, JS files and every other engine file: run iconv -f 1251 -t utf8 on them. For the database: dump, iconv, sed, restore. Beyond that, adjust the web server settings so that UTF-8 is declared not only by the meta tag with the encoding (already corrected in the templates) but also by the web server itself, if it is configured to send a charset.
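The same file conversion can be sketched in PHP for cases where running iconv from the shell is inconvenient. Both function names are hypothetical and the sketch assumes mbstring is loaded. Skipping content that is already valid UTF-8 makes a second run harmless, with the caveat that a Windows-1251 byte sequence can occasionally happen to be valid UTF-8 and would then be left unconverted.

```php
<?php
// Hypothetical helper: convert Windows-1251 text to UTF-8 unless the
// bytes already form valid UTF-8 (pure ASCII or an earlier conversion).
function toUtf8(string $bytes): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    return mb_convert_encoding($bytes, 'UTF-8', 'Windows-1251');
}

// Hypothetical helper: convert one file in place.
// Returns true if the file was rewritten.
function convertFile(string $path): bool
{
    $original = file_get_contents($path);
    if ($original === false) {
        throw new RuntimeException("Cannot read $path");
    }
    $converted = toUtf8($original);
    if ($converted === $original) {
        return false; // nothing to do
    }
    return file_put_contents($path, $converted) !== false;
}

echo toUtf8("\xCF\xF0\xE8\xE2\xE5\xF2"), "\n"; // Привет
```

Back up the files first, or run this on a copy of the tree: an in-place conversion is hard to undo if a file was in yet another encoding.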
I don't see what mb_* has to do with this; that is a separate matter.