PHP vs UTF-8
I am writing a PHP script. It must accept data in some encoding, process it, and output it in UTF-8.
To avoid writing a separate version of the script for each input encoding, I decided to convert any input to UTF-8 first and then work with that.
But problems started to crop up:
echo strlen('тест'); // 8
echo strlen('тестtest'); // 12
Question: how do I make PHP think in characters, not bytes?
I had a similar problem. The mbstring.func_overload setting in php.ini helped.
> Question: how to make PHP think in letters, not bytes?
Answer: you can't. To work with multibyte encodings, use the mbstring extension (http://ru2.php.net/manual/en/book.mbstring.php), which implements the necessary functions.
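A minimal sketch of what that looks like for the strings from the question, assuming the source file is saved as UTF-8 and the mbstring extension is loaded. It calls the mb_* functions explicitly rather than relying on mbstring.func_overload, which was deprecated in PHP 7.2 and removed in PHP 8.0. The variable names are only for illustration.

```php
<?php
// Assumes this file is saved as UTF-8 and the mbstring extension is loaded.
$word  = 'тест';      // 4 Cyrillic letters, 2 bytes each in UTF-8
$mixed = 'тестtest';  // 4 Cyrillic letters followed by 4 ASCII letters

echo strlen($word), "\n";               // 8 (bytes)
echo mb_strlen($word, 'UTF-8'), "\n";   // 4 (characters)
echo mb_strlen($mixed, 'UTF-8'), "\n";  // 8 (characters)

// Offsets and lengths are also counted in characters.
$middle = mb_substr($mixed, 2, 4, 'UTF-8');
echo $middle, "\n";                     // стte
```

Passing the encoding explicitly avoids depending on the mbstring.internal_encoding or default_charset settings of a particular server.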
Read this article on Habr: "Determining the text encoding in PHP: an overview of existing solutions, plus one more reinvented wheel". It has a solution, in case mb_convert_encoding(...mb_detect_encoding()) doesn't help you.
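For reference, the mb_convert_encoding / mb_detect_encoding combination mentioned above could be sketched roughly as follows. The helper name to_utf8 and the candidate list are assumptions made for this illustration, not something from the thread or the article. Keep in mind that detection among single-byte encodings is only a guess, because most byte strings are valid in several of them.

```php
<?php
// Hypothetical helper (the name and signature are assumptions for illustration).
// $candidates is a list of legacy encodings the input is expected to use.
function to_utf8(string $raw, array $candidates): string
{
    // Input that is already valid UTF-8 is passed through unchanged.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    // Strict mode rejects candidates in which $raw is not a valid string.
    $detected = mb_detect_encoding($raw, $candidates, true);
    if ($detected === false) {
        throw new InvalidArgumentException('Could not detect the input encoding');
    }
    return mb_convert_encoding($raw, 'UTF-8', $detected);
}

$cp1251 = "\xF2\xE5\xF1\xF2";                   // "тест" encoded as Windows-1251
$utf8   = to_utf8($cp1251, ['Windows-1251']);
echo mb_strlen($utf8, 'UTF-8'), "\n";           // 4
```

The UTF-8 check comes first because text in a single-byte encoding rarely happens to be valid UTF-8, while the opposite confusion is common.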
If the alphabet is known and fixed in advance (and not just "any characters of any language"), and if that alphabet is fully covered by a single single-byte code page, then you can use iconv() to convert from UTF-8 to that single-byte encoding, process the text, and then convert it back to UTF-8.
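A rough sketch of that round trip, assuming the text contains only Cyrillic and ASCII characters so that CP1251 (Windows-1251) covers it, and assuming the iconv extension is available. The choice of code page here is an assumption for the example.

```php
<?php
// Assumes every character of the text exists in CP1251 (Cyrillic + ASCII)
// and that this file is saved as UTF-8.
$utf8 = 'тестtest';

$single = iconv('UTF-8', 'CP1251', $utf8);
if ($single === false) {
    throw new RuntimeException('Text contains characters outside CP1251');
}

// In a single-byte encoding one byte is one character,
// so the ordinary byte-oriented functions give character counts.
echo strlen($single), "\n";             // 8

// Process with byte functions, then convert the result back to UTF-8.
$firstSix = substr($single, 0, 6);
$back     = iconv('CP1251', 'UTF-8', $firstSix);
echo $back, "\n";                       // тестte
```

The approach breaks as soon as the input contains a character outside the chosen code page, which is why the return value of iconv() is checked.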