PHP
dmOx, 2011-07-26 12:08:22

PHP vs UTF-8

I am writing a PHP script. It must accept data in some encoding, process it, and output it in UTF-8.
To avoid writing a separate version of the script for each input encoding, I decided to convert any input to UTF-8 first and then work with that.

But problems started to show up:

    echo strlen('тест'); // 8
    echo strlen('тестtest'); // 12

Question: how to make PHP think in letters, not bytes?

5 answers
Ura78, 2011-07-26
@dmOx

I had a similar problem. The mbstring.func_overload setting in php.ini helped.

Sergey Beresnev, 2011-07-26
@sectus

mb_strlen in particular.
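
To make the difference concrete, here is a minimal sketch comparing strlen with mb_strlen. It assumes the mbstring extension is loaded and that the source file itself is saved as UTF-8; the variable names are only for illustration. Calling mb_* functions explicitly is also the more portable route, since mbstring.func_overload was deprecated in PHP 7.2 and removed in PHP 8.0.

```php
<?php
// Assumption: mbstring is loaded and this file is saved as UTF-8.
$cyrillic = 'тест';
$mixed    = 'тестtest';

// strlen() counts bytes; each Cyrillic letter takes 2 bytes in UTF-8.
$bytesCyrillic = strlen($cyrillic);            // 8
$bytesMixed    = strlen($mixed);               // 12

// mb_strlen() counts characters when told the encoding explicitly.
$charsCyrillic = mb_strlen($cyrillic, 'UTF-8'); // 4
$charsMixed    = mb_strlen($mixed, 'UTF-8');    // 8

echo $bytesCyrillic, ' ', $charsCyrillic, "\n";
echo $bytesMixed, ' ', $charsMixed, "\n";
```

Passing the encoding as the second argument avoids depending on whatever mb_internal_encoding happens to be configured on the server.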

LastDragon, 2011-07-26
@LastDragon

> Question: how to make PHP think in letters, not bytes?
Answer: you can't. To work with multibyte encodings, use the mbstring extension (http://ru2.php.net/manual/en/book.mbstring.php), which implements the necessary functions.
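
As a rough illustration of what the extension offers beyond length counting, the sketch below sets the internal encoding once and then uses a few mb_* counterparts of the usual string functions. It assumes mbstring is loaded and the file is saved as UTF-8; the variable names are made up for the example.

```php
<?php
// Assumption: mbstring is loaded and this file is saved as UTF-8.
mb_internal_encoding('UTF-8'); // default encoding for the mb_* calls below

$s = 'тестtest'; // four Cyrillic letters followed by Latin "test"

$first   = mb_substr($s, 0, 4);  // first 4 characters: 'тест'
$upper   = mb_strtoupper($s);    // 'ТЕСТTEST'
$charPos = mb_strpos($s, 't');   // Latin 't' is at character offset 4
$bytePos = strpos($s, 't');      // the same letter is at byte offset 8

echo $first, ' ', $upper, ' ', $charPos, ' ', $bytePos, "\n";
```

The byte-oriented functions still work on the same string, so it is easy to mix offsets by accident; sticking to one family of functions per string is the safer habit.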

zizop, 2011-07-26
@zizop

Read this article on Habr: "Determining the text encoding in PHP: an overview of existing solutions, plus one more reinvented wheel". It contains a solution, in case mb_convert_encoding(... mb_detect_encoding()) doesn't help you.
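
For reference, the mb_convert_encoding(... mb_detect_encoding()) combination could look roughly like the sketch below. The helper name to_utf8 and the default candidate list are assumptions made for this example, not something from the article. Detection of single-byte encodings is heuristic, and the result for several candidates can differ between PHP versions, so the list should be kept as short as the real input allows.

```php
<?php
// Hypothetical helper (not from the article): normalize input to UTF-8.
// Assumption: mbstring is loaded; the candidate list is only an example.
function to_utf8(string $input, array $candidates = ['Windows-1251', 'KOI8-R']): string
{
    // Input that is already valid UTF-8 is passed through unchanged.
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input;
    }

    // Strict detection among the legacy candidates; this is a heuristic guess.
    $from = mb_detect_encoding($input, $candidates, true);
    if ($from === false) {
        throw new InvalidArgumentException('Could not detect the input encoding');
    }

    return mb_convert_encoding($input, 'UTF-8', $from);
}

// "\xF2\xE5\xF1\xF2" is the word "тест" encoded in Windows-1251.
$converted = to_utf8("\xF2\xE5\xF1\xF2", ['Windows-1251']);
echo $converted, "\n";
```

If the sender can state the encoding (an HTTP header, a form field, a file convention), using that value directly is more reliable than any detection.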

Kindman, 2011-07-27
@Kindman

If the alphabet is known in advance (and is not just "any characters of any language"), and if that alphabet is fully covered by a single single-byte code page, then you can use iconv() to convert from UTF-8 to that single-byte encoding, work with the string there, and then convert it back to UTF-8.
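
A minimal sketch of that round trip, assuming the iconv extension is available and that every character of the text exists in Windows-1251 (the code page is chosen here only because the example text is Cyrillic):

```php
<?php
// Assumption: iconv is available and the text fits entirely in Windows-1251.
$utf8 = 'тестtest';

// In a single-byte code page one byte is one character,
// so the ordinary byte-based functions give character results.
$single = iconv('UTF-8', 'Windows-1251', $utf8);
if ($single === false) {
    throw new RuntimeException('Text contains characters outside Windows-1251');
}

$length = strlen($single);          // 8 characters
$head   = substr($single, 0, 4);    // first 4 characters, still Windows-1251

// Convert the processed piece back to UTF-8 for output.
$headUtf8 = iconv('Windows-1251', 'UTF-8', $head); // 'тест'

echo $length, ' ', $headUtf8, "\n";
```

This only works while the input stays within the chosen code page; any character outside it makes the first iconv call fail or lose data.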
