C++ / C#
sitev_ru, 2014-12-24 09:29:53

Is there a type/class or some approach to work with Unicode String in C++?

What is the best string type/class for working with Unicode strings in C++? Or is there perhaps some general, efficient approach?
I was leaning toward wstring, but on Wikipedia I read the following:

In the Windows API, the wchar_t type is referred to as WCHAR and has a fixed size of 16 bits, which prevents the entire Unicode character set (more than 1 million) from being encoded.

After that, I thought about how this is implemented in QString:
QString stores a string of 16-bit QChars, where each QChar corresponds to one Unicode 4.0 character. (Unicode characters with code values greater than 65535 are stored using surrogate pairs, i.e. two consecutive QChars.)
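To make the surrogate-pair idea concrete, here is a minimal sketch in standard C++ (not Qt code) of how a code point above 65535 is split into two 16-bit units. The function name encode_utf16 is made up for illustration, and the input is assumed to be a valid code point.

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
// Assumes cp is a valid scalar value (at most 0x10FFFF and not itself a surrogate).
std::u16string encode_utf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: a single 16-bit unit is enough.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary planes: split the 20-bit offset into two halves.
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));   // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF))); // low surrogate
    }
    return out;
}
```

With this sketch, the Cyrillic letter U+0438 takes one unit, while U+1F600 becomes the pair 0xD83D 0xDE00. A 16-bit string type can therefore hold any code point, but its length counts code units rather than characters.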

I'm not using Qt... What should I do?

5 answer(s)
Don Kaban, 2014-12-24
@donkaban

utfcpp.sourceforge.net
Mind if I vent a little? Thanks.
The question "what libraries are there for XXX in language YYY" is exactly what search engines were made for. So why not put this question not to living people on Toster (or, in especially severe cases, straight to SO), but to the soulless robots of Yandex, Google and even (God forbid, of course) all sorts of ungodly Bings?
No, really, think about it: it's all upside. You bother no one, the answer (thousands of answers!) is instantaneous, and on your resume you can list a skill that is so rare today: "I know how to search on Google." Pure profit.

Ilya Evseev, 2014-12-24
@IlyaEvseev

In the Windows API, the wchar_t type is referred to as WCHAR and has a fixed size of 16 bits, which prevents the entire Unicode character set (more than 1 million) from being encoded.

1) This is a Windows issue, not an application issue. Solving it in the application means piling up workarounds.
2) All the commonly used characters fit in a single 16-bit UTF-16 unit. It is no great loss if the application cannot handle some ancient Chinese characters.
It's best to use the standard classes and not sweat it.

Alexander Ruchkin, 2014-12-24
@VoidEx

Unicode can be encoded even with plain char, using UTF-8.
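As a rough illustration of that point, the sketch below packs a single code point into one to four char values following the UTF-8 bit layout. The name encode_utf8 is hypothetical, and the input is assumed to be a valid code point (no range or surrogate checks).

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes in a std::string.
// Assumes cp is a valid scalar value (at most 0x10FFFF and not a surrogate).
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // ASCII: one byte, unchanged.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Under these assumptions, the Cyrillic letter U+0438 comes out as the two bytes 0xD0 0xB8, so an ordinary std::string can carry any Unicode text. The catch is that one character may span several char elements.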

sitev_ru, 2014-12-24
@sitev_ru

char* s;
s = utf8_to_char("Привет, мир!"); // assign to s the string "Привет, мир!" ("Hello, world!") in UTF-8
Now I want to take the 3rd character, that is, the letter "и", but of course s[2] gives a different result... although in Windows-1251 encoding it would give the letter "и"...
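One way around the indexing problem is to walk the bytes and count code points instead of using s[n] directly. The sketch below is only an illustration: nth_code_point is a made-up name, the input is assumed to be valid UTF-8, and a code point is not always a full user-perceived character (combining marks are not handled).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: return the bytes of the n-th (0-based) code point of a
// UTF-8 string, or an empty string if n is out of range.
// Assumes s holds valid UTF-8; no validation is performed.
std::string nth_code_point(const std::string& s, std::size_t n) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        // The lead byte determines how many bytes the sequence occupies.
        std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        if (n == 0) {
            return s.substr(i, len);
        }
        --n;
        i += len;
    }
    return std::string();
}
```

For the Cyrillic word "Привет" stored as UTF-8, nth_code_point(s, 2) yields the two bytes of "и" (0xD0 0xB8), whereas s[2] is merely the first byte of "р". The lookup is linear in the string length. If frequent random access matters, converting to std::u32string first is an alternative.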

AxisPod, 2014-12-27
@AxisPod

Use libicu (ICU) if you need rich Unicode features.
