WideCharToMultiByte msvc 2012 bug?

Door, 2012-12-03 19:42:06

C++ / C#

Hello. I have a function that converts a wchar_t* string to std::string:

bool WcharToString(const wchar_t* wstr, std::string& converted, UINT codePage = CP_ACP)
{
  if(wstr)
  {
    int length = ::WideCharToMultiByte(codePage, 0, wstr, -1, NULL, 0, NULL, NULL);
    char* str = new char[length + 1];
    str[length] = '\0';
    
    // ignoring returned value
    ::WideCharToMultiByte(codePage, 0, wstr, -1, str, length, NULL, NULL);
    converted.assign(str);
    delete [] str;
    return true;
  }
  return false;
}

It works fine when built with Visual Studio 2010, but not with Visual Studio 2012. What do you think: is the function above written incorrectly, or is this a bug in the Visual Studio 2012 libraries?
(WideCharToMultiByte itself is declared in WinNls.h in Visual Studio 2010 and in Stringapiset.h in 2012, so something has changed :) ).
If it is a bug, can it be reported to Microsoft? And if it is not a bug, please tell me how to do this correctly.
Thanks in advance.

4 answer(s)

Konstantin Vlasov, 2012-12-03
@Door

::WideCharToMultiByte(codePage, 0, wstr, -1, str, length, NULL, NULL);

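A small correction: the size of the target buffer can be passed as length + 1 rather than length (after all, that is how much memory is allocated). Either way, you don't have to insert the trailing zero manually: because the source length is given as -1, the length returned by the first call already counts the terminating null, and the function writes it itself.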
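To make the buffer-size point concrete, here is a minimal sketch of the question's function with both return values checked. The name WcharToStringChecked is hypothetical, and the use of std::string as the buffer is just one possible choice, not the original author's code. It relies on the documented behaviour that, when the source length is -1, the size returned by WideCharToMultiByte includes the terminating null.

```cpp
#ifdef _WIN32  // Windows-only: WideCharToMultiByte comes from <windows.h>
#include <windows.h>
#include <string>

// Hypothetical variant of WcharToString from the question: checks both
// return values and avoids the manual new[] / delete[].
bool WcharToStringChecked(const wchar_t* wstr, std::string& converted, UINT codePage = CP_ACP)
{
    if (!wstr)
        return false;

    // First call: ask for the required size in bytes. Because the source
    // length is -1, the returned count includes the terminating null.
    const int length = ::WideCharToMultiByte(codePage, 0, wstr, -1, NULL, 0, NULL, NULL);
    if (length <= 0)
        return false;

    std::string buffer(static_cast<size_t>(length), '\0');

    // Second call: writes the converted bytes plus the terminator.
    const int written = ::WideCharToMultiByte(codePage, 0, wstr, -1, &buffer[0], length, NULL, NULL);
    if (written <= 0)
        return false;

    buffer.resize(static_cast<size_t>(written) - 1);  // drop the terminator
    converted.swap(buffer);
    return true;
}
#endif
```

Passing an explicit code page, for example WcharToStringChecked(wstr, result, 1251), removes the dependency on the system's ANSI code page.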
As for the problem itself, I don't see what UTF-8 has to do with it. The string is converted not to UTF-8 (CP_UTF8) but to ANSI (CP_ACP), and the Studio debugger displays it as ANSI. The difference is that the 2012 debugger interprets the bytes using a different code page, so this looks like a bug in the Studio debugger rather than in the conversion. If you expand the string and examine it character by character, you will see that the character codes are the same in both cases. It's just that 2010 uses one code page (Russian, 1251) to display these codes as a string, and 2012 uses another (1252, Western).
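The "same bytes, different code page" effect can be illustrated with a small portable sketch. The helper names below are hypothetical, and they cover only the 0xC0..0xFF byte range, where both mappings are simple offsets: Windows-1251 places the Cyrillic letters U+0410..U+044F there, while Windows-1252 coincides with Latin-1.

```cpp
#include <cstdint>

// Hypothetical helpers: decode one byte from the 0xC0..0xFF range under
// two Windows code pages. Precondition for both: byte >= 0xC0.

// Windows-1251: 0xC0..0xFF are the Cyrillic letters U+0410..U+044F.
char32_t DecodeCp1251Upper(std::uint8_t byte)
{
    return static_cast<char32_t>(0x0410 + (byte - 0xC0));
}

// Windows-1252: 0xC0..0xFF coincide with Latin-1, U+00C0..U+00FF.
char32_t DecodeCp1252Upper(std::uint8_t byte)
{
    return static_cast<char32_t>(byte);
}
```

For the byte 0xCF, the first function yields U+041F (the Cyrillic letter П) and the second yields U+00CF (Ï): the stored byte is identical, only its interpretation differs.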

vScherba, 2012-12-03
@vScherba

Try specifying the code page explicitly instead of CP_ACP.
The following trick is often used for conversion:

string str = _bstr_t(wstr).operator char*();

.operator char*() uses WideCharToMultiByte in its implementation. If this trick doesn't work correctly either, it is most likely a bug on Microsoft's side.

ixSci, 2012-12-04
@ixSci

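If you are using Visual Studio 2012, why not write:

#include <codecvt>
#include <locale>
#include <string>

std::string utilities::utf16ToUtf8(const std::wstring& utf16)
{
    // codecvt_utf8_utf16 handles surrogate pairs;
    // codecvt_utf8<wchar_t> would treat 16-bit wchar_t input as UCS-2.
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;
    return convert.to_bytes(utf16);
}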

kostik450, 2013-01-16
@kostik450

I also got garbled characters (mojibake) instead of Russian letters. I solved it like this:

#include <locale.h>
int main (int argc, char **argv)
{
   setlocale (LC_ALL, ".1251");
}