Linux - How to write a string in a different encoding to a string object in c++?
Hello. The problem is that two seemingly identical strings do not compare as equal.
The first string (call it string s1 = "name") is received over a socket from the FileZilla program; the second is hard-coded by hand (string s2 = "name"). The two strings look exactly the same when printed to the console like this:
printf("s1 = \'%s\', has size %u, and s2 = \'%s\' has size %u\n", s1.c_str(), (unsigned int)s1.size(), s2.c_str(), (unsigned int)s2.size());
I get the following: s1 = 'name', has size 8, and s2 = 'name' has size 4
Yet neither of these comparisons succeeds:
if (s1 == s2) {
    doSomething();
}
if (strcmp(s1.c_str(), s2.c_str()) == 0) {
    doSomething();
}
Try converting both strings to a common encoding and then comparing them. You can use the libiconv library for this:
main.cpp:
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
using namespace std;
#include <iconv.h>

// Convert text from encoding `from` to encoding `to`; returns "" on failure.
string iconv_recode(const string &from, const string &to, const string &text)
{
    iconv_t cnv = iconv_open(to.c_str(), from.c_str());
    if (cnv == (iconv_t) -1) {
        // The descriptor was never opened, so there is nothing to close.
        return "";
    }
    // 4 output bytes per input byte is enough when the target is UTF-8.
    vector<char> outbuf(text.length() * 4 + 1);
    char *ip = const_cast<char *>(text.data());
    char *op = outbuf.data();
    size_t icount = text.length(), ocount = outbuf.size();
    string result;
    if (iconv(cnv, &ip, &icount, &op, &ocount) != (size_t) -1) {
        // Copy exactly the bytes that were written, including any zero bytes.
        result.assign(outbuf.data(), outbuf.size() - ocount);
    }
    iconv_close(cnv);
    return result;
}
// Print both strings and report whether they are byte-for-byte equal.
void compare_strings(const string &aString1, const string &aString2) {
    cout << "String 1: " << aString1 << endl
         << "String 2: " << aString2 << endl;
    if (aString1 == aString2) {
        cout << "Identical strings!" << endl
             << "-----" << endl;
    } else {
        cout << "Different strings!" << endl
             << "-----" << endl;
    }
}

int main()
{
    ifstream file_1("word_1.txt"); // The word "Проверка" in UTF-8
    ifstream file_2("word_2.txt"); // The word "Проверка" in CP1251
    string word_1, word_2;
    file_1 >> word_1;
    file_2 >> word_2;
    compare_strings(word_1, word_2);  // different bytes, so not equal
    word_2 = iconv_recode("CP1251", "UTF-8", word_2);
    compare_strings(word_1, word_2);  // same encoding now, so equal
    return 0;
}
exl@exl-Lenovo-G560e:~/SandBox/text_enc > enca -L russian word_1.txt
Universal transformation format 8 bits; UTF-8
Doubly-encoded to UTF-8 from ISO-8859-5
exl@exl-Lenovo-G560e:~/SandBox/text_enc > enca -L russian word_2.txt
MS-Windows code page 1251
LF line terminators
exl@exl-Lenovo-G560e:~/SandBox/text_enc > cat word_1.txt
Проверка
exl@exl-Lenovo-G560e:~/SandBox/text_enc > cat word_2.txt
��������
exl@exl-Lenovo-G560e:~/SandBox/text_enc > ./text_coding
String 1: Проверка
String 2: ��������
Different strings!
-----
String 1: Проверка
String 2: Проверка
Identical strings!
-----
I don't think it's the encoding; otherwise "name" would not print identically in both cases. Most likely the string contains something extra, for example because of a bug in the code that receives data from the socket.
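One way to check what the received string actually contains is to print its bytes in hexadecimal. This is only a minimal sketch; `hex_dump` is a hypothetical helper name, not something from the question:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: render every byte of a string as two hex digits,
// separated by spaces, so that non-printable bytes become visible.
std::string hex_dump(const std::string &s)
{
    std::string out;
    char buf[4];  // two hex digits + space + terminating NUL
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02x ", static_cast<unsigned>(c));
        out += buf;
    }
    if (!out.empty()) {
        out.pop_back();  // drop the trailing space
    }
    return out;
}
```

For a plain "name" this gives `6e 61 6d 65`. Printing `hex_dump(s1)` next to `hex_dump(s2)` would show any extra bytes in the received string that the console output hides.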
I ran into a similar problem. Most likely the data you are writing into the string is not UTF-8: instead of "name" you have "0n0a0m0e" there, that is, 2 bytes per character instead of one, which matches the reported size of 8 versus 4.
As a fix, use any library that handles UTF-8/UTF-16 and make sure both strings are in the same encoding. The simplest option, if my guess about the zeros is confirmed, is to strip them out manually (provided, of course, that everything is ASCII).
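If the zeros turn out to be real zero bytes, stripping them could look roughly like this. It is a sketch that assumes the text is pure ASCII; `strip_nul_bytes` is a hypothetical name, and for anything beyond ASCII a proper UTF-16 to UTF-8 conversion is needed instead:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: remove every zero byte from the string.
// Only safe when the payload is ASCII stored as 2 bytes per character.
std::string strip_nul_bytes(std::string s)
{
    // Erase-remove idiom: move the non-zero bytes to the front,
    // then cut off the leftover tail.
    s.erase(std::remove(s.begin(), s.end(), '\0'), s.end());
    return s;
}
```

After this, `strip_nul_bytes(s1) == s2` should hold for the strings from the question, provided zero bytes really were the only difference.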