Qt
BadCats, 2020-07-06 20:52:38

How do I remove an emoji character from a string in Qt?

I have an emoji (https://emojipedia.org/tooth/) that I need to remove from a string.

Unicode taken from here: https://www.fileformat.info/info/unicode/char/1f9b...

What I tried, based on the "C/C++/Java source code" form "\uD83E\uDDB7" listed on that page:


if (d->at(k).contains(u8"\\uD83E\\uDDB7")) 

if (d->at(k).contains(QString::fromUtf8(QByteArray("f09fa6b7"))))

Neither option works.

A little clarification:

In Qt, this emoji is stored as two characters, 55358 (0xd83e) and 56759 (0xddb7), just like here:

www.mauvecloud.net/charsets/CharCodeFinder.html
(to check, copy the emoji from https://emojipedia.org/tooth/)

The checks for these two characters:

if (d->at(k).contains("0xd83e")) and if (d->at(k).contains("0xddb7"))
- on the contrary, do work, but

const_cast<QString*>(&d->at(k))->remove("\0xd83e");
const_cast<QString*>(&d->at(k))->remove("\0xddb7");

- have no effect.

Main question:

Please tell me how to remove this character and others like it (emoji).

Additional question:
Also, what can I read about encodings and their formats?


3 answer(s)
BadCats, 2020-07-12
@BadCats

I managed to remove the characters. The gist is that, for some reason, Qt does not match the character when you try to remove or replace it by referring to it as U0001F9B7, even though the symbol is definitely present and is printed in that form to the console via qDebug().
To remove emoji characters, use the static method QString::fromWCharArray() and pass it the surrogate pair
(exactly the values visible in the debugger as 55358 (0xd83e) and 56759 (0xddb7), in my case, for this emoji).
Oddly enough, the method returns the same string, but Qt now removes it without problems:
QString tmpStr = QString::fromWCharArray(L"\xD83E\xDDB7");
myStr.remove(tmpStr);
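
For illustration, here is a minimal sketch of the same removal without Qt. It assumes std::u16string as a stand-in for QString (both hold UTF-16 code units); removeAll and kTooth are hypothetical names, not part of any library. The emoji is matched as its two code units together, never one at a time.

```cpp
#include <string>

// Removes every occurrence of `needle` from `text`, comparing UTF-16 code units.
// std::u16string stands in for QString here (an assumption, so the sketch needs no Qt).
std::u16string removeAll(std::u16string text, const std::u16string& needle)
{
    if (needle.empty())
        return text; // nothing to search for
    std::size_t pos = 0;
    while ((pos = text.find(needle, pos)) != std::u16string::npos)
        text.erase(pos, needle.size());
    return text;
}

// U+1F9B7 written as its surrogate pair, the two values seen in the debugger.
const std::u16string kTooth = {char16_t(0xD83E), char16_t(0xDDB7)};
```

Calling removeAll(someText, kTooth) returns a copy of someText without the emoji. A u"\U0001F9B7" literal produces the same two code units, so it can be used in place of kTooth.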

What helped me find the answer was this thread on the English-language Stack Overflow:
https://stackoverflow.com/questions/30247319/how-d...
(I'm not sure whether links to other forums are allowed on Toster, so here is a quote from the answer:

You already know the answer - specify it as a proper UTF-16 string.
Unicode codepoints above U+FFFF are represented in UTF-16 using a surrogate pair, which is two 16bit codeunits acting together to represent the full Unicode codepoint value. For U+1F50E, the surrogate pair is U+D83D U+DD0E.
In Qt, a UTF-16 codeunit is represented as a QChar, so you need two QChar values, eg:
edit.setText(QString::fromWCharArray(L"\xD83D\xDD0E"));

or:
edit.setText(QString::fromStdWString(L"\xD83D\xDD0E"));

Assuming a platform where sizeof(wchar_t) is 2 and not 4.
In your example, you tried using QString::fromUtf8(), but you gave it an invalid UTF-8 string. For U+1F50E, it should have looked like this instead:
edit.setText(QString::fromUtf8("\xF0\x9F\x94\x8E"));

You can also use QString::fromUcs4() instead:
uint cp = 0x1F50E; edit.setText(QString::fromUcs4(&cp, 1));


)
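
The UTF-8 bytes mentioned in the quote can be derived rather than looked up. The sketch below is plain C++ with a hypothetical helper encodeUtf8; it assumes a valid code point and does no error checking. For U+1F9B7 it gives the bytes F0 9F A6 B7. Note that QByteArray("f09fa6b7") from the question holds the eight ASCII characters of that hex text, not these four bytes.

```cpp
#include <string>

// Encodes one Unicode code point as UTF-8 bytes.
// Hypothetical helper: assumes cp is a valid scalar value
// (not a surrogate, not above U+10FFFF); no error checking is done.
std::string encodeUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```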
A calculator for converting between Unicode code points and surrogate pairs:
www.russellcottrell.com/greek/utilities/SurrogateP...
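
The same conversion is short enough to sketch in code. The helpers toSurrogates and fromSurrogates below are hypothetical names. They assume a code point in the range U+10000 to U+10FFFF and a well-formed pair, and do no validation.

```cpp
#include <utility>

// Splits a code point above U+FFFF into its UTF-16 surrogate pair (high, low).
// Assumes 0x10000 <= cp <= 0x10FFFF; the input is not validated.
std::pair<char16_t, char16_t> toSurrogates(char32_t cp)
{
    const char32_t v = cp - 0x10000;               // 20 significant bits remain
    return { static_cast<char16_t>(0xD800 + (v >> 10)),     // top 10 bits
             static_cast<char16_t>(0xDC00 + (v & 0x3FF)) }; // bottom 10 bits
}

// Joins a surrogate pair back into one code point.
// Assumes high is in D800..DBFF and low is in DC00..DFFF.
char32_t fromSurrogates(char16_t high, char16_t low)
{
    return 0x10000 + ((static_cast<char32_t>(high) - 0xD800) << 10)
                   + (static_cast<char32_t>(low) - 0xDC00);
}
```

For the tooth emoji, toSurrogates(0x1F9B7) yields 0xD83E and 0xDDB7, which are the decimal values 55358 and 56759.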

Ighor July, 2020-07-06
@IGHOR

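#include <QRegExp>
#include <QString>

// Replaces each literal \uXXXX escape in str with the UTF-16 code unit it names.
void translateUnicodeStr(QString& str)
{
    static const QRegExp rx("(\\\\u[0-9a-fA-F]{4})");
    int pos = 0;

    while ((pos = rx.indexIn(str, pos)) != -1)
        str.replace(pos++, 6, QChar(rx.cap(1).right(4).toUShort(nullptr, 16)));
}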

Vapaamies, 2020-07-07
@vapaamies

return str.remove(QRegularExpression("[\\x{1F600}-\\x{1F7FF}]+"));
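
Note that U+1F9B7 (the tooth) lies above U+1F7FF, so the range in this expression does not cover it. As a rough sketch of range-based removal without Qt, the plain C++ below decodes surrogate pairs and drops code points inside a given range. stripRange is a hypothetical helper, std::u16string stands in for QString, and the wider range 0x1F300 to 0x1FAFF in the usage note is only an illustrative assumption, not a complete list of emoji.

```cpp
#include <string>

// Returns `text` without the code points in [lo, hi]. Surrogate pairs are
// decoded first, so the range is compared against full code points.
// Hypothetical helper; an unpaired surrogate is compared by its own value.
std::u16string stripRange(const std::u16string& text, char32_t lo, char32_t hi)
{
    std::u16string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char16_t u = text[i];
        const bool isPair = u >= 0xD800 && u <= 0xDBFF && i + 1 < text.size()
                            && text[i + 1] >= 0xDC00 && text[i + 1] <= 0xDFFF;
        if (isPair) {
            const char32_t cp = 0x10000 + ((char32_t(u) - 0xD800) << 10)
                                        + (char32_t(text[i + 1]) - 0xDC00);
            if (cp < lo || cp > hi) {  // outside the range: keep both units
                out += u;
                out += text[i + 1];
            }
            ++i;                       // skip the low surrogate
        } else if (u < lo || u > hi) { // single code unit outside the range
            out += u;
        }
    }
    return out;
}
```

With the range from the answer, stripRange(s, 0x1F600, 0x1F7FF) leaves the tooth in place; with the wider assumed range, stripRange(s, 0x1F300, 0x1FAFF) removes it.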
