Objective-C
Dmitry Zamula, 2013-12-20 07:27:55

NSUTF8StringEncoding decodes incorrectly, how to fix?

I have an application that receives NSData over a socket and immediately initializes an NSMutableString from that data using NSUTF8StringEncoding. I tested it a lot and everything worked, but when I started sending a different text (slightly longer, though there is still enough room for it in the data buffer) that is encoded in the same way, my NSMutableString simply comes back as nil. I can't understand why the string can't be initialized correctly from this other data.

- (NSMutableString *) recv // Read a string from the socket
{
    NSMutableString *recvStr;

    len = [inputStream read:buf maxLength:2048];
    if(len > 0 && len < 2048)
    {
        NSMutableData* data=[[NSMutableData alloc] initWithBytes: (const void*)buf length: len];
        recvStr = [NSMutableString alloc];
        recvStr = [recvStr initWithData:data encoding: NSUTF8StringEncoding];
    }

    return recvStr;
}

An, 2013-12-21
@Flanker_4

Well, once again you've posted only a small piece of the code, and we're left to sit here and guess. I'm actually surprised that @corristo decided to respond :D
If everything else is fine, as you say, then perhaps the problem is in buf? Is it exactly the right size?
You should also look at stackoverflow.com/questions/9701776/nsutf8stringen... Who knows, maybe the data really isn't UTF-8 (there is a code example there that lets you check the encoding and whether it is supported at all).

You say that it “is definitely UTF-8”, but without a Content-Type header, you don't really know that. (And even if you did have a header saying that, it could still be wrong.)
My guess is that your data is usually ASCII, which always parses correctly as UTF-8, but you sometimes are trying to parse data that's actually encoded in ISO 8859-1 or Windows codepage 1252. Such data will generally be mostly ASCII, but with some bytes outside the 0–127 range ASCII defines. UTF-8 would expect such bytes to form a sequence of code units within a specified sequence of ranges, but in other encodings, any byte, regardless of value, is a complete character on its own. Trying to interpret non-ASCII non-UTF-8 data as UTF-8 will almost always get you either wrong results (wrong characters) or no results at all (cannot decode; decoder returns nil), because the data was never encoded in UTF-8 in the first place.
You should try UTF-8 first, and if it fails, use ISO 8859-1. If you're letting the user retrieve any web page, you should let them change the encoding you use to decode the data, in case they discover that it was actually 8859-9 or codepage-1252 or some other 8-bit encoding.
If you're downloading the data from a specific server, and especially if you have influence on what runs on that server, you should make it serve up an accurate Content-Type header and/or fix whatever bug is causing it to serve up text that isn't in UTF-8.

That quote is from the same SO question. So most likely your data is not UTF-8.
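The "try UTF-8 first, and if it fails, use ISO 8859-1" advice from the quote could look roughly like this. It is only a sketch: DecodeTextWithFallback is a made-up name, the code assumes ARC, and picking ISO 8859-1 as the fallback is a guess about the sender, not something the bytes themselves can confirm.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post). Assumes ARC.
// Tries strict UTF-8 first, then falls back to ISO 8859-1,
// where every byte value maps to a character of its own.
static NSString *DecodeTextWithFallback(NSData *data, NSStringEncoding *usedEncoding)
{
    NSStringEncoding encoding = NSUTF8StringEncoding;
    NSString *text = [[NSString alloc] initWithData:data encoding:encoding];
    if (text == nil) {
        // The bytes are not valid UTF-8: reinterpret them as Latin-1.
        encoding = NSISOLatin1StringEncoding;
        text = [[NSString alloc] initWithData:data encoding:encoding];
    }
    if (usedEncoding != NULL) {
        *usedEncoding = encoding; // lets the caller log which path was taken
    }
    return text;
}
```

Logging the encoding that was actually used makes it easier to notice when the fallback path starts firing for real traffic.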

Alexey Storozhev, 2013-12-22
@storoj

Don't separate alloc and init, it could end badly
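To illustrate: an init method is allowed to return a different object than the one alloc produced, so the usual idiom nests the two calls and keeps only the result of init. A small sketch, assuming ARC; MutableStringFromBytes is a hypothetical name used only for this example.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds a mutable string from raw bytes.
// Returns nil if the bytes are not valid UTF-8. Assumes ARC.
static NSMutableString *MutableStringFromBytes(const void *bytes, NSUInteger length)
{
    NSData *data = [NSData dataWithBytes:bytes length:length];
    // alloc and init in a single expression: the intermediate object
    // returned by alloc is never stored or used on its own.
    return [[NSMutableString alloc] initWithData:data encoding:NSUTF8StringEncoding];
}
```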

Alexey Storozhev, 2013-12-22
@storoj

There is also the nice GCDAsyncSocket library. Try it; it may well turn out to be more convenient than bare sockets.

Mr_Kibernetik, 2014-01-05
@Mr_Kibernetik

If the NSMutableString is initialized to nil, it means that decoding the data with NSUTF8StringEncoding failed. The most likely reason is that the incoming data is not actually UTF-8.
You can fix it this way: find out the encoding of the incoming data and decode it using the correct encoding.
A good example of validation code and encoding options is given in the first answer on this page: stackoverflow.com/questions/9701776/nsutf8stringen...
There are different ways to do the validation, but the general idea is the same: find out which encoding the data arrives in.
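One rough way to narrow this down is to try a handful of candidate encodings and see which of them decode the data at all. This is only a sketch, assuming ARC: EncodingsThatDecode and the candidate list are made up for illustration (Windows-1251 is there only because Cyrillic text seems a plausible guess here). A successful decode is not proof, since single-byte encodings accept almost any byte sequence, so the result narrows the options rather than identifying the encoding.

```objc
#import <Foundation/Foundation.h>

// Hypothetical diagnostic. Assumes ARC.
// Returns the candidate encodings (wrapped in NSNumber) that can decode `data`.
static NSArray *EncodingsThatDecode(NSData *data)
{
    // The candidate list is an assumption; adjust it to what the sender might use.
    const NSStringEncoding candidates[] = {
        NSUTF8StringEncoding,
        NSWindowsCP1251StringEncoding,
        NSWindowsCP1252StringEncoding,
        NSISOLatin1StringEncoding,
    };
    NSMutableArray *matches = [NSMutableArray array];
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        NSString *decoded = [[NSString alloc] initWithData:data encoding:candidates[i]];
        if (decoded != nil) {
            [matches addObject:@(candidates[i])];
        }
    }
    return matches;
}
```

If NSUTF8StringEncoding is missing from the result for a message that fails, the bytes are not valid UTF-8 as received, and the next thing to inspect is the raw data itself.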

divbyzero, 2013-12-21
@divbyzero

Does this variant work correctly (substituting your own string, of course)?

NSString *str = @"string";
NSData *data = [str dataUsingEncoding:NSUTF8StringEncoding];
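// NSData is not NUL-terminated, so decode by length instead of using stringWithUTF8String:
NSString *new_str = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];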

Alexey Storozhev, 2013-12-24
@storoj

Could you still post the data you are sending somewhere like pastebin (as a hex dump, for example)?
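A hex dump of the received bytes can be produced with a few lines. This is a minimal sketch, assuming ARC; HexDump is a hypothetical helper name, and you would call it on the NSData right before the decoding attempt.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper. Assumes ARC.
// Renders the bytes as space-separated, two-digit lowercase hex.
static NSString *HexDump(NSData *data)
{
    const unsigned char *bytes = (const unsigned char *)[data bytes];
    NSUInteger length = [data length];
    NSMutableString *hex = [NSMutableString stringWithCapacity:length * 3];
    for (NSUInteger i = 0; i < length; i++) {
        if (i > 0) {
            [hex appendString:@" "];
        }
        [hex appendFormat:@"%02x", (unsigned int)bytes[i]];
    }
    return hex;
}
```

For example, logging HexDump(data) for a message that fails to decode shows whether the last bytes form an incomplete multi-byte sequence or whether the text is in a single-byte encoding.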
