C++ / C#
HamsterGamer, 2021-12-24 11:52:37

Is there any difference between hashing UTF-8 strings and ASCII strings with SHA-1?

Hello everyone. The Boost library for C++ has an implementation of SHA-1 hashing that casts any data array to const char * (that is, to elements of 1 byte each) and hashes that sequence of bytes. With ASCII messages this works fine (online hash calculators give the same result), but with another encoding (for example, 2 bytes per character) it gives a different result than the online calculator.
And I don't just pass an array of wchar_t values, where the elements are 2 bytes each: I do an explicit cast to const char * and specify the size as "number of characters" * "character size in bytes". Tell me, am I doing the right thing, or do I need an implementation of the algorithm for UTF-8? If the latter, could you also tell me which C++ library has such a version of SHA-1? I've already tried several other libraries and they don't support UTF-8 either (their output differs from the online calculators' output).

P.S. The wiki describes the algorithm in terms of bits, so I'm more inclined to believe that I just need to satisfy the contract of the Boost implementation (the encoding doesn't matter), which only requires an array of bytes and the number of bytes in the array.

Here is my implementation: https://wandbox.org/permlink/TUI6UxabUyUZuPWR
However, the result on Wandbox did not match the result of the same code on my local machine. Tell me where I'm going wrong!


2 answer(s)
Adamos, 2021-12-24
@HamsterGamer

In UTF-8, UTF-16LE, and UTF-16BE, a Cyrillic character occupies two bytes.
But in each of these cases it will be two DIFFERENT bytes.
Accordingly, an algorithm that works with bytes will produce a DIFFERENT hash.
If you don't want to lay rakes in your own path, it's better to convert your data to bytes yourself rather than rely on any "magic" libraries.
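To make this concrete, here is a minimal sketch that prints the bytes of a single character in each of the three encodings. The helper names (`to_utf8`, `to_utf16le`, `to_utf16be`, `hex`) are made up for illustration and are not from Boost or any other library, and the sketch assumes code points in the Basic Multilingual Plane only (no surrogate pairs).

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers for illustration only (BMP code points, no surrogates).
std::string to_utf8(char32_t cp) {
    assert(cp < 0x10000 && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {                       // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: includes basic Cyrillic
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 3 bytes: rest of the BMP
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

std::string to_utf16le(char32_t cp) {
    assert(cp < 0x10000);
    std::string out;
    out += static_cast<char>(cp & 0xFF);   // low byte first
    out += static_cast<char>(cp >> 8);
    return out;
}

std::string to_utf16be(char32_t cp) {
    assert(cp < 0x10000);
    std::string out;
    out += static_cast<char>(cp >> 8);     // high byte first
    out += static_cast<char>(cp & 0xFF);
    return out;
}

// Format a byte string as space-separated uppercase hex, e.g. "D1 8F".
std::string hex(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char b : bytes) {
        if (!out.empty()) out += ' ';
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

For the letter я (U+044F), `hex(to_utf8(0x044F))` gives `D1 8F`, `hex(to_utf16le(0x044F))` gives `4F 04`, and `hex(to_utf16be(0x044F))` gives `04 4F`. SHA-1 only ever sees these bytes, so three different byte sequences mean three different hashes for the "same" text.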

HamsterGamer, 2021-12-25
@HamsterGamer

In the end, I arrived at the following result: https://wandbox.org/permlink/r2kPawOj7nKx8dJA
Boost uses the dreadful auto_ptr there, which makes it somewhat questionable, but the SHA-1 output now matches the online calculators!
P.S. I have absolutely no desire to deal with encodings, so this solution will do for now...
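For anyone who lands here later: the linked code is not reproduced in this thread, so the following is not that solution, only a rough sketch of one way to get platform-independent bytes. It assumes the text is held as UTF-16 in a `std::u16string` and that the online calculators hash the UTF-8 form of the input (which is my assumption, check the one you compare against). The function name `utf16_to_utf8` is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: convert UTF-16 code units to UTF-8 bytes.
// Unpaired surrogates are replaced with U+FFFD (a choice made for this sketch).
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        const bool high = cp >= 0xD800 && cp <= 0xDBFF;
        if (high && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a valid surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // lone surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The resulting `std::string` can then be handed to whatever SHA-1 routine you use as `data()` plus `size()`. The size of `wchar_t` is not the same on every platform (commonly 2 bytes on Windows and 4 on Linux), so hashing a cast `wchar_t` buffer can give different results on different machines, while the UTF-8 bytes do not depend on the platform.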
