PHP
Alexander, 2014-08-14 09:55:36

Can duplicate hashes occur?

Good day!
In our project, a link of the form site.com/a715caeb is generated for each order so
that users can get into their accounts through it without logging in.
Why crc32b? To keep the links as short as possible, since they are sent by SMS.
I generate it like this:
$hash = hash('crc32b', md5($client_email.$id_item));
$client_email - client's email
$id_item - item ID
Is this a good solution, in your opinion? I'm worried that the same hash could be generated for different emails and products. Or is that practically impossible?

6 answer(s)
Artem Lisovsky, 2014-08-14
@torrie

Most likely your volumes will not be large enough to run into a collision.
But it is better to play it safe: look in the database for the same hash and, if it is already there, use a different id. Better still, generate the hash with one of several salts and store the salt that was used in the database. Use a salt whenever you generate any hash.

FanatPHP, 2014-08-14
@FanatPHP

The solution is awful. This is straight out of the saga of Babushkin's archiver.
And of course there will be collisions.
It is better to keep MD5 (whose odds of an accidental collision are acceptably low) but convert it from the inefficient base16 into a shorter form. Base64 is quite suitable, since PHP has the encoder out of the box. The only catch is that its two non-alphanumeric characters are awkward to pass in a URL, so it is better to replace them.
In total, we save 10 characters out of 32. Of course, 22 is worse than 8, but here you have to choose: either a sufficient length, or collisions and no security at all.
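
The snippet that accompanied this answer is not preserved in this copy. A minimal sketch of the idea, assuming the two characters are swapped for - and _ (the URL-safe Base64 convention) and the trailing = padding is dropped; shortMd5 is a hypothetical helper name:

```php
<?php
// Hypothetical helper: MD5 digest re-encoded as 22 URL-safe characters.
function shortMd5(string $data): string
{
    // Second argument true => raw 16-byte digest instead of 32 hex characters.
    $raw = md5($data, true);

    // Base64 of 16 bytes is 24 characters ending in "=="; drop the padding.
    $b64 = rtrim(base64_encode($raw), '=');

    // Replace the two characters that are awkward in URLs.
    return strtr($b64, '+/', '-_');
}

// Example input values are made up for illustration.
$hash = shortMd5('client@example.com' . '42');
```

The result carries the full 128 bits of MD5, just in a denser alphabet, which is where the 22 characters come from.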

ivankomolin, 2014-08-14
@ivankomolin

If anyone finds out this:

I generate it like this:
$hash = hash('crc32b', md5($client_email.$id_item));

then they will be able to access any account knowing only the user's email.
This is not good.
In such cases, you need to do this:
1. When creating a user, generate a random "token" that lets them log in without a password.
2. Each time the user enters the account using this "token", change it.
And to avoid ending up with identical "tokens", check whether the same one already exists when creating or changing a "token". That is, keep generating a "token" until it is unique, then write it to the database.
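
The example from this answer is missing in this copy. A rough sketch of the generate-until-unique loop described above, assuming PHP 7+ for random_bytes(); generateUniqueToken and the $isTaken callback are hypothetical names, and the callback stands in for whatever database lookup the project uses:

```php
<?php
/**
 * Generate a random login token that is not already in use.
 *
 * $isTaken is a hypothetical callback returning true when the token
 * already exists in storage (for example, a SELECT on a tokens table).
 */
function generateUniqueToken(callable $isTaken, int $bytes = 16): string
{
    do {
        // random_bytes() reads from the system CSPRNG (PHP 7+).
        $token = bin2hex(random_bytes($bytes));
    } while ($isTaken($token));

    return $token;
}

// Usage with an in-memory array standing in for the database.
$used = [];
$token = generateUniqueToken(function (string $t) use (&$used): bool {
    return isset($used[$t]);
});
$used[$token] = true;
```

For step 2, the same function would be called again after each successful login and the stored value overwritten.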

386DX, 2014-08-14
@386DX

https://www.linux.org.ru/search.jsp?q=MD5+COLLISIONS...

Andrey Ezhgurov, 2014-08-14
@eandr_67

Absolutely NOT normal. Read on the wiki about the "birthday paradox". Even with a perfectly uniform distribution, a collision among 32-bit hashes becomes likely once the number of hashes reaches roughly sqrt(2^32) = 2^16 = 65536. In practice crc32 is not guaranteed to give a uniform distribution, so the probability of a collision may be even higher.
In addition, such a short address means that an attacker will be able to get to other people's orders by simple enumeration.
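
To put numbers on the birthday paradox: for n links drawn uniformly from N = 2^32 possible values, the chance of at least one collision is approximately 1 - exp(-n(n-1)/(2N)). A small sketch (collisionProbability is a hypothetical helper, and uniformity is the best-case assumption):

```php
<?php
// Birthday-bound approximation: probability that at least two of $n
// uniformly random values out of $space possible values coincide.
function collisionProbability(int $n, float $space): float
{
    return 1.0 - exp(-$n * ($n - 1) / (2.0 * $space));
}

$space = 2.0 ** 32; // number of distinct crc32b outputs

foreach ([1000, 10000, 65536, 77163, 200000] as $n) {
    printf("%7d links: %.4f\n", $n, collisionProbability($n, $space));
}
```

Under this approximation the chance is roughly 39% at 65536 links and reaches 50% at about 77,000.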

throughtheether, 2014-08-14
@throughtheether

In my opinion, by computing md5 first and then crc32 you increase the chance of a collision. A collision can occur in the md5 function (same value for different inputs), which immediately leads to a collision in the crc32 output (same value for the same input). In addition, a collision may occur in the crc32b function itself (same output for two different md5 results). I don't know how the md5 function in PHP works internally, but it seems to me that its output has some structure (32 hexadecimal digits), which may increase the chance of a crc32b collision.

Why use crc32b? To keep the links as short as possible - for use in SMS mailing.
If I were you and determinism were required, I would:
1) use a hash function from the SHA-2 family (SHA-256, SHA-512);
2) take the last N bits (for example, 48: guessing a given hash then takes on the order of 2^48 attempts, while finding two arbitrary users with matching hashes, as @eandr_67 noted in a comment, is much cheaper, about 2^24 attempts on average, because the number of pairs grows quadratically);
3) encode these 48 bits as 8 characters of a 64-character alphabet (upper- and lowercase Latin letters + digits + 2 more characters);
4) use the resulting 8-character string.
Items 2, 3 and 4 can be adjusted to your specific requirements.
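
Steps 1 to 4 could look roughly like this. It is only a sketch: shortSha256 is a hypothetical name, and - and _ are assumed as the two extra characters of the 64-character alphabet:

```php
<?php
// Hypothetical helper: last 48 bits of SHA-256 as 8 URL-safe characters.
function shortSha256(string $data): string
{
    // Third argument true => raw 32-byte digest.
    $raw = hash('sha256', $data, true);

    // Keep the last 6 bytes (48 bits).
    $tail = substr($raw, -6);

    // 6 bytes encode to exactly 8 Base64 characters with no padding.
    return strtr(base64_encode($tail), '+/', '-_');
}

// Example input values are made up for illustration.
$code = shortSha256('client@example.com' . '42');
```

Changing the number of bytes kept in substr() is how items 2 to 4 would be tuned; a multiple of 3 bytes avoids Base64 padding.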
Firstly, it is not clear why you track orders but compute the hash from the product. Today, for example, a client orders one product at a time, and tomorrow, when the concept changes, two or more. It seems more logical to track orders (approximate attributes of an order: the user's phone number / email, the list of goods, delivery time and address, etc.).
Further, it is not clear why it was decided to use hashing at all. I assume there are no problems with data storage. Why not store, along with the order data, a corresponding 'secret' link generated by a PRNG?
Further, if you plan to put a short link in an SMS, it makes sense to build it from a specially chosen alphabet, picking from a-z, A-Z, 0-9 the characters that are most convenient in terms of UX. For example, the letters I and l are hard to tell apart. It is also important to maximize the size of the set of possible link strings (equal to A^l, where A is the number of characters in the alphabet and l is the length of the link string). With 32 characters in the alphabet and 8 characters in the string we get 32^8 = 2^40 options, and finding a pair of users with matching links takes on average about 2^20 attempts (the corresponding collision probabilities are, respectively, the inverses of these values).
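
A sketch of such a PRNG-generated link with a 32-character alphabet. The alphabet here is an assumption (lowercase digits and letters without i, l, o and u, in the spirit of Crockford's Base32), generateLinkCode is a hypothetical name, and random_int() requires PHP 7+:

```php
<?php
// Hypothetical helper: random link code from an SMS-friendly alphabet.
function generateLinkCode(int $length = 8): string
{
    // 32 characters: digits plus lowercase letters without i, l, o, u.
    $alphabet = '0123456789abcdefghjkmnpqrstvwxyz';
    $max = strlen($alphabet) - 1;

    $code = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() is backed by the system CSPRNG (PHP 7+).
        $code .= $alphabet[random_int(0, $max)];
    }

    return $code;
}

$link = 'site.com/' . generateLinkCode();
```

The code would be stored with the order and checked for uniqueness before saving, since at 2^40 possible values accidental repeats are unlikely but not impossible.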
Further, it is possible to split the functionality of the order page: simple access via the link shows only the status / general information, while important actions (cancelling the order, changing the address, etc.) are allowed only after additional verification (for example, a short-lived secret is sent to the client's known phone number / e-mail and must be entered on the page to continue).
Those are my thoughts.
