How good an idea is it to use only part of a sha1 hash as a bigint?
Data will be added at a rate of about 10 thousand records per hour.
What are the chances of a collision if I use only the first 16 hex characters (64 bits, stored as an unsigned bigint) of the sha1 of a 100-500 character utf-8 string?
Ideally, there would be no realistic risk of a collision for 50 years.
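For illustration, the 16-character prefix described above could be turned into an unsigned 64-bit value roughly as follows. This is only a sketch: `sha1_prefix_u64` is a hypothetical helper name, and the code assumes a 64-bit PHP build, where the unpacked value may come back negative and is reinterpreted as unsigned with `sprintf('%u', ...)`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not from the original post): returns the first
 * 16 hex characters of sha1($input) as an unsigned 64-bit decimal string.
 * Assumes a 64-bit PHP build.
 */
function sha1_prefix_u64(string $input): string
{
    $hex = substr(sha1($input), 0, 16);       // 16 hex chars = 8 bytes = 64 bits
    $signed = unpack('J', hex2bin($hex))[1];  // big-endian 64-bit; may be negative in PHP
    return sprintf('%u', $signed);            // print the same bits as unsigned decimal
}

echo sha1_prefix_u64('example string'), "\n";
```

The result is returned as a string because values of 2^63 and above do not fit in PHP's signed integer type.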
Overall, my tests showed that even the first 14 sha1 characters (56 bits) are enough to keep 1 billion of these hashes unique (~12 years at my load).
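<?php
declare(strict_types=1);
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Convert a number given as a string from one base to another using GMP.
function gmp_base_convert(string $number, int $frombase, int $tobase): string
{
    return gmp_strval(gmp_init($number, $frombase), $tobase);
}

$c = 0;   // collisions seen so far
$t = [];  // truncated hashes seen so far, stored as array keys

// Note: holding up to 1e9 integer keys in a PHP array likely needs tens of gigabytes of RAM.
for ($i = 0; $i < 1000000000; $i++) {
    // Build a unique base string, then append 0-500 random bytes.
    $s = uniqid((string)$i, true) . microtime() . $i;
    $n = rand(0, 500);
    for ($j = 0; $j < $n; $j++) {
        $s .= chr(rand(0, 255));
    }
    $v = sha1($s);
    // The first 14 hex characters (56 bits) fit in a signed 64-bit PHP int.
    $k = (int)gmp_base_convert(substr($v, 0, 14), 16, 10);
    if (isset($t[$k])) {
        // Collision: print the iteration number and the running count.
        $c++;
        echo "{$i}: {$c}\n";
    } else {
        $t[$k] = true;
    }
}
echo "{$i}: {$c}\n";
exit(0);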
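As a rough cross-check of the empirical test, the collision risk can also be estimated with the birthday approximation. The sketch below assumes the truncated hash behaves like a uniformly random value. `collision_estimate` is a hypothetical helper name, and the 50-year row count is derived from the 10 thousand per hour rate in the question.

```php
<?php
declare(strict_types=1);

/**
 * Birthday approximation for $n values drawn uniformly from 2^$bits possibilities.
 * Returns [expected number of colliding pairs, probability of at least one collision].
 * Hypothetical helper, not part of the original test script.
 */
function collision_estimate(float $n, int $bits): array
{
    $space = 2.0 ** $bits;                              // number of possible hash values
    $expectedPairs = $n * ($n - 1.0) / (2.0 * $space);  // n(n-1) / (2N)
    $probability = -expm1(-$expectedPairs);             // 1 - e^(-x), stable for small x
    return [$expectedPairs, $probability];
}

$perHour = 10000.0;                       // insertion rate from the question
$fiftyYears = $perHour * 24 * 365 * 50;   // about 4.38e9 rows

foreach ([[1e9, 56], [1e9, 64], [$fiftyYears, 64]] as [$n, $bits]) {
    [$pairs, $p] = collision_estimate($n, $bits);
    printf("n=%.3g bits=%d expected pairs=%.3g P(collision)=%.3g\n", $n, $bits, $pairs, $p);
}
```

Under these assumptions the estimate comes out at roughly 7 expected colliding pairs for 1 billion values at 56 bits, and about a 40% chance of at least one collision for 64 bits over 50 years, so it is worth comparing these figures with the result of the test run.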