MySQL
Sergey Pugovkin, 2020-08-04 20:39:34

How good an idea is it to use only part of a SHA-1 hash as a BIGINT?

Rows will be added at a rate of about 10,000 per hour.
What are the chances of a collision if I keep only the first 16 hex characters (64 bits, stored as an unsigned BIGINT) of the SHA-1 of a 100-500 character UTF-8 string?
Ideally, there would be no realistic risk of a collision for 50 years.


2 answers
Sergey Pugovkin, 2020-08-04
@Driver86

In general, my tests showed that even the first 14 characters of the SHA-1 are enough to keep 1 billion of these hashes unique (about 12 years at my load).

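<?php

declare(strict_types=1);

error_reporting(E_ALL);
ini_set('display_errors', '1');

// Convert a number given as a string from one base to another (requires the GMP extension).
function gmp_base_convert(string $number, int $frombase, int $tobase): string
{
    return gmp_strval(gmp_init($number, $frombase), $tobase);
}

$c = 0;  // number of collisions seen so far
$t = []; // truncated hashes seen so far, stored as array keys

// Note: holding up to 1 billion keys in a PHP array needs tens of gigabytes of memory.
for ($i = 0; $i < 1000000000; $i++) {
    // Build a unique base string, then append 0-500 random bytes.
    $s = uniqid((string)$i, true) . microtime() . $i;
    $n = rand(0, 500);
    for ($j = 0; $j < $n; $j++) {
        $s .= chr(rand(0, 255));
    }

    // Keep the first 14 hex characters (56 bits) of the SHA-1 as an integer key.
    $v = sha1($s);
    $k = (int)gmp_base_convert(substr($v, 0, 14), 16, 10);

    if (isset($t[$k])) {
        // Collision: print the iteration index and the running collision count.
        $c++;
        echo "{$i}: {$c}\n";
    } else {
        $t[$k] = true;
    }
}

// Final summary: iterations run and total collisions.
echo "{$i}: {$c}\n";

exit(0);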

And 16 characters are all the more sufficient.
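
One practical point when moving from 14 to 16 characters: 16 hex characters are 64 bits, which fits MySQL's BIGINT UNSIGNED but can exceed PHP's signed 64-bit integer. Below is a minimal sketch of one way to derive the key as a decimal string, assuming a 64-bit PHP build. The function names `hexToUnsignedDecimal` and `sha1Key64` are hypothetical and not part of the test script above.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: convert exactly 16 hex characters (64 bits) into an
// unsigned decimal string. Assumes a 64-bit PHP build.
function hexToUnsignedDecimal(string $hex16): string
{
    if (strlen($hex16) !== 16 || !ctype_xdigit($hex16)) {
        throw new InvalidArgumentException('Expected exactly 16 hex characters.');
    }
    // 'J' reads an unsigned 64-bit big-endian integer; values above PHP_INT_MAX
    // come back as negative PHP integers because PHP has no unsigned type.
    $signed = unpack('J', hex2bin($hex16))[1];
    // '%u' prints the same 64 bits as an unsigned decimal number.
    return sprintf('%u', $signed);
}

// Hypothetical helper: first 16 hex characters of the SHA-1 of $input,
// returned as a decimal string suitable for a BIGINT UNSIGNED column.
function sha1Key64(string $input): string
{
    return hexToUnsignedDecimal(substr(sha1($input), 0, 16));
}

echo sha1Key64('example string'), "\n";
```

Returning a string rather than an int avoids any loss when the top bit of the prefix is set; the value can then be bound to the BIGINT UNSIGNED column as a parameter.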

Developer, 2020-08-04
@samodum

The chances of a collision are huge.
The calculation is elementary.
Collisions will happen daily. 50 years is out of the question.
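
For reference, the calculation can be sketched with the standard birthday approximation, p ≈ 1 - exp(-n(n-1)/(2N)), where n is the number of rows and N = 2^bits is the number of possible keys. The sketch assumes that SHA-1 prefixes behave like uniformly random values and takes the load from the question (10,000 rows per hour). The function name `collisionProbability` is hypothetical.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: approximate probability of at least one collision among
// $n uniformly random values drawn from a space of 2^$bits values.
function collisionProbability(float $n, int $bits): float
{
    $space = 2.0 ** $bits;
    // Expected number of colliding pairs: n(n-1)/2 pairs, each colliding with
    // probability 1/space.
    $expectedPairs = $n * ($n - 1.0) / (2.0 * $space);
    // -expm1(-x) equals 1 - exp(-x) and stays accurate for very small x.
    return -expm1(-$expectedPairs);
}

// Load from the question: 10,000 rows per hour.
$rowsPerYear = 10000.0 * 24 * 365;

// [hex characters kept, number of rows]
$cases = [
    [14, 1.0e9],              // the 1-billion-row test with 14 characters
    [16, 50 * $rowsPerYear],  // 16 characters over 50 years
];

foreach ($cases as [$hexChars, $rows]) {
    $p = collisionProbability($rows, $hexChars * 4);
    printf("%d hex chars, %.3g rows: p = %.4f\n", $hexChars, $rows, $p);
}
```

Under these assumptions, 16 hex characters over 50 years (about 4.4 billion rows) give roughly 0.5 expected colliding pairs, or about a 40% chance of at least one collision. 14 hex characters over 1 billion rows give about 7 expected colliding pairs.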
