how to generate unique integer id in cluster?

L

Lestat2011-09-01 00:25:28

MySQL

Lestat, 2011-09-01 00:25:28

Hello, I would like to understand the issue of generating a unique numeric ID in a cluster.

Actually, looking at www.mongodb.org/display/DOCS/Object+IDs
, the same option is still seen, only the size of the allocated bytes will be slightly smaller.

We do 4 parts:

1) time in epoch format (unix, posix), in total the number of seconds since 1970.
time = 8 bytes integer (2 ** (4 * 8) = 4294967296 variants)
import time
int(time.time())
part1 = 1314823196 /…

2) machine_id = 2 bytes integer (2 ** (4 * 2) = 256 variants)
there are 2 options,
either generating machine_id according to hostname, or just some internal number, which, say, is written depending on hostname,
i.e. you can just machine_id = 1 / 2 / 3 ... / 10
part2 = 1 / 2 / 3 / ...

3) pid = 2 bytes integer (2 ** (4 * 4) = 65536 variants)
here I can’t check the exact options yet, but in macos, for example, pid of this kind: 01318, i.e. very little.
part3 = 01318 /…

4) increment = 4 bytes integer (2 ** (4 * 4) = 65536 variants)

is more complicated here,
for example, 8 cores will have the same time, machine_id, pid.
only increment remains.
here I see an option to use some kind of memcached incr, for example,
to immediately increase and get this value atomically.
then 8 threads will get the key 1,2,3,4,5,6,7,8 respectively and there will be no collisions.
import memcache
cache = memcache.Client(['127.0.0.1:11211'])
cache_key = 'cache_pid_'.format(pid_id)
cache.set(cache_key, 0)
increment_id = cache.incr(cache_key) - increment and return new incremented the field
part4 = 1 / 2 / 3 /…

results in something like this:

1314823196 + 1 + 01318 + 1
1314823196 + 1 + 013196
+ 1 + 01318 + 3

1314823196 + 2 + 71673 + 1
1314823196 + 2 + 71673 + 2
1314823196 + 2 + 71673 + 3

...
i.e. the number is somewhere like this:
13148231962716733

if you take BIGINT for storage, then it turns out
-9223372036854775808 to 9223372036854775807

13 148 231 962 716 733
9 223 273 036 854 775 807

i.e. a lot of stock.

actually the question is, is everything correct in this method or is there an error somewhere or is it possible to improve something?

how would you generate such a thing?

Thank you!

update:
hmm, and another option is to just take memcache increment, I don’t know if it works fine for the entire cluster?

Reply

Answer the question

In order to leave comments, you need to log in

2 answer(s)

L

lesha_penguin, 2011-09-01
@Lestat

Options for solving uniqid from best to worst:
1) If it is a 64bit bigint, then there are no problems at all: we put a knownly unique machine identifier (for example, ip-address, or crc32/adler32 from hostname) into the upper 32 bits. and the lower 32s are played like a regular sequence.
Advantages: for any identifier, in the case of a “hard debug”, you can find “where the legs grow from” - i.e. uniquely identify the wheelbarrow on which the record with the id under study appeared.
2) If there is a desire to get out in 32bit (a reasonable desire, because not everything works well even in our 64-bit age with large numbers), it is better to use a cached sequence. When requested, the sequence is increased not by 1, but immediately by a large value, for example, by 1000 or 10000. Accordingly, the node, having received the range 320000..329999 from the sequence, can easily not turn to the sequence again until this range is used up. Pros: again, it is possible to log. Disadvantages (albeit eliminated by a backup sequence with a backup range): you have to choose a portion of the return.
3) Extreme option. We also expand integer to 128 bits and use hashes or something uuid-like. The minus is obvious - 99.9% of the software will not be able to work with such a value as with a number.
4) Hardcode option. If you know that there will be no more than N nodes, each node simply spins the sequence S and id gets id=S*N+n; where n is the node number. A bad option, very fraught with bad consequences, if suddenly you made a mistake in bold assessments.
5) The method of trial and repetition. even worse, because it will work if you have few records and they are rarely added, and in general it will work reliably if there is only one source for adding records.

O

Ololesha Ololoev, 2011-09-01
@alexeygrigorev

And if UUID to use?