Spiral hashing?

E

extratag2011-12-20 15:04:03

Algorithms

extratag, 2011-12-20 15:04:03

Spiral hashing?

Please help me understand the algorithm of the spiral hashing. Unfortunately, no matter how much I googled clear information on this topic, I could not find it. In Russian, nothing could be found at all. There is a document in English, but with the help of a translator, I could not catch the meaning of it. Please help me figure it out or give an algorithm in Russian.
I am also looking for an implementation of this algorithm in C / C ++.

Reply

Answer the question

In order to leave comments, you need to log in

2 answer(s)

[

[email protected]><e, 2011-12-20
@barmaley_exe

The An analysis of Spiral Hashing article linked above provides the following description:

Spiral hashing is a type of hashing proposed by Martin (Martin, GNN, Spiral storage: Incrementally Augmentable Hash Addressed Storage). In this technique, the elements are distributed unevenly among the baskets, so that the elements are predominantly located at one end of the “basket” space. When the load factor (the ratio of the number of items to the number of bins) exceeds the threshold, the first bin, probably the densest one, is split into g bins, where g is the growth factor.
Initially, there are g-1 baskets, numbered from 1 to g-1. Note that the bucket address space looks like this: {1, 2, …, g - 1} = {g ⁰ , …, g ¹- one}. When the threshold is exceeded, the first bin is split into g new bins, which become bins from g to 2g-1. The elements of the first bin are distributed to new bins using a new hash function (the hash function is parameterized). The first basket does not exist now, there are only {2, 3, … 2g-1} = {g ^c , …, g ^c+1 -1}, where c=log _g 2.
In the general case, at the k-th stage (t i.e. after k-1 partitioning) a bin k is selected and divided into g new bins given the numbers kg … g(k+1)-1. Its elements are distributed among new baskets using a new hash function. After k partitions of the first k baskets, we get {g ^c , …, g ^c+1 -1}, where c=log _g(k+1) and the number of baskets is always divisible by g-1.
Let us now describe the hash function H(K, k) that provides a non-uniform distribution. Note that the hash function depends not only on the key K, but is also parameterized by the number of splits k performed. Recall that after k partitions, our address space is {g ^c , …, g ^c+1 -1}, where c=log _g (k+1). We have g ^c (g-1) bins from g ^c to g ^c+1-one. Having received the key K, we first compare it with x from [0, 1). This can be done using, for example, the distribution function of key-value pairs (GD Knott - Hashing functions, The Computer Journal Volume 18 Number 3). Then we map x to a number x' from [c, c+1), evenly distributed. One of the options for such a comparison was proposed by Martin. The value of H(K, k) is defined as [g ^x' ] (rounding off the decimal part). Such a hash function has the property that H(K, k+1) = H(K, k) for all buckets g ^c , g ^c + 1, …, g ^c+1 that exist at stage k, except for bucket g ^c = k+1, which no longer exists at stage k+1. Note that P(H(K, k) = i) is a logarithmically decreasing function of log _g(i+1)-log _g i for g ^c ≤ i ≤ g ^c+1 , hence the name spiral hashing.
If you use open addressing, some items will be stored in other people's baskets. Therefore, when splitting a basket, we will need to look at others to find the elements of this basket. But we don't want to hit too many buckets when splitting, so it's best not to use open addressing to deal with collisions. We prefer the chain method.

Next comes performance analysis and evaluation.
And in the final part it says. that the method is impractical, because works slowly. Including due to expensive calls to the logarithm and exponential functions.

M

MikhailEdoshin, 2011-12-20
@MikhailEdoshin

Well, if you translate the introductory paragraph from here , this is hashing, where the values are not distributed evenly among the "baskets" (buckets), as is usually done with hashing, but more often on one side than on the other. When the number of elements in relation to the number of baskets reaches a certain threshold, the number of baskets, as in other algorithms, increases, but only one basket is broken - the first one on the side where it is denser. This is supposed to be the most complete basket.