P
P
pavelodintsov2014-06-30 21:40:46
C++ / C#
pavelodintsov, 2014-06-30 21:40:46

How to create a specific data structure for storing a large number of records by an unsigned 32 bit key?

We need a structure that will allow us to store a value using an unsigned 32-bit integer key with maximum speed and minimum memory loss. The peculiarity of the key is that it lies in the interval from zero to 2 ^ 32, but not strictly sequentially, but in continuous 50-100 blocks (subnetwork apart).
At the same time, the structure should provide the ability to work in multi-threaded mode in the configuration - one writes and the other reads, if possible with minimal use of locks.
Speed ​​requirements ~ 1^-6c. for the record.
Tried std::map, std::unordered_map, boost::unordered_map all good but slow.
Update:
The most efficient thing I came up with is to create the following structure:
typedef vector subnet_counters
It will store from 256 to several million records for each IP in a given network. The network size must be allocated when the structure is initialized. There are no memory allocations during operation.
Then put this structure in its turn already in std::map, using the network name as a key, and the already specified array will lie by the key.
In total, in order to read or write data, you will only need to do the following:
1) Find which network the IP belongs to (networks do NOT intersect)
2) Find the address of the vector storing data on all IPs of this network by key
3) Using the address part of the IP address within the network statically get offset in vector
4) Insert/read value

Answer the question

In order to leave comments, you need to log in

3 answer(s)
T
throughtheether, 2014-06-30
@throughtheether

that it lies in the interval from zero to 2 ^ 32, but not strictly sequentially, but in continuous 50-100 blocks (subnetwork apart).
If these are IPv4 addresses and you need to find the longest-match, then look towards patricia trie . Industrial use case: PySubnetTree (python module, written in C). As for multithreading, I can not tell, unfortunately.

V
vasiliev, 2014-06-30
@vasiliev

As far as I understand, std:::map is implemented via balanced trees; unordered_map - via hash.
According to the wonderful table , we have that in the case of std::map, search / write operations take O (log N) both on average and in the worst cases; in the second case, they take O(1) on average and O(N) at worst. How a hash will work depends a lot on the quality of the hash function. It seems to me that you need to look exactly aside.
Another point related to the hash is the number of "baskets". My implementation of unordered_map has a default value of 10 (set in the constructor), and that implementation seems to reallocate carts dynamically when a certain number of added items is reached. Hereit is shown that a properly initialized unordered_map should work just as fast for insertion as it does for extraction.

M
Misha Krinkin, 2014-07-01
@kmu1990

If it is possible to use Intel TBB, then there is a concurrent_unordered_map. If not possible, then why not implement unordered_map yourself? In the simplest case, you just make a shared_mutex for each pocket, you will need to think about rehashing, but on the other hand, you may not need it, just create a fairly large table from the very beginning.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question