What datatype/structure to use for fast gap handling?

E

eegmak2021-07-05 20:33:11

Algorithms

eegmak, 2021-07-05 20:33:11

There are elements of the list, each of which belongs to one to ten mathematical segments on the line of integers.
for example:
[1]=[from 1 to 10, from 14 to 15]
[2]=[from 3 to 7, from 14 to 15, from 34 to 37]
I am writing an algorithm that will find all the elements in one of the segments which includes integer
for example for the number 9 this is only the element [1]
for the number 35 this is only the element [2]
for the number 6 these are both elements
for the number 50 there are no
such elements There can be many such elements in the list, are there any ways to optimize the search for membership of an integer to a set of segments of a set of elements?

Reply

Answer the question

In order to leave comments, you need to log in

1 answer(s)

W

Wataru, 2021-07-05
@eegmak

An easy-to-implement method: keep the segments in each element sorted and non-intersecting (if two segments intersect, merge them).
Then, when requesting, go through the list and for each element, use the binsearch to find the rightmost segment, the beginning of which is to the left of the request. Check if the request lies in the segment. This will be slightly faster than the naive method, but it will still go through many elements of the list in vain.
If the list is very long and the response is expected to be small, then there is a faster method. But it is difficult to implement. You need to implement a persistent search tree . It can be implemented on the basis of a persistent segment tree. It is such structure in which it is possible to add elements, and to delete them for O(log n). It is also possible to traverse all elements in O(log n + (their number)). In addition, all versions of the tree are saved after each operation, and the total amount of memory will be O(k log n), where k is the number of operations.
This structure will be used to store the pre-computed responses. If all your segments are drawn on one straight line, then it will be divided into O (n) segments, all points of which will give the same answer when asked. We will save all these answers compactly.
We use the scanning line method. Draw all the boundaries of all segments on one straight line, marking them as the beginning or end (and to which element of the list they correspond). If you walk along this straight line from left to right, then events will occur - the segments will open (a new element is added to the answer) or the segments will close (the element will be removed from the answer). By maintaining the current response in a persistent structure, we save a lot of memory. It is convenient to take their coordinates as the beginnings of the segment, and the coordinates of the ends+1 as the end. In this form, all the boundaries of the segments will be points, not numbers.
So, create an array of structures {coordinate, this is the beginning or end, element number}. Sort by coordinate, then by start flag. Then go through it and, when processing the beginning of the segment, add the element number to the persistent tree. When processing the end - remove the element from the tree. Also, before processing each element, write to the response array: {previous coordinate, current coordinate, link to the current version of the persistent tree}, if the previous coordinate is strictly less than the current one. This response array will store all possible segments with different response sets in the form {start coordinate, end coordinate, answer}.
When you have precalculated this array of responses, you can process requests - Find in the array with a binsearch the segment to which the current request point belongs. You need to use binsearch to find the rightmost segment, whose beginning is less than-equal to query-number. Then check that the end coordinate is strictly greater than the request. In this case, output a persistent tree traversal of a known version in response.
This solution requires O(n log n) memory (where n is the number of all segments) and O(n log n) time to pre-calculate and O( log n + (answer)) time to process the response.
A simpler solution, where the responses are also considered a scanline, but stored simply as lists rather than persistent tree versions, may require O(n^2) memory. But it will work faster, of course.