Answer the question
In order to leave comments, you need to log in
We have 4 lists of values of 100 GB. How to get the values included in all lists?
Good afternoon.
There was a question regarding the inverted index involved in the work of search engines.
Let's say we took any 4 lists of values. The lists are huge, the storage form is up to you. Now you need to pull out at least the first 10 values included in each of them. Then ten more, and so on. until there are no values included in only 2 lists.
Values can be in any position of the list, and the search speed is instantaneous. There are thousands of lists and they can be combined in any way.
How do search engine databases cope with such a non-trivial task? Does anyone know the topic well enough to be able to explain it scientifically? Rough and general. Links will also be happy.
Answer the question
In order to leave comments, you need to log in
I could think of only this:
1. When cloning , set id-shniki .
2. Manually set the animation order via jQuery .
What's the problem? If the indexes have already been built (read - the lists are sorted), then simply moving in parallel through all four lists, you can find all the intersections of the lists in one pass.
Let's say we have four lists sorted in ascending order.
0. Add to the end of each list the same element, obviously greater than any element from any list.
1. Put pointers to the first elements of the lists.
2. Find the minimum value of the current elements, denote it by Min.
3. Calculate the number of current elements equal to Min. If it is equal to 4 - display the element.
4. For each list, as long as the value of the current element is Min, move the pointer to the next element.
5. If no pointer has reached the end of the list, then go to step 2.
To search for intersections of three of the four or two of the four lists, correct steps 3 and 5.
Didn't find what you were looking for?
Ask your questionAsk a Question
731 491 924 answers to any question