Sorting by a set of keys calculated from the element itself. How to minimize the number of calculations and memory consumption?

N

nirvimel2016-04-28 16:07:21

Python

nirvimel, 2016-04-28 16:07:21

This is a real challenge that I had when writing a single filter for images. All morning I tried to isolate it from the context of my applied task and clearly formulate it.
It is required to implement a function that takes as input an array of N elements (the type is not important) and an array of M functions that can be calculated from these elements and return integer values. The function to be implemented must return an array of the indexes of the elements (from the input array) in the sorted array, no actual moving of the elements from place to place is required. Sorting goes on a set of keys, which are integer values returned from the corresponding functions. If for two different elements the first keys (that is, the values returned by the first function) are the same, then their order is determined by the second keys (the values returned by the second function), and so on. It makes no sense to calculate the same function twice from the same element. It also makes no sense to calculate which are not necessary for sorting (actually only the first function will be evaluated for all elements anyway). Plus, the memory is not rubber, its consumption should be reduced as much as possible.
I come to the conclusion that in the worst case the memory consumption will be O(N * M), which I really do not like. Who thinks that you can get by with smaller volumes, it will be very interesting to hear ideas. I also don't like the need for frequent calls to dynamic memory allocation.
As for the algorithm itself: some ideas are spinning in my head about building binary trees at each sorting level, then the final horizontal pass through the entire "forest" to form the output array of indices. But lookups/inserts in b-trees require O(log N) and I need O(1) like for linear arrays. But with linear arrays, I can't save memory and end up with N*M in size even for the best case, and I'm sure you can get by with much smaller sizes except in the worst case.
Perhaps I'm overcomplicating things, and there is some simple and obvious algorithm.
Dear colleagues, share your thoughts on the subject.
PS: The language in the tags is conditional.

Reply

Answer the question

In order to leave comments, you need to log in

5 answer(s)

A

AtomKrieg, 2016-04-29
@AtomKrieg

There is a suggestion to cache the values of the key calculation functions in some HashMap<Pair<N,M>, key_value>. Should work if the functions take longer to compute than fiddling with the cache.
Sample code:
https://gist.github.com/AtomKrieg/6fb6d3d8ae80842e...

E

Evgeny Bykov, 2016-04-29
@bizon2000

I think that the corresponding algorithm in pseudocode can be depicted something like this: Let's
create an array of N elements of type Node and initialize it:

class Node {
    int group;
    int key;
    int index;
}

Node[N] nodes;

for (int i = 0; i < N; i++) {
    nodes[i].group = 0;
    nodes[i].index = i;
}

The sequence of elements with the same value of the group field will be referred to as a group. Thus, after initialization, we have one unsorted group of N elements.
Now we will sort this array in a loop by functions, in each iteration we will tighten the order in groups that contain more than one element, i.e., not yet sorted:

for (int j = 0; j < M; j++) {
    // для каждой группы, содержащей более одного элемента
        // вычисляем значение ключа в каждом элементе этой группы
        // сортируем на месте элементы этой группы по этому ключу
        // разбиваем группу на подгруппы с одинаковым значением ключа, присваиваем подгруппам уникальные номера
    // если все группы содержат ровно по одному элементу, то досрочный выход из цикла
}

The essence of the algorithm is that after the j-th iteration, the nodes array is divided into groups containing exactly one element each, and groups containing elements that cannot be distinguished using only functions with numbers less than j. At the same time, the groups themselves are ordered among themselves.
In my opinion, it is obvious that there are no extra calls to the key calculation functions in this algorithm.
After the loop is completed, all elements in the nodes array are sorted, and the value of the index field is used to refer to the elements of the original array.

K

kstyle, 2015-05-19
@kstyle

switch to Russian. on the main page there is a link View payments on the top right. Next, the Payment Settings and Payee Profile menu will appear on the left.

M

myfirepukan, 2015-05-19
@myfirepukan

In Rapid, open an account on your mobile phone, go to the bank with a passport, pay 150 rubles for verifying your account in Rapid. Everything now adsense brings you to rapid and you from rapid to any visa or mastercard (you can even throw it at someone else's).

V

Vladimir Yakovlev, 2015-12-14
@simplyv

I'm through Rapida, and from there to the card or Yandex.Money.
In Russia, only through Rapida.