How is OpenMP sorting parallelized here?

A

ahame2021-04-17 11:03:03

C++ / C#

ahame, 2021-04-17 11:03:03

Task: to parallelize sorting bitwise sorting for integers with a simple merge. How is parallelization going on in this code?

void signedRadixSortOmp(int* sortVec, int sizeVec) {
    int shift = 0;
    int s = static_cast<int>(sizeof(int));
#pragma omp parallel shared(sortVec, sizeVec)
    {
        int sizePartVec = 0;
        int numberThreads = omp_get_num_threads();
        sizePartVec = sizeVec / numberThreads;
        int remainder = sizeVec % numberThreads;
        int threadId = omp_get_thread_num();

#pragma omp master
        {
            sizePartVec += remainder;
            remainder = 0;
        }

        int* localOutVec = new int[sizePartVec];
        int* localInVec = new int[sizePartVec];
        int j = 0;
        for (int i = threadId * sizePartVec + remainder;
        i < (threadId + 1) * sizePartVec + remainder; i++, j++) {
            localInVec[j] = sortVec[i];
        }

        std::vector<int> counters(sizeof(int) * 256);
        createCounters(localInVec, counters.data(), sizePartVec);
        int* count;
        for (int i = 0; i < s; i++) {
            count = counters.data() + 256 * i;
            signedRadix(i, sizePartVec, localInVec, localOutVec, count);
            std::swap(localInVec, localOutVec);
        }
        delete[] localOutVec;

#pragma omp master
        {
            for (int i = 0; i < sizePartVec; i++)
                sortVec[i] = localInVec[i];
            shift++;
        }
#pragma omp barrier
#pragma omp critical
        if (threadId != 0) {
            mergeOrderVec(sortVec, shift * sizePartVec + remainder,
        localInVec, sizePartVec);
            shift++;
        }
        delete[] localInVec;
    }
}

Reply

Answer the question

In order to leave comments, you need to log in

1 answer(s)

W

Wataru, 2021-04-17
@ahame

Sort the array pieces in parallel. At the end, the sorted parts are combined with a merge operation (this part is not parallel).
Needless to say, the execution is terrible. The last part, merging, seems to be done in a square at all. Those. this approach is much slower than non-parallel sorting always.
If we already parallel radix sort, then we need to simultaneously count how many in each piece of each digit, then calculate where each digit from each piece should fall, and then shuffle each piece in parallel to the right places.