C++ / C#
denjamin11, 2020-01-02 11:30:12

How to parallelize a triple nested for loop with OpenMP?

Hello, I need help with parallelization.

void kernel_heat_3d(int tsteps, int n, double A[n][n][n], double B[n][n][n])
{
    int t, i, j, k;

    for (t = 1; t <= tsteps; t++) {
        for (i = 1; i < n-1; i++) {
            for (j = 1; j < n-1; j++) {
                for (k = 1; k < n-1; k++) {
                    B[i][j][k] = 0.125 * (A[i+1][j][k] - 2.0 * A[i][j][k] + A[i-1][j][k])
                                 + 0.125 * (A[i][j+1][k] - 2.0 * A[i][j][k] + A[i][j-1][k])
                                 + 0.125 * (A[i][j][k+1] - 2.0 * A[i][j][k] + A[i][j][k-1])
                                 + A[i][j][k];
                }
            }
        }
        for (i = 1; i < n-1; i++) {
            for (j = 1; j < n-1; j++) {
                for (k = 1; k < n-1; k++) {
                    A[i][j][k] = 0.125 * (B[i+1][j][k] - 2.0 * B[i][j][k] + B[i-1][j][k])
                                 + 0.125 * (B[i][j+1][k] - 2.0 * B[i][j][k] + B[i][j-1][k])
                                 + 0.125 * (B[i][j][k+1] - 2.0 * B[i][j][k] + B[i][j][k-1])
                                 + B[i][j][k];
                }
            }
        }
    }

}


1 answer(s)
Wataru, 2020-01-02
@wataru

In your example, it is most likely enough to parallelize only the outer loops over i.
If n is smaller than the number of threads, however, you can use the collapse clause to spread the iterations of all three loops across the threads. You can read more in the documentation (on page 185).
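A minimal sketch of both variants, assuming a C99 compiler with variable-length array support and OpenMP switched on (for example -fopenmp with GCC). The names stencil, sweep_outer, sweep_collapsed, kernel_heat_3d_omp and demo_constant_field are made up for this illustration and do not come from the question. The time loop stays sequential because every sweep needs the result of the previous one; only the spatial loops are shared between threads.

```c
#include <assert.h>
#include <stdlib.h>

/* New value of point (i, j, k), computed from the old field a. */
static double stencil(int n, double a[n][n][n], int i, int j, int k)
{
    return 0.125 * (a[i+1][j][k] - 2.0 * a[i][j][k] + a[i-1][j][k])
         + 0.125 * (a[i][j+1][k] - 2.0 * a[i][j][k] + a[i][j-1][k])
         + 0.125 * (a[i][j][k+1] - 2.0 * a[i][j][k] + a[i][j][k-1])
         + a[i][j][k];
}

/* Variant 1: only the i loop is divided between threads. */
void sweep_outer(int n, double dst[n][n][n], double src[n][n][n])
{
    /* i, j, k are declared in the loops, so they are private per thread. */
    #pragma omp parallel for
    for (int i = 1; i < n - 1; i++)
        for (int j = 1; j < n - 1; j++)
            for (int k = 1; k < n - 1; k++)
                dst[i][j][k] = stencil(n, src, i, j, k);
}

/* Variant 2: collapse(3) merges the three loops into one iteration space. */
void sweep_collapsed(int n, double dst[n][n][n], double src[n][n][n])
{
    #pragma omp parallel for collapse(3)
    for (int i = 1; i < n - 1; i++)
        for (int j = 1; j < n - 1; j++)
            for (int k = 1; k < n - 1; k++)
                dst[i][j][k] = stencil(n, src, i, j, k);
}

/* Sequential time loop; each sweep ends with an implicit barrier,
   so B is complete before it is read back into A. */
void kernel_heat_3d_omp(int tsteps, int n, double A[n][n][n], double B[n][n][n])
{
    for (int t = 1; t <= tsteps; t++) {
        sweep_outer(n, B, A);
        sweep_outer(n, A, B);
    }
}

/* Hypothetical self-check: a constant field must stay constant.
   Returns the value at the centre point after tsteps steps. */
double demo_constant_field(int n, int tsteps)
{
    double (*A)[n][n] = malloc(sizeof(double[n][n][n]));
    double (*B)[n][n] = malloc(sizeof(double[n][n][n]));
    assert(A != NULL && B != NULL);

    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                A[i][j][k] = B[i][j][k] = 1.0;

    kernel_heat_3d_omp(tsteps, n, A, B);

    double centre = A[n / 2][n / 2][n / 2];
    free(A);
    free(B);
    return centre;
}
```

If OpenMP is not enabled, the pragmas are ignored and the same code runs serially, which makes it easy to compare results against the parallel build.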
Or you can flatten 3 nested loops into one by hand like so:

int n3 = (n-2)*(n-2)*(n-2);
for (int iteration = 0; iteration < n3; ++iteration) {
  // declared inside the loop so that every thread gets its own copies
  int i = iteration / ((n-2)*(n-2)) + 1;
  int j = iteration / (n-2) % (n-2) + 1;
  int k = iteration % (n-2) + 1;
  // the body of the three loops over i, j, k = 1..n-2 goes here
}

This is just a renumbering of all index triples. Each triple i, j, k can be viewed as a three-digit number in base n-2. So every number from 0 to (n-2)^3 - 1 can be decomposed into its base-(n-2) digits with / and %, and adding 1 to each digit gives the three indices.
But collapse, and the manual version even more so, adds overhead for computing the indices. So it only makes sense to use them if n is smaller than the number of available threads.
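To convince yourself that the flat loop visits exactly the same triples as the nested loops, you can run both side by side. This is only a sketch; unflatten and check_unflatten are hypothetical helper names introduced here. For example, with n = 5 the base is 3, and iteration 5 decodes to i = 1, j = 2, k = 3.

```c
#include <assert.h>

/* Decode a flat iteration number into interior indices i, j, k in 1..n-2. */
void unflatten(int iteration, int n, int *i, int *j, int *k)
{
    int m = n - 2;                      /* interior points per dimension */
    *i = iteration / (m * m) + 1;       /* most significant digit */
    *j = iteration / m % m + 1;         /* middle digit */
    *k = iteration % m + 1;             /* least significant digit */
}

/* Walk the nested loops and the flat numbering together and check that
   they agree. Returns the number of triples that were compared. */
int check_unflatten(int n)
{
    int iteration = 0;
    for (int i = 1; i < n - 1; i++) {
        for (int j = 1; j < n - 1; j++) {
            for (int k = 1; k < n - 1; k++) {
                int fi, fj, fk;
                unflatten(iteration, n, &fi, &fj, &fk);
                assert(fi == i && fj == j && fk == k);
                iteration++;
            }
        }
    }
    return iteration;
}
```

The two divisions and two remainders per iteration are the index overhead mentioned above.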
