Is it possible and does it make sense to transfer this code to OpenCL?

K

Kalombyr2019-10-12 11:18:29

GPGPU

Kalombyr, 2019-10-12 11:18:29

Good day!
There is this piece of code:

struct TTour {
   QVector<int> tour;
   double cost;
};

TTour Tournamet::iter(const TTour &currentTour)
{
    int n = currentTour.tour.count();
    TTour bestTour = currentTour;
    for(int i=1; i<n-2; ++i){
        for(int j=i+1; j<n+1; ++j) {
            if(j-i == 1) continue;
            QVector<int> swap = currentTour.tour.mid(0, i);
            QVector<int> l1 = currentTour.tour.mid(i, j-i);
            QVector<int> l2 = currentTour.tour.mid(j);

            std::reverse(l1.begin(), l1.end());

            swap.append(l1);
            swap.append(l2);

            TTour newTour = {swap, cost(swap)};
            if( newTour.cost < bestTour.cost ) bestTour = newTour;
        }
    }
}


double Tournamet::cost(const QVector<int> &points)
{
    double total = 0;
    for(int i=1; i<points.count(); ++i) {
        total += _distancer->dist(points[i-1], points[i]);  //-- Внутри расстояние отдаётся из QHash, каждый раз не вычисляется.
    }    
    return  total;
}

Thought it was time to learn OpenCL to optimize this area.
Actually the question is, is it possible, and most importantly - does it make sense (in terms of parallelization and acceleration) to transfer it?
PS "try it, but you'll know" - I'll definitely try it. But I want to understand why it might be pointless in terms of performance and other things, and it might be better to try on another example...

Reply

Answer the question

In order to leave comments, you need to log in

1 answer(s)

M

mayton2019, 2019-10-12
@Kalombyr

Most likely parallelism will give nothing. The matter is that tasks when
1) Shared nothing are paralleled. There are many processes and they work with their data arrays and then merge the result into a certain result.
2) The data is rummaged but at the same time they are immutable.
In your case, operations such as std::reverse, QVector::mid are used. They break the general data snapshot and do not allow point (2) to come out.
In general, you need to seriously break the algorithm in order to get the creep from parallelism.