GPGPU
Perzh, 2014-05-02 19:03:50

How to properly accelerate a program on the GPU using C++ AMP

Hello.
I'm trying to implement EMMSP (extrapolation model of the most similar pattern) on a GPU using C++ AMP.
The essence of the algorithm is enumerating fixed-length subsequences of a time series. It is plain enumeration, trivially parallelized: threads work independently of each other. The data (a float array of 18K to 1M elements) resides in global memory. Because each thread works with its own small piece of the array, I wanted to copy the data from global memory into the faster block (tile_static) memory, which is visible not to all processors but only to the threads of the same block. However, this did not change the running time of the program, even though, according to the literature, block memory is hundreds of times faster than global memory and is meant for data that is used repeatedly during processing.
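The reuse that block staging is supposed to exploit can be counted. Below is a CPU-only sketch in standard C++ (not C++ AMP) in which a hypothetical `CountingBuffer` stands in for global memory and a local `std::vector` plays the role of a `tile_static` array. To keep it short, the per-thread work is reduced to summing a window of `M` elements, as a stand-in for the real per-subsequence computation; the function names are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a GPU global-memory buffer: it counts every element read.
struct CountingBuffer {
    std::vector<float> data;
    mutable std::size_t reads = 0;
    float operator[](std::size_t i) const { ++reads; return data[i]; }
};

// Each "thread" t sums the window data[t .. t+M-1] straight from global memory.
std::vector<float> window_sums_direct(const CountingBuffer& g, std::size_t M) {
    assert(M >= 1 && M <= g.data.size());
    std::size_t n = g.data.size() - M + 1;  // number of candidate windows
    std::vector<float> out(n);
    for (std::size_t t = 0; t < n; ++t) {
        float s = 0.0f;
        for (std::size_t k = 0; k < M; ++k) s += g[t + k];  // M global reads per thread
        out[t] = s;
    }
    return out;
}

// Same work, but each tile of T threads first copies the T + M - 1 elements it
// needs into a local buffer (the role tile_static memory plays in C++ AMP).
std::vector<float> window_sums_tiled(const CountingBuffer& g, std::size_t M, std::size_t T) {
    assert(M >= 1 && M <= g.data.size() && T >= 1);
    std::size_t n = g.data.size() - M + 1;
    std::vector<float> out(n);
    std::vector<float> tile;
    for (std::size_t base = 0; base < n; base += T) {
        std::size_t threads = std::min(T, n - base);  // the last tile may be partial
        std::size_t span = threads + M - 1;           // elements this tile touches
        tile.resize(span);
        for (std::size_t j = 0; j < span; ++j) tile[j] = g[base + j];  // one global read each
        for (std::size_t t = 0; t < threads; ++t) {
            float s = 0.0f;
            for (std::size_t k = 0; k < M; ++k) s += tile[t + k];  // reads from the local copy
            out[base + t] = s;
        }
    }
    return out;
}
```

With N = 1024 elements, M = 32 and T = 256, the direct version performs 993 x 32 = 31,776 global reads and the tiled one 1,117 (each of the 4 tiles reads T + M - 1 elements at most once), with identical results. Whether that reduction shows up as wall-clock time depends on whether the kernel is actually limited by memory bandwidth rather than by arithmetic; many GPUs also cache global reads, so heavily overlapping windows may already be served from cache. That is one plausible reason, not a confirmed one, why staging made no measurable difference here.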
I have two explanations for this:
1. I am using block memory incorrectly.
2. Each thread computes a regression and a correlation between two pieces of the array, which obviously involves loops and therefore conditional branches, and those hurt performance on a GPU.
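For reference, the per-thread work described above can be written as a serial C++ sketch. It assumes (none of this is from the post) that the base pattern is the last `m` points, every earlier window followed by `p` points is a candidate (one GPU thread each), similarity is the absolute Pearson correlation, and the forecast scales the best window's continuation by a least-squares fit base ≈ a0 + a1 · candidate. The names `accumulate`, `correlation` and `emmsp_forecast` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sums accumulated in one pass over two equal-length windows x and y.
struct PairSums { double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0; };

// Fixed trip count m: on a GPU every thread runs this loop the same number of
// times, so the loop condition evaluates identically and does not diverge.
PairSums accumulate(const float* x, const float* y, std::size_t m) {
    PairSums s;
    for (std::size_t k = 0; k < m; ++k) {
        double a = x[k], b = y[k];
        s.sx += a; s.sy += b; s.sxx += a * a; s.syy += b * b; s.sxy += a * b;
    }
    return s;
}

// Pearson correlation from the one-pass sums (0 if either window is constant).
double correlation(const PairSums& s, std::size_t m) {
    double n = static_cast<double>(m);
    double cov = s.sxy - s.sx * s.sy / n;
    double vx = s.sxx - s.sx * s.sx / n;
    double vy = s.syy - s.sy * s.sy / n;
    return (vx > 0 && vy > 0) ? cov / std::sqrt(vx * vy) : 0.0;
}

struct Forecast { std::size_t best_start; double r; double a0, a1; std::vector<double> values; };

// Hypothetical serial reference: base = last m points of z; each window starting
// at i with p points after it is a candidate. Returns the p-step forecast.
Forecast emmsp_forecast(const std::vector<float>& z, std::size_t m, std::size_t p) {
    assert(m >= 2 && p >= 1 && z.size() >= m + p);
    const float* base = z.data() + (z.size() - m);
    Forecast f{0, 0.0, 0.0, 0.0, {}};
    for (std::size_t i = 0; i + m + p <= z.size(); ++i) {  // one GPU thread per i
        double r = correlation(accumulate(z.data() + i, base, m), m);
        if (std::fabs(r) > std::fabs(f.r)) { f.best_start = i; f.r = r; }
    }
    // Least-squares fit base ~ a0 + a1 * candidate over the best window.
    PairSums s = accumulate(z.data() + f.best_start, base, m);
    double n = static_cast<double>(m);
    double vx = s.sxx - s.sx * s.sx / n;
    f.a1 = vx > 0 ? (s.sxy - s.sx * s.sy / n) / vx : 0.0;
    f.a0 = (s.sy - f.a1 * s.sx) / n;
    // Forecast: the p points that followed the best window, rescaled by the fit.
    for (std::size_t j = 0; j < p; ++j)
        f.values.push_back(f.a0 + f.a1 * z[f.best_start + m + j]);
    return f;
}
```

For example, with z = {1, 2, 3, 4, 10, 20, 0, 0, 3, 5, 7, 9}, m = 4 and p = 2, the base {3, 5, 7, 9} equals 1 + 2 · {1, 2, 3, 4}, so the sketch picks start 0 with r = 1, a0 = 1, a1 = 2 and forecasts {21, 41}. A loop like the one in `accumulate` is not in itself a divergence problem; divergence comes from data-dependent branches that send threads of the same wavefront down different paths. On a GPU the selection of the best candidate would be a separate reduction step rather than the serial `if` used here.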
Question: what could be the reason: 1, 2, both, or something else entirely?
P.S. If anyone here knows C++ AMP, please reply, and I will attach a piece of code, simplified as much as possible but still showing the essence of the algorithm and the tools I use.
