Is it true that OMP slows down CUDA?

Vadim Mamonov, 2019-05-25 14:46:34

GPGPU

The problem is as follows.
I have this trivial piece of code:

****
double time = MPI_Wtime();
cudaMemcpy(*****)
cout << " time = " << MPI_Wtime() - time << endl;

I compile with nvcc and mpicxx -fopenmp.
I set the environment variable OMP_NUM_THREADS=1 and measured the time.
Then I set OMP_NUM_THREADS=2 and measured again: the time got worse.
I even tried calling omp_set_num_threads(1) right before cudaMemcpy, but that didn't help: the time stayed the same.
And so on: the higher I set OMP_NUM_THREADS, the longer it takes.
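
A single timed call can be noisy, and the first CUDA call in a process typically also includes context initialization. One way to make the comparison between OMP_NUM_THREADS values more reliable might be to warm up once and take the median of several runs. The following is only a minimal sketch and not the code from the measurement above: `median_seconds` is a hypothetical helper, it uses `std::chrono` instead of `MPI_Wtime` so that it stands alone, and the operation to be timed is passed in as a callable.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not from the original post).
// Runs `op` once as an untimed warm-up, then times `repeats` further runs
// and returns the middle sample in seconds (the upper median when the
// number of samples is even). Returns 0.0 without calling `op` if
// `repeats` is zero.
double median_seconds(const std::function<void()>& op, std::size_t repeats) {
    if (repeats == 0) {
        return 0.0;
    }

    op();  // warm-up run, deliberately not timed

    std::vector<double> samples;
    samples.reserve(repeats);
    for (std::size_t i = 0; i < repeats; ++i) {
        const auto start = std::chrono::steady_clock::now();
        op();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double>(stop - start).count());
    }

    // Sort the samples so the middle element is the median.
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

It could be used as `median_seconds([&] { /* the cudaMemcpy call being measured */ }, 10)`, once per OMP_NUM_THREADS setting. If the median still grows with the thread count, the slowdown is probably not just a one-off startup cost.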
I also use cuBLAS and cuSPARSE, and I noticed that their calls slow down in some places in the code as well.
Does anyone know the reason, or has anyone come across this?