CUDA
Clay, 2019-04-27 22:04:41

How many threads and blocks does it take to multiply two matrices in CUDA?

Hello, I can't figure out how many threads and blocks to specify when launching the kernel so that the matrices are multiplied correctly.

__global__ void mul(float *a, float *b, float *c, int m, int n, int k)
{
  int row = blockIdx.y * blockDim.y + threadIdx.y;
  int col = blockIdx.x * blockDim.x + threadIdx.x;
  float sum = 0;
  if (col < k && row < m)
  {
    for (int i = 0; i < n; i++)
    {
      sum += a[row * n + i] * b[i * k + col];
    }
    c[row * k + col] = sum;
  }
}
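
One way to pick the launch configuration for the kernel above, as a sketch rather than the only correct answer: the kernel computes one element of c per thread, so the grid has to cover the whole m x k output, and the `if (col < k && row < m)` guard makes it safe to round the block count up. The x dimension maps to columns (k) and the y dimension to rows (m). The 16 x 16 block size is an assumption (any size up to 1024 threads per block works on current GPUs), and `ceil_div`, `grid_for` and `launch_mul` are hypothetical helper names that are not part of the question.

```cuda
#include <cuda_runtime.h>

// Kernel from the question, unchanged: a is m x n, b is n x k, c is m x k.
__global__ void mul(float *a, float *b, float *c, int m, int n, int k)
{
  int row = blockIdx.y * blockDim.y + threadIdx.y;
  int col = blockIdx.x * blockDim.x + threadIdx.x;
  float sum = 0;
  if (col < k && row < m)
  {
    for (int i = 0; i < n; i++)
    {
      sum += a[row * n + i] * b[i * k + col];
    }
    c[row * k + col] = sum;
  }
}

// Hypothetical helper: integer division rounded up.
static inline unsigned int ceil_div(unsigned int x, unsigned int y)
{
  return (x + y - 1) / y;
}

// Hypothetical helper: smallest grid that covers an m x k output matrix.
// Columns (k) go along x, rows (m) go along y, matching the kernel indexing.
static inline dim3 grid_for(int m, int k, dim3 block)
{
  return dim3(ceil_div(k, block.x), ceil_div(m, block.y));
}

// Hypothetical wrapper: d_a, d_b, d_c must already be device pointers.
cudaError_t launch_mul(float *d_a, float *d_b, float *d_c, int m, int n, int k)
{
  dim3 block(16, 16);                 // assumption: 256 threads per block
  dim3 grid = grid_for(m, k, block);  // enough blocks to cover every element of c
  mul<<<grid, block>>>(d_a, d_b, d_c, m, n, k);
  return cudaGetLastError();          // reports an invalid launch configuration
}
```

For example, with m = 1000 and k = 500 this gives a grid of 32 x 63 blocks of 16 x 16 threads. The threads that fall outside the matrix in the last row and column of blocks do nothing because of the bounds check in the kernel.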
