How many threads and blocks does it take to multiply two matrices in CUDA?

Clay, 2019-04-27 22:04:41

CUDA

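Hello, I can't figure out how many threads and blocks I need to specify when launching the kernel below so that the matrices are multiplied correctly. Here a is an m x n matrix, b is n x k, and the result c is m x k, all stored row-major.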

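// a is m x n, b is n x k, c is m x k; all matrices are stored row-major.
__global__ void mul(const float *a, const float *b, float *c, int m, int n, int k)
{
  // Each thread computes one element of c: y indexes rows, x indexes columns.
  int row = blockIdx.y * blockDim.y + threadIdx.y;
  int col = blockIdx.x * blockDim.x + threadIdx.x;
  float sum = 0.0f;
  // Threads that fall outside the m x k result do nothing.
  if (col < k && row < m)
  {
    // Dot product of row `row` of a with column `col` of b.
    for (int i = 0; i < n; i++)
    {
      sum += a[row * n + i] * b[i * k + col];
    }
    c[row * k + col] = sum;
  }
}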
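
Since each thread of this kernel writes one element of c, the launch has to supply at least m * k threads, arranged so that x covers the k columns and y covers the m rows. A minimal sketch of one possible launch configuration follows. The names `div_up` and `launch_mul` and the block edge `TILE = 16` are assumptions made for illustration, not part of the original question. Any block shape with at most 1024 threads would work, because the bounds check in the kernel discards the surplus threads.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Same kernel as in the question: a is m x n, b is n x k, c is m x k (row-major).
__global__ void mul(const float *a, const float *b, float *c, int m, int n, int k)
{
  int row = blockIdx.y * blockDim.y + threadIdx.y;
  int col = blockIdx.x * blockDim.x + threadIdx.x;
  if (col < k && row < m)
  {
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
      sum += a[row * n + i] * b[i * k + col];
    c[row * k + col] = sum;
  }
}

// Hypothetical helper: ceiling division, so a partially filled tile
// at the matrix edge still gets its own block.
static int div_up(int total, int per_block)
{
  assert(per_block > 0);
  return (total + per_block - 1) / per_block;
}

// Hypothetical host wrapper. d_a, d_b and d_c must already be device pointers
// (allocated with cudaMalloc and filled with cudaMemcpy by the caller).
cudaError_t launch_mul(const float *d_a, const float *d_b, float *d_c,
                       int m, int n, int k)
{
  const int TILE = 16;                          // assumed block edge: 16 * 16 = 256 threads
  dim3 block(TILE, TILE);                       // x covers columns, y covers rows
  dim3 grid(div_up(k, TILE), div_up(m, TILE));  // enough blocks to cover k columns, m rows
  mul<<<grid, block>>>(d_a, d_b, d_c, m, n, k);
  return cudaGetLastError();                    // reports an invalid launch configuration
}
```

For example, with m = 1000 and k = 500 this gives a grid of 32 x 63 blocks of 256 threads each, slightly more than the 500000 threads needed. The inner dimension n does not affect the launch configuration, because it is only used as the loop bound inside each thread.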
