GPGPU
Clay, 2019-05-01 11:35:14

How do you call a kernel from inside a kernel?

Hello, I am writing a neural network for digit recognition (for starters) in CUDA. I need a device-side function that calls the network's training function in each block. There should be 60,000 training runs in total, and therefore 60,000 blocks. I have already written the training function, and it also executes on the device. More precisely, not the entire function, but certain fragments of it, such as matrix multiplication, run on the device. So I need a device-side function that launches 60,000 training runs in parallel using blocks, but each training run itself also executes on the device (and launching it requires specifying a grid and block configuration of its own). How can this be implemented?
Thank you very much for any advice/hint/guidance. I am just learning, so stumbling over things like this is normal for me...


2 answer(s)
Vadim Mamonov, 2019-05-18
@dikysa

You cannot call a `__global__` kernel function from the GPU, only from the CPU. (There may be a way I am not aware of, but the CUDA book I have does not cover one.)
You can leave the data you need in device memory and launch an additional kernel from the CPU. Or you can write `__device__` functions and call them on the GPU from inside your kernel.
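A minimal sketch of that second suggestion, assuming one block per training sample (the function names `dot_row` and `train_all` are illustrative, not from the question): instead of launching a nested kernel per sample, the per-sample work is factored into `__device__` helpers that every block calls directly.

```cuda
#include <cstdio>

// A GPU-side helper: callable from any kernel, no nested launch needed.
__device__ float dot_row(const float* a, const float* b, int n) {
    float s = 0.0f;
    for (int i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
}

// One block per training sample (blockIdx.x = 0 .. 59999).
// Each block does its sample's work by calling __device__ helpers.
__global__ void train_all(const float* samples, float* weights, int n) {
    int sample = blockIdx.x;
    const float* x = samples + sample * n;
    // ... use dot_row(x, weights, n) and other helpers here ...
}

int main() {
    // Single launch from the CPU: 60000 blocks, 256 threads each.
    // (d_samples / d_weights would be allocated with cudaMalloc.)
    // train_all<<<60000, 256>>>(d_samples, d_weights, n);
    return 0;
}
```

The design point is that the grid dimension replaces the nested launch: the 60,000-way parallelism comes from the block index, so no kernel ever has to start another kernel.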

JackBoner, 2019-07-04
@JackBoner

You can call a kernel from inside a kernel. This is called dynamic parallelism:
https://devblogs.nvidia.com/cuda-dynamic-paralleli...
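A minimal dynamic-parallelism sketch (the kernel names are illustrative). This requires a GPU with compute capability 3.5 or higher and compiling with relocatable device code, e.g. `nvcc -arch=sm_35 -rdc=true dynpar.cu -lcudadevrt`:

```cuda
#include <cstdio>

// Child kernel: could be one matrix-multiplication step of a training run.
__global__ void child(int parent_block) {
    printf("child of block %d, thread %d\n", parent_block, threadIdx.x);
}

// Parent kernel: each block launches its own child grid from device code.
__global__ void parent() {
    if (threadIdx.x == 0) {
        child<<<1, 4>>>(blockIdx.x);   // device-side kernel launch
    }
}

int main() {
    parent<<<2, 32>>>();        // two parent blocks for illustration
    cudaDeviceSynchronize();    // parents (and their children) finish here
    return 0;
}
```

For the original question, the parent kernel would be launched with 60,000 blocks and each block would launch its training sub-kernels with whatever grid/block shape the matrix sizes need; note that deeply nested or very numerous device-side launches have their own overhead and pending-launch limits, so it is worth measuring against the flat one-block-per-sample approach.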
