Notes CUDA | Divakar Verma

Notes for CUDA

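Don’t initialise a dim3 like this:

 dim3 grid = (1024, 8); // Wrong: (1024, 8) is a comma expression that evaluates to 8, so grid becomes (8, 1, 1)

Use the constructor form instead:

 dim3 grid(1024, 8); // grid is (1024, 8, 1)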
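
To see the difference concretely, here is a small host-side sketch that builds a dim3 both ways. The helper names (dimsEqual, makeGridWrong, makeGridRight) are made up for these notes, and the compiler will likely warn about the unused left operand in the buggy version.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: true if d has exactly the given extents.
bool dimsEqual(dim3 d, unsigned x, unsigned y, unsigned z) {
    return d.x == x && d.y == y && d.z == z;
}

// Buggy form: (1024, 8) is a comma expression whose value is 8.
// That 8 is implicitly converted through the dim3 constructor,
// so y and z fall back to their default of 1.
dim3 makeGridWrong() {
    dim3 grid = (1024, 8);
    return grid;
}

// Intended form: the constructor arguments set x and y; z defaults to 1.
dim3 makeGridRight() {
    dim3 grid(1024, 8);
    return grid;
}
```

Called from host code, makeGridWrong() should give a grid of (8, 1, 1) and makeGridRight() a grid of (1024, 8, 1).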

Maximum number of threads in a CUDA block: 1024
- This is the total across all dimensions (x * y * z), not a per-dimension limit!
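
A minimal sketch of how a block shape could be checked against this limit before launching. The function names blockShapeIsValid and blockShapeFitsDevice are hypothetical. The limits are passed in rather than hard-coded, because the safest source is the device itself (cudaGetDeviceProperties reports maxThreadsPerBlock and maxThreadsDim). Note that each dimension also has its own cap, which on current GPUs is typically 1024, 1024 and 64 for x, y and z.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: checks a block shape against limits supplied by the caller.
// maxThreads is the limit on the total thread count (x * y * z);
// maxDim holds the separate per-dimension limits for x, y and z.
bool blockShapeIsValid(dim3 block, int maxThreads, const int maxDim[3]) {
    unsigned long long total =
        static_cast<unsigned long long>(block.x) * block.y * block.z;
    return total >= 1 &&
           total <= static_cast<unsigned long long>(maxThreads) &&
           block.x <= static_cast<unsigned>(maxDim[0]) &&
           block.y <= static_cast<unsigned>(maxDim[1]) &&
           block.z <= static_cast<unsigned>(maxDim[2]);
}

// Same check, with the limits read from the device instead of assumed.
bool blockShapeFitsDevice(dim3 block, int device = 0) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return false;
    return blockShapeIsValid(block, prop.maxThreadsPerBlock, prop.maxThreadsDim);
}
```

With a total limit of 1024, dim3(32, 32) passes (32 * 32 = 1024 threads) while dim3(1024, 8) fails (8192 threads), even though neither dimension alone exceeds 1024.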
GPU -> many Streaming Multiprocessors (SMs) -> each with a control unit, memory and many cores
- Each thread block is assigned (fully) to one of the SMs, i.e. all the threads in a thread block go to the same SM
- Threads in the same ThreadBlock can collaborate with each other:
  - Barrier synchronization: __syncthreads()
  - Shared memory
- Threads in different blocks don’t synchronize with each other.
  - Hence, blocks can execute in any order, e.g. sequentially or in parallel with respect to each other
  - Hence, the same code can run unchanged on hardware with fewer or more SMs; the blocks are simply spread over whatever SMs are available.
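
The points above can be sketched with a block-wise sum. Threads within a block cooperate through shared memory and __syncthreads(), while each block writes an independent partial result that the host adds up afterwards. The names blockSum, sumOnGpu and kThreadsPerBlock are made up for these notes, and the kernel assumes it is launched with exactly kThreadsPerBlock threads per block (a power of two).

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kThreadsPerBlock = 256;  // assumed power of two for the tree reduction

// Each block sums its own slice of `in` and writes one value to out[blockIdx.x].
// No block reads another block's data, so the blocks may run in any order.
__global__ void blockSum(const float* in, float* out, int n) {
    __shared__ float tile[kThreadsPerBlock];  // shared by the threads of this block only

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();  // every load must finish before any thread reads tile

    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        }
        __syncthreads();  // outside the if, so all threads of the block reach it
    }

    if (threadIdx.x == 0) out[blockIdx.x] = tile[0];
}

// Hypothetical host wrapper: stores the sum in *result, returns false on a CUDA error.
bool sumOnGpu(const std::vector<float>& data, float* result) {
    int n = static_cast<int>(data.size());
    *result = 0.0f;
    if (n == 0) return true;
    int blocks = (n + kThreadsPerBlock - 1) / kThreadsPerBlock;

    float* dIn = nullptr;
    float* dOut = nullptr;
    std::vector<float> partial(blocks);
    bool ok = cudaMalloc(&dIn, n * sizeof(float)) == cudaSuccess &&
              cudaMalloc(&dOut, blocks * sizeof(float)) == cudaSuccess &&
              cudaMemcpy(dIn, data.data(), n * sizeof(float),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        blockSum<<<blocks, kThreadsPerBlock>>>(dIn, dOut, n);
        // The blocking copy below also waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(partial.data(), dOut, blocks * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dIn);   // cudaFree(nullptr) is a no-op
    cudaFree(dOut);

    // Blocks cannot synchronize with each other inside the kernel,
    // so the per-block partial sums are combined here on the host.
    if (ok) {
        for (float p : partial) *result += p;
    }
    return ok;
}
```

Since the kernel never assumes which block runs first, the same launch works whether the GPU has a handful of SMs or dozens; only the degree of parallelism changes.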