space

gpu, cuda

Prerequisites

  1. C Programming Language, 2nd Edition
  2. GPU Computing
  3. Parallel Computing (Stanford CS149)
  4. GPU Programming by Simon Oz

CUDA

  1. CUDA C++ Programming Guide
  2. Parallel computing using C
  3. CUDA tutorial code samples
  4. CUDA book archive by NVIDIA
  5. UIUC CUDA course
  6. Programming in Parallel with CUDA (personal todo: ch6 & ch11)
  7. Optimize a CUDA Matmul Kernel for cuBLAS-like Performance
  8. Techniques from AMD $100K kernel competition:

Triton

  1. Triton docs
  2. k resources repo by remek
  3. Practitioner's guide to Triton
  4. Triton internals
  5. Reverse engineering Triton to CUDA

Misc

  1. ThunderKittens and its starter guide
  2. TileLang
  3. GPU Glossary
  4. GPU goes brr (a nice blog on GPU architecture)
  5. How to Accurately Time CUDA Kernels in PyTorch
  6. How CUDA programming works
  7. Outperforming cuBLAS on H100
  8. Memory Coalescing and Tiled Matrix Multiplication
  9. Tensor Core programming
  10. CUDA MatMul

GitHub

  1. 100 days of building GPU kernels by hamdi
  2. 120 days of CUDA
  3. cuda challenge by 1y33
  4. leetcuda