GPU
Prerequisites
- C Programming Language, 2nd Edition
- GPU Computing
- Parallel Computing Stanford CS149
- GPU Programming by Simon Oz
CUDA
- CUDA C++ Programming Guide
- Parallel computing using C
- CUDA tutorial code samples
- CUDA book archive by NVIDIA
- UIUC CUDA course
- Programming in Parallel with CUDA (personal todo: ch6 & ch11)
- Optimize a CUDA Matmul Kernel for cuBLAS-like Performance
- Techniques from the AMD $100K kernel competition
Triton
- Triton docs
- k resources repo by remek
- Practitioner's guide to Triton
- Triton internals
- Reverse engineering Triton to CUDA
Misc
- ThunderKittens and starter guide
- TileLang
- GPU Glossary
- GPU goes brr (nice blog on GPU architecture)
- How to Accurately Time CUDA Kernels in PyTorch
- How CUDA programming works
- Outperforming cuBLAS on H100
- Memory Coalescing and Tiled Matrix Multiplication
- Tensor core programming
- CUDA MatMul