The Gumbel-Softmax Trick: Making Discrete Sampling Differentiable
2025.06.17
DeepSeek's customised CUDA PTX instruction
2025.03.06