Gumbel Softmax
Problem
In DALLE, they wanted to maximize and by ELBO, they derive the lower bound to be:
Here is a sample from the categorical distribution , which is a discrete variable. This is analogous to a VAE, where is a latent, but in a VAE the latent space is continuos and in DALLE's case its discrete. The problem is that the sampling operation is not differentiable, so we cannot backpropagate through it. To solve this, we can use the same reparametrization trick used in VAEs, so they use the Gumbel Softmax trick. This defines our problem: We want to sample from a dimensional categorical distribution with unnormalized probabilities and allow gradients to flow through.
Gumbel Softmax
The Gumbel Softmax heavily utilizes the results from (Maddison et al), where the authors present the Concrete distribution: , where is the simplex, which is the set of all dimensional vectors with non-negative entries that sum to 1. The Concrete distribution is a continuous relaxation of the categorical distribution. Intuitively, we want to sample from the vertices of this simplex based on our categorical distribution. The authors define the Concrete distribution as:
where and temperature . Based on this, the joint distribution over the simplex is:
Footnotes
-
Zero-Shot Text-to-Image Generation by Aditya Ramesh et al. OpenAI 2021
-
The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables ICLR 2017