  1. Title: GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
  2. Author: Alex Nichol et al.
  3. Publish Year: Dec 2021
  4. Review Date: Jan 2022

Summary of paper

In the authors’ previous work, diffusion models achieved photorealism in the class-conditional setting when augmented with classifier guidance, a technique that lets a diffusion model condition on a classifier’s labels.

classifier details

Algorithm of previous work (classifier guided sampling)

image-20220113004351499
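
The guided sampling step can be sketched for a scalar toy case. This is a minimal sketch, assuming the update from the classifier guidance paper, in which the mean of the reverse step is shifted by the scaled classifier gradient: x_{t-1} ~ N(mu + s * Sigma * grad log p(y | x_t), Sigma). The function names and the toy numbers are hypothetical; in practice mu, Sigma and the gradient come from trained networks.

```python
import random


def guided_mean(mu, sigma2, grad_log_p, scale=1.0):
    """Shift the reverse-step mean by the scaled classifier gradient.

    mu, sigma2: mean and variance the diffusion model predicts at x_t.
    grad_log_p: gradient of log p(y | x_t) with respect to x_t.
    scale: guidance scale s (larger values follow the classifier more).
    """
    return mu + scale * sigma2 * grad_log_p


def guided_step(mu, sigma2, grad_log_p, scale=1.0, rng=random):
    """Sample x_{t-1} from the Gaussian centred on the shifted mean."""
    return rng.gauss(guided_mean(mu, sigma2, grad_log_p, scale), sigma2 ** 0.5)


# Toy numbers (assumed): the classifier gradient pushes the sample upwards.
print(guided_mean(0.0, 0.5, 2.0, scale=3.0))  # 3.0
```

With `scale=0` the step reduces to the unguided reverse step.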

This work extends that conditioning from class labels to natural language.

Some key terms

diffusion model

Diffusion models sample from a distribution by reversing a gradual noising process.
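
The noising process that gets reversed can be sketched as follows. This is a minimal sketch, assuming the standard Gaussian forward process with its closed form x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, and a linear beta schedule from 1e-4 to 0.02 over 1000 steps. The schedule values and all function names are assumptions for illustration, not taken from these notes.

```python
import math
import random


def linear_betas(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]


def alpha_bars(betas):
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Draw x_t ~ q(x_t | x_0) for a list of pixel values x0.

    Returns the noised values and the noise that was added.
    """
    a = abar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


abar = alpha_bars(linear_betas())
xt, eps = q_sample([0.5, -0.25, 1.0], t=999, abar=abar, rng=random.Random(0))
# At the last step abar is close to 0, so xt is almost pure Gaussian noise.
```

Sampling runs this chain backwards: start from noise and remove a little of it at each step.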

Theory behind diffusion model

Improvements

  1. Rather than modelling the mean of the denoised image, the model predicts the noise itself.
  2. Fixing the covariance of the Gaussian distribution is good enough.
    1. Furthermore, learning an interpolation parameter for the covariance is even better than a fixed one.
      1. image-20220113001238307
  3. Give different weights to the gradient at different steps t (the first few steps are much more important).
    1. image-20220113002014975
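
The first two improvements can be sketched in a few lines. This is a minimal sketch, assuming the formulation from the Improved DDPM paper: the training target is the mean squared error between the sampled and the predicted noise, and the learned variance interpolates in log space between beta_t and the posterior variance beta_tilde_t = (1 - abar_{t-1}) / (1 - abar_t) * beta_t using a network output v in [0, 1]. Whether this matches the equations in the figures above is an assumption, and the function names are hypothetical.

```python
import math


def noise_mse(eps_true, eps_pred):
    """Mean squared error between the sampled noise and the predicted noise."""
    return sum((a - b) ** 2 for a, b in zip(eps_true, eps_pred)) / len(eps_true)


def posterior_variance(beta_t, abar_t, abar_prev):
    """beta_tilde_t, the lower of the two variance choices."""
    return (1.0 - abar_prev) / (1.0 - abar_t) * beta_t


def interpolated_variance(v, beta_t, beta_tilde_t):
    """Log-space interpolation: v=1 gives beta_t, v=0 gives beta_tilde_t.

    Assumes beta_tilde_t > 0, which excludes the very first step where
    abar_prev is 1.
    """
    return math.exp(v * math.log(beta_t) + (1.0 - v) * math.log(beta_tilde_t))


beta_tilde = posterior_variance(beta_t=0.02, abar_t=0.5, abar_prev=0.6)
sigma2 = interpolated_variance(0.5, 0.02, beta_tilde)
# sigma2 lies between beta_tilde and beta_t.
```

Interpolating in log space keeps the variance positive and bounded by the two fixed choices, so the network only has to learn where to sit between them.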

Good things about the paper (one paragraph)

Major comments

Minor comments

Check the paper reading video:

Incomprehension

Potential future work

The “text to image” task has some similarity with the “text to goal state/desired action” task in text-based RL environments.

Diffusion models and their denoising process share some similarity with the “training with masking” idea.