[TOC]

  1. Title: Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
  2. Author: Cheng Chi et al.
  3. Publish Year: 2023
  4. Review Date: Thu, Mar 9, 2023
  5. url: https://diffusion-policy.cs.columbia.edu/diffusion_policy_2023.pdf

Summary of paper

Contribution

  • introducing a new form of robot visuomotor policy that generates behaviour via a “conditional denoising diffusion process” on the robot action space

Some key terms

Explicit policy

  • the policy maps observations directly to actions; learning it is standard supervised imitation learning (behaviour cloning), as sketched below
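
A minimal sketch of the contrast (hypothetical names, PyTorch assumed; not the paper's code): an explicit policy is just a network from observation to action, trained by behaviour-cloning regression.

```python
import torch
import torch.nn as nn

class ExplicitPolicy(nn.Module):
    """Directly maps an observation to an action: a = pi(o)."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# Behaviour cloning: plain supervised regression on expert (obs, action) pairs.
policy = ExplicitPolicy(obs_dim=10, act_dim=2)
obs = torch.randn(32, 10)               # a batch of observations
expert_action = torch.randn(32, 2)      # matching expert actions
loss = nn.functional.mse_loss(policy(obs), expert_action)
loss.backward()
```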

Implicit policy

  • the action is chosen by minimising a learned energy function over the action space
  • learning this is closer to reinforcement learning: as with a Q-function, acting means optimising a learned scalar score over candidate actions (see the sketch below)
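
A minimal sketch of the implicit formulation (hypothetical names; PyTorch assumed): the policy is an energy function over (observation, action) pairs, and acting means searching the action space for the lowest energy.

```python
import torch
import torch.nn as nn

class EnergyPolicy(nn.Module):
    """Scores (observation, action) pairs; lower energy = better action."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def select_action(policy: EnergyPolicy, obs: torch.Tensor,
                  act_dim: int, n_samples: int = 1024) -> torch.Tensor:
    """Approximate argmin_a E(o, a) by scoring random candidate actions."""
    cand = torch.rand(n_samples, act_dim) * 2 - 1     # candidates in [-1, 1]
    obs_rep = obs.unsqueeze(0).expand(n_samples, -1)  # repeat obs per candidate
    return cand[policy(obs_rep, cand).argmin()]
```

Training such a model needs negative action samples to approximate an intractable normalisation constant (e.g., an InfoNCE-style loss), which is exactly the instability the paper contrasts against in the benefits section.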

Diffusion policy

  • provides a smooth gradient field that refines the action over each denoising iteration (see the update rule below)
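
One way to read that claim, sketched in DDPM-style notation (where $\alpha_k$, $\gamma_k$, $\sigma_k$ are noise-schedule parameters, $\mathbf{O}$ the observation, and $\mathbf{A}^k$ the noisy action at iteration $k$):

$$
\mathbf{A}^{k-1} = \alpha_k \left( \mathbf{A}^{k} - \gamma_k\, \epsilon_\theta(\mathbf{O}, \mathbf{A}^{k}, k) \right) + \mathcal{N}\!\left(0, \sigma_k^2 I\right)
$$

Here $\epsilon_\theta$ acts as a learned gradient field: each denoising step is analogous to a single noisy gradient-descent step $\mathbf{A}' = \mathbf{A} - \gamma \nabla E(\mathbf{A})$, which is where the smooth iterative refinement comes from.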

Method

Diffusion policy

  • in this formulation, instead of directly outputting an action, the policy infers the action-score gradient, conditioned on visual observations, for K denoising iterations (sketched below)
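
A minimal sketch of that inference loop (hypothetical names; PyTorch assumed, with placeholder values standing in for a real DDPM noise schedule): starting from Gaussian noise, the observation-conditioned noise-prediction network refines the action sequence for K iterations.

```python
import torch

@torch.no_grad()
def infer_action(eps_model, obs_feat, horizon, act_dim, K=100):
    """Run K denoising iterations: Gaussian noise -> action sequence.

    eps_model(obs_feat, noisy_actions, k) -> predicted noise (same shape
    as the actions); alpha/gamma/sigma stand in for a real DDPM schedule.
    """
    a = torch.randn(1, horizon, act_dim)          # A^K ~ N(0, I)
    for k in reversed(range(K)):
        eps = eps_model(obs_feat, a, torch.tensor([k]))
        alpha, gamma, sigma = 1.0, 0.1, 0.01      # placeholder schedule values
        a = alpha * (a - gamma * eps)             # step along the score gradient
        if k > 0:
            a = a + sigma * torch.randn_like(a)   # no noise on the final step
    return a                                      # denoised action sequence
```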

Benefits of this approach

  • Expressing multimodal action distributions.
    • diffusion policy can express arbitrary normalizable distributions, which includes multimodal action distributions, a well-known challenge for policy learning
  • High-dimensional output space
    • this property allows the policy to jointly infer a sequence of future actions instead of single-step actions, which is critical for encouraging temporal action consistency
  • Stable training
    • Training an energy-based policy (think of reinforcement learning) often requires negative sampling to estimate an intractable normalisation constant, which is known to cause training instability. Diffusion policy avoids estimating this constant entirely, so training is more stable (see the loss sketch below).
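
The stability argument follows from the objective itself: training reduces to regressing the noise that was injected into expert actions, a plain MSE with no normalisation constant to estimate. A minimal sketch under assumed names (`eps_model`, and `alpha_bar` for the cumulative noise-schedule products; not the paper's code):

```python
import torch
import torch.nn.functional as F

def diffusion_bc_loss(eps_model, obs_feat, expert_actions, alpha_bar, K=100):
    """One DDPM-style training step: corrupt expert actions at a random
    iteration k, then regress the injected noise. Plain MSE, no negatives."""
    batch = expert_actions.shape[0]
    k = torch.randint(0, K, (batch,))
    eps = torch.randn_like(expert_actions)
    ab = alpha_bar[k].view(-1, 1, 1)              # cumulative schedule term
    noisy = ab.sqrt() * expert_actions + (1 - ab).sqrt() * eps
    return F.mse_loss(eps_model(obs_feat, noisy, k), eps)
```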