[TOC]
- Title: Diffusion Policy Visuomotor Policy Learning via Action Diffusion
- Author: Cheng Chi et al.
- Publish Year: 2023
- Review Date: Thu, Mar 9, 2023
- url: https://diffusion-policy.cs.columbia.edu/diffusion_policy_2023.pdf
Summary of paper
Contribution
- introduces a new form of robot visuomotor policy that generates behaviour via a “conditional denoising diffusion process” on the robot action space
Some key terms
Explicit policy
- directly maps observations to actions; learning it amounts to supervised imitation learning (behaviour cloning)
Implicit policy
- defines the action implicitly as the minimiser of a learned energy function E(o, a)
- learning this energy function is reminiscent of value-based reinforcement learning: the model scores observation-action pairs instead of outputting an action directly
Diffusion policy
- provides a smooth gradient for refining the action over each denoising iteration; a schematic contrast of the three policy forms is sketched below
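A minimal sketch contrasting the three policy forms, assuming hypothetical, already-trained networks `policy_net`, `energy_net`, and `eps_net` (none of these names come from the paper):

```python
import torch

def explicit_policy(policy_net, obs):
    # Explicit policy: directly regress the action from the observation,
    # trained by supervised imitation of demonstrations.
    return policy_net(obs)

def implicit_policy(energy_net, obs, action_candidates):
    # Implicit policy: score sampled action candidates with an energy
    # function E(o, a) and return the candidate with the lowest energy.
    energies = torch.stack([energy_net(obs, a) for a in action_candidates])
    return action_candidates[energies.argmin()]

def diffusion_policy(eps_net, obs, K, action_dim):
    # Diffusion policy: start from Gaussian noise and iteratively denoise it
    # into an action, following the noise/score signal predicted by eps_net
    # (a fuller update rule is sketched in the Method section below).
    a = torch.randn(action_dim)
    for k in reversed(range(K)):
        a = a - eps_net(obs, a, k)  # simplified single denoising step
    return a
```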
Method
Diffusion policy
- in this formulation, instead of directly outputting an action, the policy infers the action-score gradient, conditioned on visual observations, for K denoising iterations, as sketched below
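A minimal sketch of this inference loop using the standard DDPM sampling rule; the network name `eps_model`, the pre-computed noise-schedule tensors `alphas`, `alpha_bars`, `sigmas`, and the observation-conditioning interface are assumptions for illustration, not the paper's exact implementation:

```python
import torch

def denoise_action(eps_model, obs_feat, action_shape, alphas, alpha_bars, sigmas):
    # Standard DDPM sampling: start from Gaussian noise and run K denoising
    # steps, each conditioned on the encoded visual observation obs_feat.
    K = len(alphas)
    a = torch.randn(action_shape)                  # a^K ~ N(0, I)
    for k in reversed(range(K)):
        eps = eps_model(obs_feat, a, k)            # predicted noise (action-score gradient)
        # DDPM posterior mean: remove the predicted noise component
        a = (a - (1 - alphas[k]) / torch.sqrt(1 - alpha_bars[k]) * eps) / torch.sqrt(alphas[k])
        if k > 0:
            a = a + sigmas[k] * torch.randn_like(a)  # no noise added at the final step
    return a                                       # a^0: the denoised action (sequence)
```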
Benefits of this approach
- Expressing multimodal action distributions.
- diffusion policy can express arbitrary normalizable distributions, which includes multimodal action distributions, a well-known challenge for policy learning
- High-dimensional output space
- this property allows the policy to jointly infer a sequence of future actions instead of single-step actions, which is critical for encouraging temporal action consistency
- Stable training
- Training an energy-based policy (cf. the implicit policy above) often requires negative sampling to estimate an intractable normalisation constant, which is known to cause training instability. The diffusion policy bypasses this by learning the gradient of the energy function instead, so its training tends to be more stable; see the training-loss sketch below.
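For contrast with the negative-sampling issue above, a sketch of the denoising training objective (the standard DDPM noise-prediction MSE; `eps_model` and `alpha_bars` are the same assumed names as in the Method sketch):

```python
import torch
import torch.nn.functional as F

def diffusion_policy_loss(eps_model, obs_feat, action, alpha_bars):
    # Sample a random diffusion step, corrupt the demonstrated action with the
    # corresponding amount of Gaussian noise, and regress the noise back.
    # This plain MSE objective needs no negative samples and no estimate of a
    # normalisation constant, which is why training tends to be stable.
    K = len(alpha_bars)
    k = torch.randint(0, K, ())
    eps = torch.randn_like(action)
    noisy_action = torch.sqrt(alpha_bars[k]) * action + torch.sqrt(1 - alpha_bars[k]) * eps
    return F.mse_loss(eps_model(obs_feat, noisy_action, k), eps)
```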