[TOC]

  1. Title: Learning Generative Models With Goal-Conditioned Reinforcement Learning
  2. Author: Mariana Vargas Vieyra et al.
  3. Publish Date: 26 Mar 2023
  4. Review Date: Thu, Mar 30, 2023
  5. url: https://arxiv.org/abs/2303.14811

Summary of paper


Contribution

  • We present a novel framework for learning generative models with goal-conditioned reinforcement learning.
  • We define two agents: a goal-conditioned agent (GC-agent) and a supervised agent (S-agent).
  • Given a user-input initial state, the GC-agent learns to reconstruct the training set. In this context, the elements of the training set are the goals.
    • During training, the S-agent learns to imitate the GC-agent while remaining agnostic of the goals.
    • At inference time, we generate new samples with the S-agent alone (a rough sketch of this loop follows below).
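
A minimal sketch of how the two agents could interact during training and inference, assuming illustrative interfaces (`gc_agent`, `s_agent`, `env`, `goal_distance`) that are not taken from the paper:

```python
import random
import numpy as np

def goal_distance(state, goal):
    # illustrative distance between a state and a goal (e.g. Euclidean)
    return float(np.linalg.norm(np.asarray(state) - np.asarray(goal)))

def train(gc_agent, s_agent, env, training_set, num_episodes, horizon):
    """Sketch of the two-agent training loop; all interfaces are hypothetical."""
    for _ in range(num_episodes):
        goal = random.choice(training_set)   # goals are elements of the training set
        state = env.reset()                  # fixed, user-input initial state
        for _ in range(horizon):
            # the GC-agent acts conditioned on the goal it must reconstruct
            action = gc_agent.act(state, goal)
            # the S-agent imitates the GC-agent's action without seeing the goal
            s_agent.imitation_update(state, action)
            next_state = env.step(action)
            # shaped reward: positive when the step moves closer to the goal
            reward = goal_distance(state, goal) - goal_distance(next_state, goal)
            gc_agent.update(state, action, reward, next_state, goal)
            state = next_state

def generate(s_agent, env, horizon):
    """At inference time only the goal-agnostic S-agent is used."""
    state = env.reset()
    for _ in range(horizon):
        state = env.step(s_agent.act(state))
    return state  # the state reached at the final step is the generated sample
```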

Some key terms

Goal-Conditioned Reinforcement Learning (GCRL) framework

  • In GCRL the agent aims at a particular state called the goal.
  • At each step, the environment yields a loss that accounts for how far the agent is from the goal it is targeting (potential-based reward shaping).
  • Intuitively, our learning procedure consists of training a family of goal-conditioned policies (GC-policies) that learn to reach the different elements of the training set by producing a trajectory of intermediate representations departing from the fixed initial state (an alternative view of the reverse diffusion process).
  • At the same time, we obtain the generative model by learning a mixture of these goal-conditioned policies, where the goal is sampled uniformly at random from the training set (see the shaped-reward formulation below).
  • The goal-conditioned agents are used for training only. At inference time, we generate trajectories with the mixture policy and collect the states reached at the final step.
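
A schematic way to write the shaped, goal-conditioned reward described above, assuming the potential is the negative distance to the goal; the paper's exact potential and distance may differ:

```latex
g \sim \mathrm{Uniform}(\mathcal{D}), \qquad
\Phi(s, g) = -d(s, g), \qquad
r_t = \gamma \, \Phi(s_{t+1}, g) - \Phi(s_t, g)
```

Here $\mathcal{D}$ is the training set and $d$ is a distance over states, so steps that bring the trajectory closer to the sampled goal receive positive reward.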

Algorithm


  • The S-agent generates a set of proposals.
  • The GC-agent performs action selection by choosing one of the proposals.

What is a proposal?

  • A proposal is an action vector (a candidate produced by the S-agent).
  • In this setting, the agent needs to select the best proposal among $A$ proposals, and this set of $A$ proposals changes at each step.
  • Therefore, we trade a GCRL task with a continuous action space for a non-stationary GCRL task with a discrete action space (a rough sketch of one selection step follows below).
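
A rough sketch of one proposal-selection step under this view, again with illustrative names (`s_agent.propose`, `gc_agent.score`) that are assumptions rather than the paper's API:

```python
import numpy as np

def proposal_selection_step(s_agent, gc_agent, state, goal, num_proposals):
    """The S-agent proposes candidate action vectors; the GC-agent picks the
    one it scores best for reaching the goal. Because the candidate set is
    regenerated every step, the discrete action space is non-stationary."""
    # S-agent samples A candidate action vectors (continuous proposals)
    proposals = [s_agent.propose(state) for _ in range(num_proposals)]
    # GC-agent scores each proposal given the current state and the goal;
    # the index of the chosen proposal is its discrete action
    scores = [gc_agent.score(state, goal, p) for p in proposals]
    chosen = int(np.argmax(scores))
    return proposals[chosen], chosen
```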

Potential future work

The paper reframes the reverse diffusion process as a goal-conditioned RL problem.