[TOC]

  1. Title: Proximal Policy Optimisation Explained Blog
  2. Author: Xiao-Yang Liu; DI-engine
  3. Publish Date: May 4, 2021
  4. Review Date: Mon, Dec 26, 2022

Difference between on-policy and off-policy

(figure: image-20221226195443427)

Question: is there a way to improve the sample efficiency of on-policy algorithms without losing their benefits?
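PPO's answer is to reuse each freshly collected on-policy batch for a few optimisation epochs, while constraining how far the updated policy may drift from the policy that collected the data. The constraint is expressed through the probability ratio below; this is the standard notation from the PPO paper (Schulman et al., 2017), not necessarily the blog's exact symbols:

$$
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}
$$

Keeping $r_t(\theta)$ close to 1 (via clipping, see the total loss below) is what makes this limited sample reuse safe.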

Algorithm

(figure: image-20221226200313414)

Explanation

(figure: image-20221226200425376)
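A minimal PyTorch sketch of the clipped-ratio update loop the algorithm describes; the `ActorCritic` class, hyperparameters, and batch layout are my own illustrative assumptions, not code from the blog or DI-engine:

```python
# Minimal PPO-Clip update sketch (PyTorch). Names and hyperparameters are
# illustrative assumptions, not taken from the blog.
import torch
import torch.nn as nn
from torch.distributions import Categorical


class ActorCritic(nn.Module):
    """Tiny shared-trunk actor-critic for a discrete action space."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.pi = nn.Linear(hidden, n_actions)  # policy logits
        self.v = nn.Linear(hidden, 1)           # state-value head

    def forward(self, obs):
        h = self.trunk(obs)
        return Categorical(logits=self.pi(h)), self.v(h).squeeze(-1)


def ppo_update(model, optimizer, obs, actions, old_log_probs, advantages,
               returns, clip_eps=0.2, epochs=4):
    """Reuse one on-policy batch for several epochs with the clipped ratio."""
    for _ in range(epochs):
        dist, value = model(obs)
        log_probs = dist.log_prob(actions)
        ratio = torch.exp(log_probs - old_log_probs)          # r_t(theta)
        clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
        policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
        value_loss = (returns - value).pow(2).mean()
        entropy = dist.entropy().mean()
        # Minimise the negative of the PPO objective (see total loss below).
        loss = policy_loss + 0.5 * value_loss - 0.01 * entropy
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

In practice the advantages fed into `ppo_update` are computed with GAE (next heading) and usually normalised per batch.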

Generalized advantage estimator (GAE)

(figure: image-20221226201313870)
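For reference, the standard GAE definition (Schulman et al., 2016), which is presumably what the figure shows:

$$
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t), \qquad
\hat{A}_t^{\mathrm{GAE}(\gamma,\lambda)} = \sum_{l=0}^{\infty} (\gamma \lambda)^l \, \delta_{t+l}
$$

Setting $\lambda = 0$ recovers the one-step TD advantage (low variance, more bias), while $\lambda = 1$ recovers the Monte-Carlo return minus the value baseline (no bias from $V$, high variance).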

Total PPO loss

(figure: image-20221226201451409)
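In the PPO paper's notation, the clipped surrogate and the total objective (policy term, value-function term, entropy bonus) are:

$$
L_t^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\hat{A}_t,\ \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t\right)\right]
$$

$$
L_t^{\mathrm{CLIP+VF+S}}(\theta) = \hat{\mathbb{E}}_t\left[L_t^{\mathrm{CLIP}}(\theta) - c_1\, L_t^{\mathrm{VF}}(\theta) + c_2\, S[\pi_\theta](s_t)\right],
\qquad L_t^{\mathrm{VF}}(\theta) = \big(V_\theta(s_t) - V_t^{\mathrm{targ}}\big)^2
$$

Here $c_1$ weights the value loss, $c_2$ weights the entropy bonus $S$, and $\epsilon$ is the clip range (0.2 in the paper); the blog's exact coefficients may differ. Since this is an objective to maximise, the code sketch above minimises its negative.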