
  1. Title: Generalized Proximal Policy Optimization with Sample Reuse
  2. Author: James Queeney et al.
  3. Publish Date: 29 Oct 2021
  4. Review Date: Wed, Dec 28, 2022

Summary of paper

Motivation


Contribution

Some key terms

sample complexity

PPO theoretical guarantee

reuse samples

On-policy policy improvement methods

sample efficiency with off-policy data

combining on-policy and off-policy

state visitation distribution


Policy improvement lower bound


Why PPO is called proximal
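One way to see the "proximal" behaviour is through the standard PPO clipped surrogate: once the probability ratio between the new policy and $\pi_k$ leaves the interval $[1-\epsilon, 1+\epsilon]$ in the direction that would increase the objective, the objective stops growing, so there is no incentive to move further from $\pi_k$. The following is a minimal per-sample sketch of that idea, not the paper's implementation; the function name `clipped_surrogate` and the default `eps=0.2` are illustrative assumptions.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """Per-sample PPO clipped objective (illustrative helper, hypothetical name).

    ratio     : pi(a|s) / pi_k(a|s) for one sampled state-action pair
    advantage : advantage estimate for that pair
    eps       : clipping parameter (assumed value, not taken from the notes)
    """
    # Clip the ratio to [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


# Positive advantage: raising the ratio past 1 + eps gives no extra gain.
gain_inside = clipped_surrogate(1.1, 1.0)
gain_outside = clipped_surrogate(1.5, 1.0)

# Negative advantage: lowering the ratio below 1 - eps gives no extra gain,
# but moving the wrong way (ratio > 1) is still fully penalised.
loss_clipped = clipped_surrogate(0.5, -1.0)
loss_unclipped = clipped_surrogate(1.5, -1.0)
```

Because the objective is flat outside the clipping range on the favourable side, a policy update that maximises it tends to stay close to $\pi_k$.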

Generalised policy improvement lower bound


The expected total variation distance between the current policy $\pi_k$ and the future policy $\pi$
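To make this quantity concrete, here is a small sketch for discrete policies. For a single state, the total variation distance is $\frac{1}{2}\sum_a |\pi(a|s) - \pi_k(a|s)|$, which can equivalently be written as $\frac{1}{2}\,\mathbb{E}_{a \sim \pi_k}\left[\left|\frac{\pi(a|s)}{\pi_k(a|s)} - 1\right|\right]$, so it can be estimated from actions sampled under $\pi_k$. The two-state policies, the state weights `d`, and the helper names below are made up for illustration and do not come from the paper.

```python
import random


def expected_tv_exact(pi_new, pi_old, d):
    """Sum over states of d(s) * 0.5 * sum_a |pi_new(a|s) - pi_old(a|s)|."""
    total = 0.0
    for s, weight in enumerate(d):
        per_state = 0.5 * sum(abs(p - q) for p, q in zip(pi_new[s], pi_old[s]))
        total += weight * per_state
    return total


def expected_tv_sampled(pi_new, pi_old, d, n_samples=100_000, seed=0):
    """Monte Carlo estimate using s ~ d and a ~ pi_old(.|s):
    0.5 * mean(|pi_new(a|s) / pi_old(a|s) - 1|)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        s = rng.choices(range(len(d)), weights=d)[0]
        a = rng.choices(range(len(pi_old[s])), weights=pi_old[s])[0]
        ratio = pi_new[s][a] / pi_old[s][a]
        total += abs(ratio - 1.0)
    return 0.5 * total / n_samples


# Hypothetical toy problem: 2 states, 2 actions.
pi_old = [[0.5, 0.5], [0.9, 0.1]]   # stands in for pi_k
pi_new = [[0.6, 0.4], [0.8, 0.2]]   # stands in for pi
d = [0.5, 0.5]                      # assumed state visitation weights

exact = expected_tv_exact(pi_new, pi_old, d)
estimate = expected_tv_sampled(pi_new, pi_old, d)
```

The ratio form matters in practice because it only needs the probability ratio on sampled state-action pairs, which is the same quantity the PPO surrogate already uses.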

Sample efficiency analysis (IMPORTANT)

Potential future work

The paper gives some insight into how to analyse learning efficiency theoretically.