Proximal Policy Optimisation Explained Blog
Title: Proximal Policy Optimisation Explained Blog
Author: Xiao-Yang Liu; DI engine
Publish Year: May 4, 2021
Review Date: Mon, Dec 26, 2022

Highly recommend reading these blog posts:
https://lilianweng.github.io/posts/2018-04-08-policy-gradient/
https://zhuanlan.zhihu.com/p/487754664

Difference between on-policy and off-policy

On-policy algorithms update the policy network using only transitions generated by the current policy network. In common environments, the critic network can therefore make more accurate value predictions for the current policy network.

Off-policy algorithms allow the current policy network to be updated using transitions from old policies....
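
To make the distinction concrete, here is a minimal Python sketch of how the two families handle training data. It is an illustration only, not code from the blog or from any RL library: `collect`, `on_policy_batches`, and `off_policy_batches` are hypothetical names, and each "transition" is just a dictionary tagged with the version of the policy that generated it, standing in for real environment rollouts.

```python
import random
from collections import deque


def collect(policy_version, n):
    """Hypothetical stand-in for a rollout: n transitions tagged with the
    version of the policy that generated them (no real environment here)."""
    return [{"policy_version": policy_version, "step": i} for i in range(n)]


def on_policy_batches(num_updates, rollout_len):
    """On-policy: every update uses only data from the current policy."""
    batches = []
    for version in range(num_updates):
        rollout = collect(version, rollout_len)  # fresh data, current policy
        batches.append(rollout)                  # used once, then discarded
    return batches


def off_policy_batches(num_updates, rollout_len, batch_size, capacity, seed=0):
    """Off-policy: updates sample from a replay buffer that still holds
    transitions generated by older versions of the policy."""
    rng = random.Random(seed)
    replay = deque(maxlen=capacity)  # oldest transitions are evicted first
    batches = []
    for version in range(num_updates):
        replay.extend(collect(version, rollout_len))
        k = min(batch_size, len(replay))
        batches.append(rng.sample(list(replay), k))
    return batches
```

In the on-policy loop, batch `k` contains only transitions with `policy_version == k`. In the off-policy loop, batch `k` may mix transitions from any version up to `k`, which is the reuse of old-policy data described above.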