Proximal Policy Optimisation Explained Blog

[TOC] Title: Proximal Policy Optimisation Explained Blog
Author: Xiao-Yang Liu; DI-engine
Publish Year: May 4, 2021
Review Date: Mon, Dec 26, 2022

Highly recommend reading these blogs: https://lilianweng.github.io/posts/2018-04-08-policy-gradient/ and https://zhuanlan.zhihu.com/p/487754664

Difference between on-policy and off-policy

On-policy algorithms update the policy network using transitions generated by the current policy network; in common environments, the critic network therefore makes more accurate value predictions for the current policy. Off-policy algorithms, in contrast, allow the current policy network to be updated with transitions from old policies. Those old transitions can be reused, as shown in Fig. 1, where the points are scattered over trajectories generated by different policies, which improves sample efficiency and reduces the total number of training steps.

Question: is there a way to improve the sample efficiency of on-policy algorithms without losing their benefits?

PPO improves sample efficiency by using a surrogate objective that keeps the new policy from moving too far from the old policy. The surrogate objective is the key feature of PPO, since it both (1) regularises the policy update and (2) enables the reuse of training data.

Algorithm ...
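To make the surrogate objective concrete, here is a minimal PyTorch sketch of the standard clipped surrogate loss from the PPO paper; the function name and tensor arguments are illustrative, not taken from the blog.

```python
import torch

def ppo_clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective: r_t = pi_new(a|s) / pi_old(a|s)."""
    ratio = torch.exp(logp_new - logp_old)                        # r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (elementwise minimum) bound and negate it, since
    # optimisers minimise. Clipping removes the incentive to push the ratio
    # outside [1 - eps, 1 + eps], which is what regularises the update.
    return -torch.min(unclipped, clipped).mean()
```

Because the clip bounds how far a single update can move the policy away from the one that collected the data, PPO can safely run several optimisation epochs over each batch, which is exactly the data reuse the excerpt refers to.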

December 26, 2022 · 1 min · 196 words · Sukai Huang
architecture

Alex_petrenko Sample Factory Asynchronous RL at Very High FPS 2020

[TOC] Title: Sample Factory: Asynchronous RL at Very High FPS
Author: Alex Petrenko
Publish Year: Oct, 2020
Review Date: Sun, Sep 25, 2022

Summary of paper

Motivation: identifying performance bottlenecks. RL involves three workloads: environment simulation, inference, and backpropagation; overall throughput is bounded by the slowest of the three. In existing methods (A2C/PPO/IMPALA) these computational workloads are dependent on one another, which leads to under-utilisation of system resources. Existing high-throughput methods focus on distributed training and therefore introduce considerable overhead, such as networking, serialisation, etc. ...
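As a toy illustration of the decoupling idea (not Sample Factory's actual implementation; the queue setup and the trivial "environment" here are invented for the sketch), the snippet below runs environment simulation in its own process so that stepping the environment no longer blocks on inference or backpropagation:

```python
import multiprocessing as mp

def env_worker(obs_q, act_q):
    """Steps a stand-in environment in its own process."""
    obs = 0.0                      # placeholder observation
    while True:
        obs_q.put(obs)             # hand the observation to the learner side
        action = act_q.get()       # block only on inference, never on backprop
        obs = obs + action         # fake environment transition

def main():
    obs_q, act_q = mp.Queue(maxsize=2), mp.Queue(maxsize=2)
    mp.Process(target=env_worker, args=(obs_q, act_q), daemon=True).start()
    for _ in range(100):
        obs = obs_q.get()          # simulation already ran while we worked
        act_q.put(1.0)             # stand-in for policy inference on obs
        # in Sample Factory, backpropagation runs in yet another process

if __name__ == "__main__":
    main()
```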

September 25, 2022 · 1 min · 154 words · Sukai Huang
multimodal framework

Sergios_karagiannakos Vision Language Models Towards Multimodal DL 2022

[TOC] Title: Vision Language Models: Towards Multimodal Deep Learning
Author: Sergios Karagiannakos
Publish Year: 03 Mar 2022
Review Date: Tue, Aug 9, 2022

https://theaisummer.com/vision-language-models/

August 9, 2022 · 1 min · 24 words · Sukai Huang