Reinforcement Learning

Tuomas Haarnoja Soft Actor Critic Off Policy Maximum Entropy Deep Reinforcement Learning With a Stochastic Actor 2018 Paper Review

[Brief Paper Analysis] SAC: Soft Actor-Critic Part 1 [1801.01290]. A hat over a symbol denotes an estimate.

Adria Badia Agent57 Outperforming the Atari Human Benchmark 2020 Paper Review

Title: Agent57: Outperforming the Atari Human Benchmark 2020. Author: Adria Badia et al. Publish Year: 2020. Review Date: Nov 2021. Summary of paper: Agent57 was the state-of-the-art Atari RL agent in 2020 and can play difficult Atari games such as “Montezuma’s Revenge”, “Pitfall”, “Solaris” and “Skiing”. ...

Stefan O’Toole Width Based Lookaheads With Learnt Base Policies and Heuristics Over the Atari 2600 Benchmark 2021 Paper Review

Title: Width-based Lookaheads with Learnt Base Policies and Heuristics Over the Atari-2600 Benchmark. Author: Stefan O’Toole et al. Publish Year: 2021. Review Date: Tue 16 Nov 2021. Summary of paper: This paper proposes a new width-based planning and learning agent that can play Atari-2600 games (though it cannot play Montezuma’s Revenge). The authors claim that width-based planning for exploration, combined with (greedy) optimal MDP policy exploitation, achieves better performance than Monte-Carlo Tree Search. ...