Borja Ibarz Reward Learning From Human Preferences and Demonstrations in Atari 2018

Title: Reward learning from human preferences and demonstrations in Atari Author: Borja Ibarz et al. Publish Year: 2018 Review Date: Nov 2021 Summary of paper This needs to be only 1-3 sentences, but it demonstrates that you understand the paper and, moreover, can summarize it more concisely than the author in his abstract. The authors propose a method that uses human experts' annotations, rather than the environment's extrinsic reward, to guide reinforcement learning....

November 27, 2021 · 2 min · Sukai Huang

Adrien Ecoffet Go Explore a New Approach for Hard Exploration Problems 2021 Paper Review

Title: Go-Explore: a New Approach for Hard-Exploration Problems Author: Adrien Ecoffet et al. Publish Year: 2021 Review Date: Nov 2021 Summary of paper This needs to be only 1-3 sentences, but it demonstrates that you understand the paper and, moreover, can summarize it more concisely than the author in his abstract. The authors hypothesise that two main issues prevent DRL agents from achieving high scores in hard-exploration games (e....

November 27, 2021 · 4 min · Sukai Huang

Tuomas Haarnoja Soft Actor Critic Off Policy Maximum Entropy Deep Reinforcement Learning With a Stochastic Actor 2018 Paper Review

[Paper Brief Analysis] SAC: Soft Actor-Critic Part 1 [1801.01290]. The hat notation denotes an estimated quantity.

November 18, 2021 · 1 min · Sukai Huang

Adria Badia Agent57 Outperforming the Atari Human Benchmark 2020 Paper Review

Title: Agent57: Outperforming the Atari Human Benchmark Author: Adria Badia et al. Publish Year: 2020 Review Date: Nov 2021 Summary of paper This needs to be only 1-3 sentences, but it demonstrates that you understand the paper and, moreover, can summarize it more concisely than the author in his abstract. Agent57 was the SOTA Atari RL agent as of 2020 and can play difficult Atari games such as “Montezuma’s Revenge”, “Pitfall”, “Solaris” and “Skiing”....

November 18, 2021 · 5 min · Sukai Huang

Stefan O’Toole Width Based Lookaheads With Learnt Base Policies and Heuristics Over the Atari 2600 Benchmark 2021 Paper Review

Title: Width-based Lookaheads with Learnt Base Policies and Heuristics Over the Atari-2600 Benchmark Author: Stefan O’Toole et al. Publish Year: 2021 Review Date: Tue 16 Nov 2021 Summary of paper This needs to be only 1-3 sentences, but it demonstrates that you understand the paper and, moreover, can summarize it more concisely than the author in his abstract. This paper proposes a new width-based planning and learning agent that can play Atari-2600 games (though not Montezuma’s Revenge)....

November 16, 2021 · 4 min · Sukai Huang

Cristian-Paul Bara Mindcraft Theory of Mind Modelling 2021 Paper Review

Title: MINDCRAFT: Theory of Mind Modeling for Situated Dialogue in Collaborative Tasks Author: Cristian-Paul Bara et al. Publish Year: 2021 (EMNLP) Review Date: 12 Nov 2021 Summary of paper This needs to be only 1-3 sentences, but it demonstrates that you understand the paper and, moreover, can summarize it more concisely than the author in his abstract. The contribution of this paper is a theory-of-mind modelling dataset collected in the Minecraft environment....

November 12, 2021 · 3 min · Sukai Huang