Joseph_kim Collaborative Planning With Encoding of High Level Strategies 2017

Title: Collaborative Planning with Encoding of Users’ High-level Strategies Author: Joseph Kim et al. Publish Year: 2017 Review Date: Mar 2022 Summary of paper Motivation Automatic planning is computationally expensive. Greedy search heuristics often yield low-quality plans that waste resources. Even when an adequate plan is generated, users may struggle to understand why it performs well and therefore to trust it. ...

March 4, 2022 · 2 min · Sukai Huang

Richard_shin Constrained Language Models Yield Few Shot Semantic Parsers 2021

Title: Constrained Language Models Yield Few-Shot Semantic Parsers Author: Richard Shin et al. Publish Year: Nov 2021 Review Date: Mar 2022 Summary of paper Motivation The authors wanted to explore the use of large pretrained language models as few-shot semantic parsers. However, language models are trained to generate natural language. To bridge the gap, they used language models to paraphrase inputs into a controlled sublanguage resembling English that can be automatically mapped to a target meaning representation (using a synchronous context-free grammar, SCFG). ...

March 2, 2022 · 1 min · Sukai Huang

Borja_ibarz Reward Learning From Human Preferences and Demonstrations in Atari 2018

Title: Reward Learning from Human Preferences and Demonstrations in Atari Author: Borja Ibarz et al. Publish Year: 2018 Review Date: Nov 2021 Summary of paper The authors proposed a method that uses human experts’ annotations, rather than the extrinsic reward from the environment, to guide reinforcement learning. ...

November 27, 2021 · 2 min · Sukai Huang

Adrien_ecoffet Go Explore a New Approach for Hard Exploration Problems 2021 Paper Review

Title: Go-Explore: a New Approach for Hard-Exploration Problems Author: Adrien Ecoffet et al. Publish Year: 2021 Review Date: Nov 2021 Summary of paper The authors hypothesised that two main issues prevent DRL agents from achieving high scores in hard-exploration games (e.g., Montezuma’s Revenge) ...

November 27, 2021 · 4 min · Sukai Huang

Adria Badia Agent57 Outperforming the Atari Human Benchmark 2020 Paper Review

Title: Agent57: Outperforming the Atari Human Benchmark Author: Adria Badia et al. Publish Year: 2020 Review Date: Nov 2021 Summary of paper Agent57 was the SOTA Atari RL agent in 2020; it can play difficult Atari games such as “Montezuma’s Revenge”, “Pitfall”, “Solaris” and “Skiing”. ...

November 18, 2021 · 5 min · Sukai Huang

Stefan O Toole Width Based Lookaheads With Learnt Base Policies and Heuristics Over the Atari 2600 Benchmark 2021 Paper Review

Title: Width-based Lookaheads with Learnt Base Policies and Heuristics Over the Atari-2600 Benchmark Author: Stefan O’Toole et al. Publish Year: 2021 Review Date: Tue 16 Nov 2021 Summary of paper This paper proposed a new width-based planning and learning agent that can play Atari-2600 games (though it cannot play Montezuma’s Revenge). The authors claimed that combining width-based planning for exploration with (greedy) optimal MDP policy exploitation achieves better performance than Monte-Carlo Tree Search. ...

November 16, 2021 · 4 min · Sukai Huang