Posts

Federico_bianchi Language in a Search Box Grounding Language Learning in Real World Human Machine Interaction 2021

Title: Language in a (Search) Box: Grounding Language Learning in Real-World Human-Machine Interaction. Author: Federico Bianchi. Publish Year: 2021. Review Date: Jan 2022. Summary of paper: The author investigates grounded language learning through the natural interaction between users and a shopping website's search engine. How they do it: (1) convert the shopping object dataset into a latent grounded domain, so that related products end up closer in the embedding space; (2) train the mapping model (from a text query to a portion of the product space) on user click behaviour (in the training dataset, users who query “Nike” then click on relevant Nike products) ...

Lili_chen Decision Transformer Reinforcement Learning via Sequence Modeling 2021

Title: Decision Transformer: Reinforcement Learning via Sequence Modeling. Author: Lili Chen et al. Publish Year: Jun 2021. Review Date: Dec 2021. Summary of paper: The architecture of Decision Transformer: the inputs are reward, observation and action; the output is the action, and at training time future actions are masked out. I believe this model can generate very good long action sequences thanks to the transformer architecture. Arguably, though, this is no longer RL, because the transformer is not trained by a reward signal … ...
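To make the input layout and the masking concrete, here is a minimal sketch in plain Python. It assumes, as in the Decision Transformer paper, that the reward token is the return-to-go (the sum of rewards from the current step to the end of the episode) rather than the per-step reward. The helper names `returns_to_go`, `interleave`, and `causal_mask` are hypothetical and are not taken from the authors' code.

```python
def returns_to_go(rewards):
    """Suffix sums of the rewards: R_t = r_t + r_{t+1} + ... + r_T."""
    rtg = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return rtg


def interleave(rtg, observations, actions):
    """Build the token sequence (R_1, o_1, a_1, R_2, o_2, a_2, ...)."""
    tokens = []
    for g, o, a in zip(rtg, observations, actions):
        tokens.extend([("return", g), ("obs", o), ("action", a)])
    return tokens


def causal_mask(n_tokens):
    """mask[i][j] is True when token i may attend to token j (only j <= i).

    This is what hides future actions from the model during training.
    """
    return [[j <= i for j in range(n_tokens)] for i in range(n_tokens)]


# Toy episode with three steps (illustrative values only).
rewards = [1.0, 0.0, 2.0]
observations = ["o1", "o2", "o3"]
actions = ["a1", "a2", "a3"]

rtg = returns_to_go(rewards)
tokens = interleave(rtg, observations, actions)
mask = causal_mask(len(tokens))
```

In this sketch the action at step t is predicted from the token at the observation position of step t, which under the causal mask can only see the returns, observations and actions up to that point.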

Jiayuan_mao Grammar Based Grounded Lexicon Learning 2021

Title: Grammar-Based Grounded Lexicon Learning. Author: Jiayuan Mao. Publish Year: 2021 NeurIPS. Review Date: Dec 2021. Summary of paper: The paper extends the previous work “Neuro-Symbolic Concept Learner” by parsing the natural language questions symbolically. The core semantic parsing technique is Combinatory Categorial Grammar (CCG), with a CKY algorithm that prunes unlikely expressions. The full picture looks like this. The detailed algorithm process looks like this. How to derive concept embedding ...

Julia_kiseleva Interactive Grounded Language Understanding in a Collaborative Environment 2021

Title: Interactive Grounded Language Understanding in a Collaborative Environment. Author: Julia Kiseleva et al. Publish Year: 2021. Review Date: Dec 2021. Summary of paper: The primary goal of the competition is to approach the problem of how to build interactive agents that learn to solve a task while provided with grounded natural language instructions in a collaborative environment. They split the problem into the following concrete research questions, which correspond to separate tasks that can be used to study each component individually before joining all of them into one system ...

Dominik_drexler Expressing and Exploiting the Common Subgoal Structure of Classical Planning Domains Using Sketches 2021

Title: Expressing and Exploiting the Common Subgoal Structure of Classical Planning Domains Using Sketches. Author: Dominik Drexler et al. Publish Year: 2021. Review Date: Dec 2021. Summary of paper: Algorithms like SIW often fail when the goal is not easily serialisable or when some of the subproblems have a high width. In this work, the authors address these limitations by using a simple but powerful language for expressing finer problem decompositions, called policy sketches. ...

Yiding_jiang Language as Abstraction for Hierarchical Deep Reinforcement Learning

Title: Language as an Abstraction for Hierarchical Deep Reinforcement Learning. Author: Yiding Jiang et al. Publish Year: 2019 NeurIPS. Review Date: Dec 2021. Summary of paper: Solving complex, temporally-extended tasks is a long-standing problem in RL. Acquiring effective yet general abstractions for hierarchical RL is remarkably challenging. Therefore, they propose to use language as the abstraction, as it provides unique compositional structure, enabling fast learning and combinatorial generalisation ...

Hengyuan_hu Hierarchical Decision Making by Generating and Following Natural Language Instructions 2019

Title: Hierarchical Decision Making by Generating and Following Natural Language Instructions. Author: Hengyuan Hu et al. (FAIR). Publish Year: 2019. Review Date: Dec 2021. Summary of paper: One-line summary: they build an Architect-Builder model that clones human behaviour to play an RTS game. Their task environment is very similar to the IGLU competition setting, but their model is too task-specific. The authors mention some properties of natural language instructions ...

David_ding Attention Over Learned Object Embeddings Enables Complex Visual Reasoning 2021

Title: Attention Over Learned Object Embeddings Enables Complex Visual Reasoning. Author: David Ding et al. Publish Year: 2021 NeurIPS. Review Date: Dec 2021. Background info for this paper: The paper proposes an all-in-one transformer model that answers CLEVRER counterfactual questions with higher accuracy (75.6% vs 46.5%) and less training data (-40%). They believe that their model relies on three key aspects: self-attention, soft-discretization, and self-supervised learning ...

Jacob_andreas Modular Multitask Reinforcement Learning With Policy Sketches 2017

Title: Modular Multitask Reinforcement Learning with Policy Sketches. Author: Jacob Andreas et al. Publish Year: 2017. Review Date: Dec 2021. Background info for this paper: The paper describes a framework inspired by the options MDP, in which a reinforcement learning task is handled by several sub-MDP modules (which is why they call it modular RL). They consider a multitask RL problem in a shared environment (see the figure below). The IGLU Minecraft challenge and Angry Birds also belong to this category. ...

David_abel on the Expressivity of Markov Reward 2021

Title: On the Expressivity of Markov Reward. Author: David Abel et al. Publish Year: NeurIPS 2021. Review Date: 6 Dec 2021. Summary of paper: The authors found that in the Markov Decision Process setting (i.e., the reward does not depend on the history of the trajectory), some tasks cannot be realised perfectly by reward functions, i.e., ...

Rishabh_agarwal Deep Reinforcement Learning at the Edge of the Stats Precipice 2021

Title: Deep Reinforcement Learning at the Edge of the Statistical Precipice. Author: Rishabh Agarwal et al. Publish Year: NeurIPS 2021. Review Date: 3 Dec 2021. Summary of paper: Most currently published results on deep RL benchmarks use point estimates of aggregate performance, such as the mean and median score across tasks. ...

Borja_ibarz Reward Learning From Human Preferences and Demonstrations in Atari 2018

Title: Reward learning from human preferences and demonstrations in Atari. Author: Borja Ibarz et al. Publish Year: 2018. Review Date: Nov 2021. Summary of paper: The authors proposed a method that uses human expert annotations, rather than the extrinsic reward from the environment, to guide reinforcement learning. ...

Adrien_ecoffet Go Explore a New Approach for Hard Exploration Problems 2021 Paper Review

Title: Go-Explore: a New Approach for Hard-Exploration Problems. Author: Adrien Ecoffet et al. Publish Year: 2021. Review Date: Nov 2021. Summary of paper: The authors hypothesised that two main issues prevent DRL agents from achieving high scores in hard-exploration games (e.g., Montezuma’s Revenge) ...

Tuomas_haarnoja Soft Actor Critic Off Policy Maximum Entropy Deep Reinforcement Learning With a Stochastic Actor 2018 Paper Review

[Brief Paper Analysis] SAC: Soft Actor-Critic Part 1 [1801.01290]. A hat denotes an estimate.

Adria Badia Agent57 Outperforming the Atari Human Benchmark 2020 Paper Review

Title: Agent57: Outperforming the Atari Human Benchmark 2020. Author: Adria Badia et al. Publish Year: 2020. Review Date: Nov 2021. Summary of paper: Agent57 is the SOTA Atari RL agent of 2020; it can play difficult Atari games such as “Montezuma’s Revenge”, “Pitfall”, “Solaris” and “Skiing”. ...

Stefan O Toole Width Based Lookaheads With Learnt Base Policies and Heuristics Over the Atari 2600 Benchmark 2021 Paper Review

Title: Width-based Lookaheads with Learnt Base Policies and Heuristics Over the Atari-2600 Benchmark. Author: Stefan O’Toole et al. Publish Year: 2021. Review Date: Tue 16 Nov 2021. Summary of paper: This paper proposed a new width-based planning and learning agent that can play Atari-2600 games (though it cannot play Montezuma’s Revenge). The authors claimed that width-based planning for exploration, combined with (greedy) optimal MDP policy exploitation, achieves better performance than Monte-Carlo Tree Search. ...

Cristian Paul Bara Mindcraft Theory of Mind Modelling 2021 Paper Review

Title: MINDCRAFT: Theory of Mind Modeling for Situated Dialogue in Collaborative Tasks. Author: Cristian-Paul Bara et al. Publish Year: 2021 EMNLP. Review Date: 12 Nov 2021. Summary of paper: The contribution of this paper is a theory-of-mind modelling dataset (collected in a Minecraft environment). ...