Joseph_kim Collaborative Planning With Encoding of High Level Strategies 2017

Title: Collaborative Planning with Encoding of Users’ High-level Strategies Author: Joseph Kim et al. Publish Year: 2017 Review Date: Mar 2022 Summary of paper Motivation Automatic planning is computationally expensive. Greedy search heuristics often yield low-quality plans that can result in wasted resources; also, even in the event that an adequate plan is generated, users may have difficulty interpreting the reason why the plan performs well and trusting it....

<span title='2022-03-04 12:12:27 +1100 AEDT'>March 4, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;Sukai Huang

Mikayel_samvelyan Minihack the Planet a Sandbox for Open Ended Rl Research 2021

Title: MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research Author: Mikayel Samvelyan et al. Publish Year: Nov 2021 Review Date: Mar 2022 Summary of paper They presented MiniHack, an easy-to-use framework for creating rich and varied RL environments, as well as a suite of tasks developed using this framework. Built upon NLE and the des-file format, MiniHack enables the use of rich entities and dynamics from the game of NetHack to create a large variety of RL environments for targeted experimentation, while also allowing painless scaling-up of the difficulty of existing environments....

<span title='2022-03-04 12:11:55 +1100 AEDT'>March 4, 2022</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;Sukai Huang

Richard_shin Constrained Language Models Yield Few Shot Semantic Parsers 2021

Title: Constrained Language Models Yield Few-Shot Semantic Parsers Author: Richard Shin et al. Publish Year: Nov 2021 Review Date: Mar 2022 Summary of paper Motivation The authors wanted to explore the use of large pretrained language models as few-shot semantic parsers. However, language models are trained to generate natural language. To bridge the gap, they used language models to paraphrase inputs into a controlled sublanguage resembling English that can be automatically mapped to a target meaning representation....

<span title='2022-03-02 00:19:18 +1100 AEDT'>March 2, 2022</span>&nbsp;·&nbsp;1 min&nbsp;·&nbsp;Sukai Huang

Heinrich_kuttler the Nethack Learning Environment 2020

Title: The NetHack Learning Environment Author: Heinrich Kuttler et al. Publish Year: Dec 2020 Review Date: Mar 2022 Summary of paper The NetHack Learning Environment (NLE) is a scalable, procedurally generated, stochastic, rich, and challenging environment for RL research based on the popular single-player terminal-based roguelike game, NetHack. NetHack is sufficiently complex to drive long-term research on problems such as exploration, planning, skill acquisition, and language-conditioned RL, while dramatically reducing the computational resources required to gather a large amount of experience....

<span title='2022-03-02 00:18:35 +1100 AEDT'>March 2, 2022</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;Sukai Huang

Pashootan_vaezipoor Ltl2action Generalising Ltl Instructions for Multi Task Rl 2021

Title: LTL2Action: Generalizing LTL Instructions for Multi-Task RL Author: Pashootan Vaezipoor et al. Publish Year: 2021 Review Date: March 2022 Summary of paper Motivation They addressed the problem of teaching a deep reinforcement learning agent to follow instructions in multi-task environments. Instructions are expressed in a well-known formal language, linear temporal logic (LTL). Limitation of the vanilla MDP: temporal constraints cannot be expressed as rewards in the MDP setting, and thus modular policies and similar approaches are not able to obtain maximum rewards....

<span title='2022-03-01 20:53:10 +1100 AEDT'>March 1, 2022</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;Sukai Huang

Roma_patel Learning to Ground Language Temporal Logical Form 2019

Title: Learning to Ground Language to Temporal Logical Form Author: Roma Patel et al. Publish Year: 2019 Review Date: Feb 2022 Summary of paper Motivation Natural language commands often exhibit sequential (temporal) constraints, e.g., “go through the kitchen and then into the living room”. But these constraints cannot be expressed in the reward of the Markov Decision Process setting (see this paper). Therefore, they proposed to ground language to Linear Temporal Logic (LTL) and then map from LTL expressions to action sequences....

<span title='2022-02-28 21:40:53 +1100 AEDT'>February 28, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;Sukai Huang

Thang_m_pham Out of Order How Important Is the Sequential Order of Words in a Sentence in Natural Language Understanding Tasks 2021

Title: Out of Order: How Important Is The Sequential Order of Words in a Sentence in Natural Language Understanding Tasks? Author: Thang M. Pham Publish Year: Jul 2021 Review Date: Feb 2022 Summary of paper The author found that BERT-based models trained on GLUE have low sensitivity to word order. The research questions are the following: Do BERT-based models trained on GLUE care about the order of words in a sentence?...

<span title='2022-02-28 18:58:52 +1100 AEDT'>February 28, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;Sukai Huang

Anton_belyy Guided K Best Selection for Semantic Parsing Annotation 2021

Title: Guided K-best Selection for Semantic Parsing Annotation Author: Anton Belyy et al. Publish Year: 2021 Review Date: Feb 2022 Summary of paper Motivation They wanted to tackle the challenge of efficient data collection (data annotation) for the conversational semantic parsing task. In the presence of little available training data, they proposed human-in-the-loop interfaces for guided K-best selection, using a prototype model trained on limited data. Result Their user studies showed that the keyword searching function combined with a keyword suggestion method strikes a balance between annotation accuracy and speed...

<span title='2022-02-23 19:42:39 +1100 AEDT'>February 23, 2022</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;Sukai Huang

S_teufel Argumentative Zoning 2000

Title: Argumentative Zoning Author: Simone Teufel Publish Year: 2000 Review Date: Feb 2022 https://www.cl.cam.ac.uk/~sht25/az.html Summary Abstract We present a new type of analysis for scientific text which we call Argumentative Zoning. We demonstrate that this type of text analysis can be used for generating user-tailored and task-tailored summaries and for performing more informative citation analyses. We also demonstrate that our type of analysis can be applied to unrestricted text, both automatically and by humans....

<span title='2022-02-16 14:40:57 +1100 AEDT'>February 16, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;Sukai Huang

Jacob_andreas Compositionality as Lexical Symmetry 2022

Title: Compositionality as Lexical Symmetry Author: Ekin Akyurek; Jacob Andreas Publish Year: Jan 2022 Review Date: Feb 2022 Summary of paper Motivation Standard deep network models lack the inductive bias needed to generalize compositionally in tasks like semantic parsing, translation, and question answering. So, a large body of work in NLP seeks to overcome this limitation with new model architectures that enforce a compositional process of sentence interpretation. Goal...

<span title='2022-02-08 14:20:19 +1100 AEDT'>February 8, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;Sukai Huang

Tao_lei When Attention Meets Fast Recurrence Training Language Models With Reduced Compute 2021

Title: When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute Author: Tao Lei Publish Year: Sep 2021 Review Date: Jan 2022 Summary of paper As the author mentioned, the inspiration for SRU++ comes from two lines of research: the parallelization / speed problem of the original RNN, and leveraging recurrence in conjunction with self-attention. Structure of SRU++ New discovery: little attention is needed given recurrence. Similar to the observation of Merity (2019), they found that using a couple of attention layers is sufficient to obtain SOTA results....

<span title='2022-01-14 00:26:37 +1100 AEDT'>January 14, 2022</span>&nbsp;·&nbsp;1 min&nbsp;·&nbsp;Sukai Huang

Alex_nichol Glide Towards Photorealistic Image Generation and Editing With Text Guided Diffusion Models 2021

Title: GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models Author: Alex Nichol et al. Publish Year: Dec 2021 Review Date: Jan 2022 Summary of paper In the authors’ previous work, diffusion models achieved photorealism in the class-conditional setting when augmented with classifier guidance, a technique which allows diffusion models to condition on a classifier’s labels. The classifier is first trained on noised images, and during the diffusion sampling process, gradients from the classifier are used to guide the output sample towards the label....

<span title='2022-01-12 16:54:01 +1100 AEDT'>January 12, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;Sukai Huang

Junyang_lin M6 a Chinese Multimodal Pretrainer 2021

Title: M6: A Chinese Multimodal Pretrainer Author: Junyang Lin et al. Publish Year: May 2021 Review Date: Jan 2022 Summary of paper This paper re-emphasises that large models trained on big data have extremely large capacity and can outperform the SOTA in downstream tasks, especially in the zero-shot setting. So, the authors trained a big multi-modal model. Also, they proposed an innovative way to tackle downstream tasks: they use masks to block cross attention between tokens so as to fit different types of downstream tasks. Key idea: mask tokens during cross attention so as to solve certain tasks. Overview...

<span title='2022-01-12 13:38:14 +1100 AEDT'>January 12, 2022</span>&nbsp;·&nbsp;1 min&nbsp;·&nbsp;Sukai Huang

Tianshi_cao Babyai Plus Plus Towards Grounded Language Learning Beyond Memorization 2020

Title: BABYAI++: Towards Grounded-Language Learning Beyond Memorization Author: Tianshi Cao et al. Publish Year: 2020 ICLR Review Date: Jan 2022 Summary of paper The paper introduced a new RL environment, BabyAI++, for investigating whether RL agents can extract knowledge from descriptive text and thereby improve generalisation performance. BabyAI++ environment example: the descriptive text describes the features of the objects. Notice that the features of an object can easily change as we change the descriptive text....

<span title='2022-01-03 22:38:40 +1100 AEDT'>January 3, 2022</span>&nbsp;·&nbsp;1 min&nbsp;·&nbsp;Sukai Huang

Federico_bianchi Language in a Search Box Grounding Language Learning in Real World Human Machine Interaction 2021

Title: Language in a (Search) Box: Grounding Language Learning in Real-World Human-Machine Interaction Author: Federico Bianchi Publish Year: 2021 Review Date: Jan 2022 Summary of paper The author investigated grounded language learning through the natural interaction between users and a shopping website's search engine. How they do it: convert the shopping object dataset into a Latent Grounded Domain, in which related products end up closer in the embedding space; then train the mapping model (mapping from a text query to a portion of the product space) based on user click behaviour (in the training dataset, users query “Nike” and then click relevant Nike products)...

<span title='2022-01-03 16:51:39 +1100 AEDT'>January 3, 2022</span>&nbsp;·&nbsp;1 min&nbsp;·&nbsp;Sukai Huang

Lili_chen Decision Transformer Reinforcement Learning via Sequence Modeling 2021

Title: Decision Transformer: Reinforcement Learning via Sequence Modeling Author: Lili Chen et al. Publish Year: Jun 2021 Review Date: Dec 2021 Summary of paper The Architecture of Decision Transformer Inputs are reward, observation and action. Outputs are actions; at training time, future actions are masked out. I believe this model is able to generate a very good long sequence of actions due to the transformer architecture. But somehow this is not RL anymore, because the transformer is not trained by a reward signal …...

<span title='2021-12-24 23:29:49 +1100 AEDT'>December 24, 2021</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;Sukai Huang

Jiayuan_mao Grammar Based Grounded Lexicon Learning 2021

Title: Grammar-Based Grounded Lexicon Learning Author: Jiayuan Mao Publish Year: 2021 NeurIPS Review Date: Dec 2021 Summary of paper The paper extends the previous work “Neuro-Symbolic Concept Learner” by parsing the natural language questions in a symbolic manner. The core semantic parsing technique is Combinatory Categorial Grammar with the CKY algorithm to prune unlikely expressions. The full picture looks like this: The detailed algorithm process looks like this: How to derive concept embedding...

<span title='2021-12-22 17:22:15 +1100 AEDT'>December 22, 2021</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;Sukai Huang

Julia_kiseleva Interactive Grounded Language Understanding in a Collaborative Environment 2021

Title: Interactive Grounded Language Understanding in a Collaborative Environment Author: Julia Kiseleva et al. Publish Year: 2021 Review Date: Dec 2021 Summary of paper The primary goal of the competition is to approach the problem of how to build interactive agents that learn to solve a task while provided with grounded natural language instructions in a collaborative environment. They split the problem into the following concrete research questions, which correspond to separate tasks that can be used to study each component individually before joining all of them into one system...

<span title='2021-12-22 15:10:56 +1100 AEDT'>December 22, 2021</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;Sukai Huang

Dominik_drexler Expressing and Exploiting the Common Subgoal Structure of Classical Planning Domains Using Sketches 2021

Title: Expressing and Exploiting the Common Subgoal Structure of Classical Planning Domains Using Sketches Author: Dominik Drexler et al. Publish Year: 2021 Review Date: Dec 2021 Summary of paper Algorithms like SIW often fail when the goal is not easily serialisable or when some of the subproblems have a high width. In this work, the authors address these limitations by using a simple but powerful language for expressing finer problem decompositions called policy sketches....

<span title='2021-12-17 13:07:53 +1100 AEDT'>December 17, 2021</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;Sukai Huang

Yiding_jiang Language as Abstraction for Hierarchical Deep Reinforcement Learning

Title: Language as an Abstraction for Hierarchical Deep Reinforcement Learning Author: Yiding Jiang et al. Publish Year: 2019 NeurIPS Review Date: Dec 2021 Summary of paper Solving complex, temporally-extended tasks is a long-standing problem in RL. Acquiring effective yet general abstractions for hierarchical RL is remarkably challenging. Therefore, they propose to use language as the abstraction, as it provides unique compositional structure, enabling fast learning and combinatorial generalisation. They present their framework for training a 2-layer hierarchical policy with compositional language as the abstraction between the high-level policy and low-level policy....

<span title='2021-12-15 19:49:28 +1100 AEDT'>December 15, 2021</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;Sukai Huang

Hengyuan_hu Hierarchical Decision Making by Generating and Following Natural Language Instructions 2019

Title: Hierarchical Decision Making by Generating and Following Natural Language Instructions Author: Hengyuan Hu et al. FAIR Publish Year: 2019 Review Date: Dec 2021 Summary of paper One line summary: they build an Architect-Builder model to clone human behaviour for playing an RTS game. Their task environment is very similar to the IGLU competition setting, but their model is too task-specific. The authors mentioned some properties of natural language instructions...

<span title='2021-12-15 13:11:05 +1100 AEDT'>December 15, 2021</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;Sukai Huang

David_ding Attention Over Learned Object Embeddings Enables Complex Visual Reasoning 2021

Title: Attention Over Learned Object Embeddings Enables Complex Visual Reasoning Author: David Ding et al. Publish Year: 2021 NeurIPS Review Date: Dec 2021 Background info for this paper: Their paper proposes an all-in-one transformer model that is able to answer CLEVRER counterfactual questions with higher accuracy (75.6% vs 46.5%) and less training data (-40%). They believe that their model relies on three key aspects: self-attention, soft-discretization, and self-supervised learning. What is self-attention...

<span title='2021-12-15 12:59:07 +1100 AEDT'>December 15, 2021</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;Sukai Huang

Jacob_andreas Modular Multitask Reinforcement Learning With Policy Sketches 2017

Title: Modular Multitask Reinforcement Learning with Policy Sketches Author: Jacob Andreas et al. Publish Year: 2017 Review Date: Dec 2021 Background info for this paper: Their paper describes a framework inspired by options MDPs, in which a reinforcement learning task is handled by several sub-MDP modules (that is why they call it Modular RL). They consider a multitask RL problem in a shared environment. (See the figure below)....

<span title='2021-12-13 17:23:12 +1100 AEDT'>December 13, 2021</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;Sukai Huang

David_abel on the Expressivity of Markov Reward 2021

Title: On the Expressivity of Markov Reward Author: David Abel et al. Publish Year: NeurIPS 2021 Review Date: 6 Dec 2021 Summary of paper The authors found that in the Markov Decision Process scenario (i.e., we do not look at the history of the trajectory to provide rewards), some tasks cannot be realised perfectly by reward functions....

<span title='2021-12-05 12:02:23 +1100 AEDT'>December 5, 2021</span>&nbsp;·&nbsp;5 min&nbsp;·&nbsp;Sukai Huang

Rishabh_agarwal Deep Reinforcement Learning at the Edge of the Stats Precipice 2021

Title: Deep Reinforcement Learning at the Edge of the Statistical Precipice Author: Rishabh Agarwal et al. Publish Year: NeurIPS 2021 Review Date: 3 Dec 2021 Summary of paper Most currently published results on deep RL benchmarks use point estimates of aggregate performance, such as the mean and median score across tasks....

<span title='2021-12-03 19:50:10 +1100 AEDT'>December 3, 2021</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;Sukai Huang