Angela_fan Augmenting Transformer With Knn Composite Memory for Dialog 2021

[TOC] Title: Augmenting Transformers with KNN-based composite memory for dialog Author: Angela Fan et al. Publish Year: 2021 Review Date: Apr 2022 Summary of paper Motivation The authors proposed augmenting generative Transformer neural networks with a KNN-based Information Fetching (KIF) module. Each KIF module learns a read operation to access fixed external knowledge (e.g., Wikipedia). The authors demonstrated the effectiveness of this approach by identifying the relevant knowledge required for knowledgeable yet engaging dialog from Wikipedia, images and human-written dialog utterances. ...
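
A minimal sketch of the KNN read idea (not the paper's exact KIF module, which learns the query mapping end-to-end and feeds the retrieved items back into the Transformer); the embeddings and texts below are toy stand-ins:

```python
import numpy as np

def kif_read(query_vec, knowledge_embs, knowledge_texts, k=5):
    """Return the k knowledge entries whose embeddings are closest to the query."""
    # The knowledge store is fixed: its embeddings are pre-computed once.
    q = query_vec / np.linalg.norm(query_vec)
    kb = knowledge_embs / np.linalg.norm(knowledge_embs, axis=1, keepdims=True)
    sims = kb @ q                              # cosine similarity to every entry
    top = np.argsort(-sims)[:k]
    return [(knowledge_texts[i], float(sims[i])) for i in top]

# toy usage: four "Wikipedia sentences" embedded in an 8-d space
rng = np.random.default_rng(0)
embs = rng.normal(size=(4, 8))
texts = ["fact A", "fact B", "fact C", "fact D"]
print(kif_read(rng.normal(size=8), embs, texts, k=2))
```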

April 21, 2022 · 3 min · Sukai Huang

Hao_hu Generalisable Episodic Memory for Drl 2021

[TOC] Title: Generalisable episodic memory for Deep Reinforcement Learning Author: Hao Hu et al. Publish Year: Jun 2021 Review Date: April 2022 Summary of paper Motivation The authors proposed Generalisable Episodic Memory (GEM), which effectively organises the state-action values of episodic memory in a generalisable manner and supports implicit planning on memorised trajectories. So, compared to a traditional memory table, GEM learns a virtual memory table represented by deep neural networks to aggregate similar state-action pairs that essentially have the same nature. ...
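
A minimal sketch of what such a "virtual memory table" could look like, assuming it is simply a network regressing returns for (state, action) pairs; GEM's actual update, including the implicit planning over memorised trajectories, is more involved:

```python
import torch
import torch.nn as nn

class VirtualMemory(nn.Module):
    """Parametric stand-in for an episodic memory table: similar (state, action)
    pairs share value estimates through the network's generalisation."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

# regress the memory toward returns observed along memorised trajectories
memory = VirtualMemory(state_dim=4, action_dim=2)
opt = torch.optim.Adam(memory.parameters(), lr=1e-3)
states, actions = torch.randn(32, 4), torch.randn(32, 2)
episodic_returns = torch.randn(32)
loss = ((memory(states, actions) - episodic_returns) ** 2).mean()
opt.zero_grad(); loss.backward(); opt.step()
```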

April 7, 2022 · 2 min · Sukai Huang

Ilya_kostrikov Offline Rl With Implicit Q Learning 2021

[TOC] Title: Offline Reinforcement Learning with Implicit Q-learning Author: Ilya Kostrikov et al. Publish Year: 2021 Review Date: Mar 2022 Summary of paper Motivation Conflict in offline reinforcement learning: offline reinforcement learning requires reconciling two conflicting aims: learning a policy that improves over the behaviour policy (the old policy that collected the dataset) while at the same time minimizing the deviation from the behaviour policy so as to avoid errors due to distributional shift (e.g., obtaining out-of-distribution actions) -> the challenge is how to constrain those unseen actions to be in-distribution (meaning the Q-function is never evaluated on unseen actions, and thus the unseen-action issue is gone). Previous solutions, such as 1. limiting how far the new policy deviates from the behaviour policy and 2. assigning low values to out-of-distribution actions, impose a trade-off between how much the policy improves and how vulnerable it is to misestimation due to distributional shift. ...
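
The way IQL sidesteps this trade-off, as I understand it, is to fit a state-value function with expectile regression so that only actions present in the dataset are ever evaluated; a minimal sketch of that loss (tensor values are toy):

```python
import torch

def expectile_loss(q_values, v_values, tau=0.7):
    """Asymmetric (expectile) regression: with tau > 0.5, V(s) is pushed toward
    the upper end of Q(s, a) over actions that appear in the dataset, so no
    out-of-distribution action ever needs to be queried."""
    diff = q_values - v_values
    weight = torch.abs(tau - (diff < 0).float())   # tau if diff > 0, else 1 - tau
    return (weight * diff ** 2).mean()

q = torch.tensor([1.0, 2.0, 0.5])   # Q(s, a) for dataset actions
v = torch.tensor([0.8, 0.8, 0.8])   # V(s) predictions
print(expectile_loss(q, v))
```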

March 22, 2022 · 4 min · Sukai Huang

Qinqing_zheng Online Decision Transformer 2022

[TOC] Title: Online Decision Transformer Author: Qinqing Zheng Publish Year: Feb 2022 Review Date: Mar 2022 Summary of paper Motivation The authors proposed the Online Decision Transformer (ODT), an RL algorithm based on sequence modelling that blends offline pretraining with online fine-tuning in a unified framework. ODT builds on the Decision Transformer architecture previously introduced for offline RL. Quantify exploration Compared to DT, they shifted from deterministic to stochastic policies for defining exploration objectives during the online phase. They quantify exploration via the entropy of the policy, similar to max-ent RL frameworks. ...
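
A minimal sketch of the deterministic-to-stochastic shift, assuming a Gaussian action head whose entropy serves as the exploration quantity (ODT additionally constrains this entropy during online fine-tuning, which is not shown here):

```python
import torch
import torch.nn as nn

class StochasticActionHead(nn.Module):
    """Gaussian action head: its entropy is the quantity used to measure exploration."""
    def __init__(self, hidden_dim, action_dim):
        super().__init__()
        self.mean = nn.Linear(hidden_dim, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, h):
        dist = torch.distributions.Normal(self.mean(h), self.log_std.exp())
        action = dist.rsample()                 # sampled, not deterministic
        entropy = dist.entropy().sum(-1)        # per-step policy entropy
        return action, entropy

head = StochasticActionHead(hidden_dim=64, action_dim=3)
action, entropy = head(torch.randn(1, 64))
print(action.shape, entropy.item())
```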

March 21, 2022 · 4 min · Sukai Huang

Sebastian_borgeaud Improving Language Models by Retrieving From Trillions of Tokens 2022

[TOC] Title: Improving language models by retrieving from trillions of tokens Author: Sebastian Borgeaud et al. Publish Year: Feb 2022 Review Date: Mar 2022 Summary of paper Motivation In order to decrease the size of language models, this work proposed retrieval from a large text database as a complementary path to scaling language models. They equip models with the ability to directly access a large dataset to perform prediction – a semi-parametric approach. ...
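
A toy, brute-force sketch of the non-parametric retrieval step (the paper itself uses frozen BERT embeddings and an approximate nearest-neighbour index over trillions of tokens; the names below are mine):

```python
import numpy as np

def retrieve_neighbours(chunk_emb, db_embs, db_chunks, k=2):
    """Return the k database chunks nearest to the input chunk embedding.
    The database is frozen text, so its size adds no trainable parameters."""
    dists = np.linalg.norm(db_embs - chunk_emb, axis=1)
    idx = np.argsort(dists)[:k]
    return [db_chunks[i] for i in idx]

rng = np.random.default_rng(1)
db_embs = rng.normal(size=(1000, 16))           # stand-in for a huge chunk index
db_chunks = [f"chunk_{i}" for i in range(1000)]
print(retrieve_neighbours(rng.normal(size=16), db_embs, db_chunks))
```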

March 21, 2022 · 2 min · Sukai Huang

Machel_reid Can Wikipedia Help Offline Rl 2022

[TOC] Title: Can Wikipedia Help Offline Reinforcement Learning Author: Machel Reid et al. Publish Year: Mar 2022 Review Date: Mar 2022 Summary of paper Motivation Fine-tuning reinforcement learning (RL) models has been challenging because of a lack of large-scale off-the-shelf datasets as well as high variance in transferability among different environments. Moreover, when the model is trained from scratch, it suffers from slow convergence. In this paper, they take advantage of the formulation of reinforcement learning as sequence modelling and investigate the transferability of sequence models pre-trained on other domains (vision, language) when fine-tuned on offline RL tasks (control, games). ...
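
A sketch of the transfer setup as I read it: take a language-pretrained Transformer and fine-tune it on embedded trajectory tokens, Decision-Transformer style. The projection dimensions are toy assumptions and the actual paper's wiring differs:

```python
import torch
from transformers import GPT2Model

# Language-pretrained backbone reused for offline RL sequence modelling.
backbone = GPT2Model.from_pretrained("gpt2")
embed_traj = torch.nn.Linear(7, backbone.config.n_embd)      # toy trajectory-token dim
predict_action = torch.nn.Linear(backbone.config.n_embd, 3)  # toy action dim

traj_tokens = torch.randn(1, 20, 7)   # e.g. (return-to-go, state, action) features
hidden = backbone(inputs_embeds=embed_traj(traj_tokens)).last_hidden_state
actions = predict_action(hidden)
print(actions.shape)                  # torch.Size([1, 20, 3])
```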

March 16, 2022 · 2 min · Sukai Huang

Stephen_cresswell Generalised Domain Model Acquisition From Action Traces 2013

[TOC] Title: Generalised Domain Model Acquisition from Action Traces (LOCM2) Author: Stephen Cresswell et al. Publish Year: 2013 Review Date: Mar 2022 Summary of paper Motivation One approach to the problem of formulating domain models for planning is to learn the models from example action sequences. This work extended LOCM by allowing multiple parameterised state machines to represent a single object. In other words, it is possible to automatically infer the underlying transition system from sample action sequences of the domain. Using such an approach removes the necessity for the domain expert to also be an expert at modelling transition systems. ...
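
A toy sketch of the first step of this style of learner, assuming traces are lists of (action, argument objects): consecutive actions that share an object become transitions of that object's state machine (LOCM2 goes further and splits these into multiple parameterised machines per object sort):

```python
from collections import defaultdict

def induce_transitions(traces):
    """Collect, per object, pairs of consecutive actions the object takes part in;
    each pair is an edge of that object's (implicit) transition system."""
    transitions = defaultdict(set)
    for trace in traces:
        last_action = {}
        for action, objects in trace:
            for obj in objects:
                if obj in last_action:
                    transitions[obj].add((last_action[obj], action))
                last_action[obj] = action
    return dict(transitions)

traces = [[("pickup", ["block_a"]),
           ("stack", ["block_a", "block_b"]),
           ("unstack", ["block_a", "block_b"])]]
print(induce_transitions(traces))
```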

March 15, 2022 · 2 min · Sukai Huang

Wenfeng_feng Extracting Action Sequences From Texts by Rl

[TOC] Title: Extracting Action Sequences from Texts Based on Deep Reinforcement Learning Author: Wenfeng Feng et al. Publish Year: Mar 2018 Review Date: Mar 2022 Summary of paper Motivation The authors want to build a model that learns to directly extract action sequences without external tools like POS tagging and dependency parsing results… Annotation dataset structure example Model They exploit the framework to learn two models that predict action names and arguments respectively. ...

March 15, 2022 · 1 min · Sukai Huang

Shivam_miglani Nltopddl Learning From Nlp Manuals 2020

[TOC] Title: NLtoPDDL: One-Shot Learning of PDDL Models from Natural Language Process Manuals Author: Shivam Miglani et al. Publish Year: 2020 Review Date: Mar 2022 Summary of paper Motivation Pipeline Pipeline architecture Phase 1 We have a DQN that learns to extract words that represent the action name, action arguments, and the sequence of actions present in annotated NL process manuals. (Why only the action name? Do we need to extract other information?) Also, why is this called DQN RL? Is it just normal supervised learning… (Check the EASDRL paper to understand Phase 1) ...

March 14, 2022 · 2 min · Sukai Huang

Giuseppe_de_giacomo Foundations for Restraining Bolts Rl With Ltl 2019

[TOC] Title: Foundations for Restraining Bolts: Reinforcement Learning with LTLf/LDLf Restraining Specifications Author: Giuseppe De Giacomo et al. Publish Year: 2019 Review Date: Mar 2022 Summary of paper The authors investigated the concept of a “restraining bolt” that can control the behaviour of learning agents. Essentially, the way to control an RL agent is that the bolt provides additional rewards to the agent. Although this method is essentially the same as reward shaping (providing additional rewards to the agent), the contribution of this paper is ...
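
A minimal sketch of the "additional reward" mechanism, assuming the LTLf/LDLf specification has already been compiled into a small automaton (here a hand-written dict rather than an automatically compiled DFA):

```python
class RestrainingBolt:
    """The bolt tracks its own automaton state over high-level observation labels
    and hands the learning agent an extra reward on top of the environment reward."""
    def __init__(self, transitions, accepting, bonus=1.0):
        self.transitions, self.accepting, self.bonus = transitions, accepting, bonus
        self.state = 0

    def step(self, label):
        self.state = self.transitions.get((self.state, label), self.state)
        return self.bonus if self.state in self.accepting else 0.0

# toy specification: "visit A and then B"
bolt = RestrainingBolt(transitions={(0, "A"): 1, (1, "B"): 2}, accepting={2})
for label in ["C", "A", "B"]:
    print(label, bolt.step(label))   # the extra reward arrives once the spec is satisfied
```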

March 4, 2022 · 2 min · Sukai Huang

Joseph_kim Collaborative Planning With Encoding of High Level Strategies 2017

[TOC] Title: Collaborative Planning with Encoding of Users’ High-level Strategies Author: Joseph Kim et al. Publish Year: 2017 Review Date: Mar 2022 Summary of paper Motivation Automated planning is computationally expensive. Greedy search heuristics often yield low-quality plans that can result in wasted resources; also, even when an adequate plan is generated, users may have difficulty interpreting why the plan performs well and trusting it. ...

March 4, 2022 · 2 min · Sukai Huang

Mikayel_samvelyan Minihack the Planet a Sandbox for Open Ended Rl Research 2021

[TOC] Title: MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research Author: Mikayel Samvelyan et al. Publish Year: Nov 2021 Review Date: Mar 2022 Summary of paper They presented MiniHack, an easy-to-use framework for creating rich and varied RL environments, as well as a suite of tasks developed using this framework. Built upon NLE and the des-file format, MiniHack enables the use of rich entities and dynamics from the game of NetHack to create a large variety of RL environments for targeted experimentation, while also allowing painless scaling-up of the difficulty of existing environments. MiniHack’s environments are procedurally generated by default, which supports evaluating the systematic generalisation of RL agents. ...
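
A usage sketch under the assumption that MiniHack registers Gym environments under ids like the one below (check the MiniHack documentation for the exact names and the des-file API):

```python
import gym
import minihack  # noqa: F401  -- importing registers the MiniHack-* environments

env = gym.make("MiniHack-Room-5x5-v0")   # environment id is an assumption
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
print(reward, done)
```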

March 4, 2022 · 3 min · Sukai Huang

Richard_shin Constrained Language Models Yield Few Shot Semantic Parsers 2021

[TOC] Title: Constrained Language models yield few-shot semantic parsers Author: Richard Shin et al. Publish Year: Nov 2021 Review Date: Mar 2022 Summary of paper Motivation The authors wanted to explore the use of large pretrained language models as few-shot semantic parsers. However, language models are trained to generate natural language. To bridge the gap, they used language models to paraphrase inputs into a controlled sublanguage resembling English that can be automatically mapped to a target meaning representation (using a synchronous context-free grammar, SCFG). ...
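
A toy sketch of the "constrained" part: at each decoding step only tokens the controlled sublanguage allows after the current prefix may be chosen (the paper's actual setup uses GPT-3 and an SCFG over canonical utterances; everything below is made up for illustration):

```python
def constrained_decode(next_token_scores, allowed_tokens, prefix, max_len=10):
    """Greedy decoding restricted, at every step, to grammar-allowed tokens."""
    for _ in range(max_len):
        candidates = allowed_tokens(prefix)
        if not candidates:
            break
        scores = next_token_scores(prefix)
        prefix.append(max(candidates, key=lambda tok: scores.get(tok, float("-inf"))))
    return prefix

# toy sublanguage: "create event" or "create reminder"
def allowed_tokens(prefix):
    return {0: ["create"], 1: ["event", "reminder"]}.get(len(prefix), [])

lm_scores = lambda prefix: {"create": 1.0, "event": 0.9, "reminder": 0.4}
print(constrained_decode(lm_scores, allowed_tokens, []))   # ['create', 'event']
```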

March 2, 2022 · 1 min · Sukai Huang

Heinrich_kuttler The Nethack Learning Environment 2020

[TOC] Title: The NetHack Learning Environment Author: Heinrich Kuttler et al. Publish Year: Dec 2020 Review Date: Mar 2022 Summary of paper They present the NetHack Learning Environment (NLE), a scalable, procedurally generated, stochastic, rich, and challenging environment for RL research based on the popular single-player, terminal-based roguelike game NetHack. NetHack is sufficiently complex to drive long-term research on problems such as exploration, planning, skill acquisition, and language-conditioned RL, while dramatically reducing the computational resources required to gather a large amount of experience. ...

March 2, 2022 · 3 min · Sukai Huang

Pashootan_vaezipoor Ltl2action Generalising Ltl Instructions for Multi Task Rl 2021

[TOC] Title: LTL2Action: Generalizing LTL Instructions for Multi-Task RL Author: Pashootan Vaezipoor et. al. Publish Year: 2021 Review Date: March 2022 Summary of paper Motivation They addressed the problem of teaching a deep reinforcement learning agent to follow instructions in multi-task environments. Instructions are expressed in a well-known formal language – linear temporal logic (LTL). Limitation of the vanilla MDP Temporal constraints cannot be expressed as rewards in the MDP setting, and thus modular policies and other approaches are not able to obtain maximum rewards. ...
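
A tiny sketch of LTL progression, the operation that rewrites the instruction after each step so the agent always sees what is still left to achieve (LTL2Action encodes and progresses full LTL formulas with a graph neural network; this fragment only handles conjunction and "eventually" over atomic propositions):

```python
def progress(formula, label):
    """One step of LTL progression over a set of true propositions `label`.
    Formulas are tiny nested tuples, e.g. ("eventually", ("prop", "key"))."""
    if formula in ("True", "False"):
        return formula
    op = formula[0]
    if op == "prop":                      # atomic proposition
        return "True" if formula[1] in label else "False"
    if op == "eventually":                # F phi: satisfied now, or still pending
        inner = progress(formula[1], label)
        return "True" if inner == "True" else formula
    if op == "and":
        left, right = progress(formula[1], label), progress(formula[2], label)
        if "False" in (left, right):
            return "False"
        if left == "True":
            return right
        if right == "True":
            return left
        return ("and", left, right)
    raise ValueError(op)

# "eventually reach the key, and eventually reach the door"
task = ("and", ("eventually", ("prop", "key")), ("eventually", ("prop", "door")))
print(progress(task, {"key"}))   # only the door obligation remains
```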

March 1, 2022 · 3 min · Sukai Huang

Roma_patel Learning to Ground Language Temporal Logical Form 2019

[TOC] Title: Learning to Ground Language to Temporal Logical Form Author: Roma Patel et al. Publish Year: 2019 Review Date: Feb 2022 Summary of paper Motivation Natural language commands often exhibit sequential (temporal) constraints, e.g., “go through the kitchen and then into the living room”. But these constraints cannot be expressed in the reward of the Markov Decision Process setting (see this paper). Therefore, they proposed to ground language to Linear Temporal Logic (LTL) and then map from LTL expressions to action sequences. ...

February 28, 2022 · 2 min · Sukai Huang

Thang_m_pham Out of Order How Important Is the Sequential Order of Words in a Sentence in Natural Language Understanding Tasks 2021

[TOC] Title: Out of Order: How Important Is The Sequential Order of Words in a Sentence in Natural Language Understanding Tasks? Author: Thang M. Pham Publish Year: Jul 2021 Review Date: Feb 2022 Summary of paper The authors found that BERT-based models trained on GLUE have low sensitivity to word order. The research questions are the following: Do BERT-based models trained on GLUE care about the order of words in a sentence? ANS: No, except for one task named CoLA, which is about detecting grammatically incorrect sentences. Surprisingly, for 5 of the 6 binary-classification tasks (i.e., all except CoLA), between 75% and 90% of the originally correct predictions remain constant after 1-grams are randomly re-ordered. Are SOTA BERT-based models using word order information when solving NLU tasks? If not, what cues do they rely on? ANS: They rely heavily on the words themselves rather than their ordering. The results showed that if the top-1 most important word as measured by LIME has a positive meaning, then there is a 100% probability that the sentence’s label is “positive”. Results ...
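
A sketch of the probing setup, assuming a hypothetical `model` classifier: shuffle the 1-grams of an input and check whether the prediction stays the same:

```python
import random

def shuffle_unigrams(sentence, seed=0):
    """Randomly re-order the words (1-grams) of a sentence."""
    words = sentence.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

original = "the movie was surprisingly good"
shuffled = shuffle_unigrams(original)
print(shuffled)
# The probe then compares model(original) with model(shuffled) on examples the
# model originally got right; the paper reports 75-90% of predictions unchanged.
```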

February 28, 2022 · 2 min · Sukai Huang

Anton_belyy Guided K Best Selection for Semantic Parsing Annotation 2021

[TOC] Title: Guided K-best Selection for Semantic Parsing Annotation Author: Anton Belyy et al. Publish Year: 2021 Review Date: Feb 2022 Summary of paper Motivation They wanted to tackle the challenge of efficient data collection (data annotation) for the conversational semantic parsing task. In the presence of little available training data, they proposed human-in-the-loop interfaces for guided K-best selection, using a prototype model trained on limited data. Result Their user studies showed that the keyword searching function, combined with a keyword suggestion method, strikes a balance between annotation accuracy and speed ...

February 23, 2022 · 3 min · Sukai Huang

S_teufel Argumentative Zoning 2000

[TOC] Title: Argumentative Zoning Author: Simone Teufel Publish Year: 2000 Review Date: Feb 2022 https://www.cl.cam.ac.uk/~sht25/az.html Summary Abstract We present a new type of analysis for scientific text which we call Argumentative Zoning. We demonstrate that this type of text analysis can be used for generating user-tailored and task-tailored summaries and for performing more informative citation analyses. We also demonstrate that our type of analysis can be applied to unrestricted text, both automatically and by humans. The corpus we use for the analysis (80 conference papers in computational linguistics) is a difficult test bed; it shows great variation with respect to subdomain, writing style, register and linguistic expression. We present reliability studies which we performed on this corpus and for which we use two unrelated trained annotators. ...

February 16, 2022 · 2 min · Sukai Huang

Jacob_andreas Compositionality as Lexical Symmetry 2022

[TOC] Title: Compositionality as Lexical Symmetry Author: Ekin Akyurek; Jacob Andreas Publish Year: Jan 2022 Review Date: Feb 2022 Summary of paper Motivation Standard deep network models lack the inductive bias needed to generalize compositionally in tasks like semantic parsing, translation, and question answering. So, a large body of work in NLP seeks to overcome this limitation with new model architectures that enforce a compositional process of sentence interpretation. Goal ...

February 8, 2022 · 2 min · Sukai Huang

Tao_lei When Attention Meets Fast Recurrence Training Language Models With Reduced Compute 2021

[TOC] Title: When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute Author: Tao Lei Publish Year: Sep 2021 Review Date: Jan 2022 Summary of paper As the author mentioned, the inspiration for SRU++ comes from two lines of research: the parallelization / speed problem of the original RNN, and leveraging recurrence in conjunction with self-attention. Structure of SRU++ New discovery: little attention is needed given recurrence. Similar to the observation of Merity (2019), they found that using a couple of attention layers is sufficient to obtain SOTA results. ...
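
A much-reduced sketch of the fast-recurrence half (the element-wise recurrent cell that SRU++ builds on); the real SRU++ additionally replaces part of the input projection with self-attention in a few layers:

```python
import torch
import torch.nn as nn

class SimpleRecurrentUnit(nn.Module):
    """Very reduced SRU-style cell: the matrix multiplications run for all time
    steps in parallel; only a cheap element-wise recurrence stays sequential."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, 3 * dim)

    def forward(self, x):                       # x: (time, batch, dim)
        u, f, r = self.proj(x).chunk(3, dim=-1)
        f, r = torch.sigmoid(f), torch.sigmoid(r)
        c = torch.zeros_like(u[0])
        outputs = []
        for t in range(x.size(0)):              # no matrix multiply inside the loop
            c = f[t] * c + (1 - f[t]) * u[t]
            outputs.append(r[t] * c + (1 - r[t]) * x[t])
        return torch.stack(outputs)

layer = SimpleRecurrentUnit(dim=16)
print(layer(torch.randn(5, 2, 16)).shape)       # torch.Size([5, 2, 16])
```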

January 14, 2022 · 1 min · Sukai Huang

Alex_nichol Glide Towards Photorealistic Image Generation and Editing With Text Guided Diffusion Models 2021

[TOC] Title: GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models Author: Alex Nichol et al. Publish Year: Dec 2021 Review Date: Jan 2022 Summary of paper In the authors’ previous work, the diffusion model can achieve photorealism in the class-conditional setting by augmenting it with classifier guidance, a technique which allows diffusion models to condition on a classifier’s labels. The classifier is first trained on noised images, and during the diffusion sampling process, gradients from the classifier are used to guide the output sample towards the label. Classifier details ...
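
A minimal sketch of the classifier-guidance update described above (a toy linear "classifier" stands in for the noised-image classifier; GLIDE itself mostly moves on to classifier-free guidance):

```python
import torch

def classifier_guided_mean(mean, variance, x_t, label, classifier, scale=1.0):
    """Shift the diffusion model's predicted mean by the classifier's gradient
    of log p(label | x_t) -- the core of classifier guidance."""
    x = x_t.detach().requires_grad_(True)
    log_probs = torch.log_softmax(classifier(x), dim=-1)
    selected = log_probs[torch.arange(x.size(0)), label].sum()
    grad = torch.autograd.grad(selected, x)[0]
    return mean + scale * variance * grad

classifier = torch.nn.Linear(4, 10)           # toy stand-in for a noised-image classifier
x_t = torch.randn(2, 4)                        # two "images" with 4 pixels each
mean, variance = torch.zeros(2, 4), torch.ones(2, 4)
guided = classifier_guided_mean(mean, variance, x_t, torch.tensor([3, 7]), classifier)
print(guided.shape)
```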

January 12, 2022 · 2 min · Sukai Huang

Junyang_lin M6 a Chinese Multimodal Pretrainer 2021

[TOC] Title: M6: A Chinese Multimodal Pretrainer Author: Junyang Lin et al. Publish Year: May 2021 Review Date: Jan 2022 Summary of paper This paper re-emphasises that large models trained on big data have extremely large capacity and can outperform the SOTA on downstream tasks, especially in the zero-shot setting. So, the authors trained a big multi-modal model. They also proposed an innovative way to tackle downstream tasks: they use masks to block cross attention between tokens so as to fit different types of downstream tasks. Key idea: mask tokens during cross attention so as to solve certain tasks Overview ...
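
A sketch of the masking idea in plain scaled dot-product attention: changing only the boolean mask changes which tokens can see which, which is how one pretrained model can be pointed at different downstream task formats (the mask below is just a causal example, not M6's actual task masks):

```python
import torch

def masked_attention(q, k, v, mask):
    """Attention where mask[i, j] = False blocks query i from attending to key j."""
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(4, 8)
causal_mask = torch.tril(torch.ones(4, 4, dtype=torch.bool))   # e.g. generation-style mask
print(masked_attention(q, k, v, causal_mask).shape)
```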

January 12, 2022 · 1 min · Sukai Huang

Tianshi_cao Babyai Plus Plus Towards Grounded Language Learning Beyond Memorization 2020

[TOC] Title: BABYAI++: Towards Grounded-Language Learning Beyond Memorization Author: Tianshi Cao et al. Publish Year: 2020 ICLR Review Date: Jan 2022 Summary of paper The paper introduced a new RL environment, BabyAI++, to investigate whether RL agents can extract knowledge from descriptive text and thereby improve generalisation performance. BabyAI++ environment example The descriptive text describes the features of the objects; notice that an object’s features can easily change as we change the descriptive text. Model ...

January 3, 2022 · 1 min · Sukai Huang

Federico_bianchi Language in a Search Box Grounding Language Learning in Real World Human Machine Interaction 2021

[TOC] Title: Language in a (Search) Box: Grounding Language Learning in Real-World Human-Machine Interaction Author: Federico Bianchi Publish Year: 2021 Review Date: Jan 2022 Summary of paper The authors investigated grounded language learning through the natural interaction between users and a shopping website’s search engine. How they do it: convert the shopping-object dataset into a Latent Grounded Domain, where related products end up closer in the embedding space; then train the mapping model (mapping from a text query to a portion of the product space) based on user click behaviour (in the training dataset, users query “Nike” and then click relevant Nike products). ...
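
A toy sketch of the mapping step under my reading: a query encoder is trained so that its output lands near the embedding of the product the user clicked (all names, shapes and the cosine loss are assumptions for illustration):

```python
import torch
import torch.nn as nn

query_encoder = nn.Linear(32, 16)            # maps a query encoding into product space
opt = torch.optim.Adam(query_encoder.parameters(), lr=1e-2)

query_vecs = torch.randn(64, 32)             # e.g. encodings of queries like "nike shoes"
clicked_product_vecs = torch.randn(64, 16)   # embeddings of the products users clicked

for _ in range(100):
    pred = query_encoder(query_vecs)
    # pull the predicted query embedding toward the clicked product's embedding
    loss = 1 - nn.functional.cosine_similarity(pred, clicked_product_vecs).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(loss.item())
```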

January 3, 2022 · 1 min · Sukai Huang