Jacob_andreas Guiding Pretraining in Reinforcement Learning With Llms 2023

[TOC] Title: Guiding Pretraining in Reinforcement Learning With Large Language Models Author: Yuqing Du, Jacob Andreas et al. Publish Year: 13 Feb 2023 Review Date: Wed, Apr 5, 2023 url: https://arxiv.org/pdf/2302.06692.pdf Summary of paper Motivation intrinsically motivated exploration methods address the sparse reward problem by rewarding agents for visiting novel states or transitions. Contribution we describe a method that uses background knowledge from text corpora to shape exploration. This method, called ELLM (Exploring with LLMs), rewards an agent for achieving goals suggested by a language model prompted with a description of the agent’s current state....
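
The reward computation is simple enough to sketch. Below is a minimal, self-contained rendering of the idea (not the authors' code): an LLM proposes goals given a state description, and the intrinsic reward is the similarity between a caption of the achieved transition and the closest suggested goal. `suggest_goals` and the string-overlap similarity are stand-ins; the paper prompts an actual LLM and uses embedding cosine similarity.

```python
from difflib import SequenceMatcher

def suggest_goals(state_description: str) -> list[str]:
    # Stand-in for prompting an LLM with the agent's current state.
    return ["chop the tree", "drink water", "attack the zombie"]

def similarity(a: str, b: str) -> float:
    # The paper uses embedding cosine similarity; cheap string overlap here.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def intrinsic_reward(state_description: str, transition_caption: str,
                     threshold: float = 0.9) -> float:
    # Reward the agent when an achieved transition matches a suggested goal.
    goals = suggest_goals(state_description)
    best = max(similarity(transition_caption, g) for g in goals)
    return best if best > threshold else 0.0

print(intrinsic_reward("You see a tree.", "chop the tree"))  # 1.0
```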

April 5, 2023 · 2 min · 298 words · Sukai Huang

Luke_zettlemoyer Scaling Expert Language Models With Unsupervised Domain Discovery 2023

[TOC] Title: Scaling Expert Language Models With Unsupervised Domain Discovery Author: Luke Zettlemoyer et al. Publish Year: 24 Mar 2023 Review Date: Mon, Apr 3, 2023 url: https://arxiv.org/pdf/2303.14177.pdf Summary of paper Contribution we introduce a simple but efficient method to asynchronously train large, sparse language models on arbitrary text corpora. Our method clusters a corpus into sets of related documents, trains a separate expert language model on each cluster, and combines them in a sparse ensemble for inference....
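
The cluster-then-ensemble recipe can be sketched in a few lines. This is my paraphrase, not the released code: documents are clustered (here with tf-idf + k-means), one expert LM is trained per cluster (stubbed below), and at inference the experts' next-token distributions are mixed with weights derived from the query's distance to each cluster centroid.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["the court ruled on the appeal", "the patient received a dose",
          "the theorem follows by induction", "the jury heard testimony"]
vec = TfidfVectorizer().fit(corpus)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vec.transform(corpus))

def expert_next_token_probs(cluster_id: int, context: str) -> np.ndarray:
    # Placeholder: in the real method, one expert LM is trained per cluster.
    rng = np.random.default_rng(cluster_id)
    p = rng.random(8)
    return p / p.sum()

def ensemble_probs(context: str, temperature: float = 0.1) -> np.ndarray:
    d = km.transform(vec.transform([context]))[0]   # distance to each centroid
    w = np.exp(-d / temperature); w /= w.sum()      # sharper = sparser routing
    return sum(w[c] * expert_next_token_probs(c, context) for c in range(len(w)))

print(ensemble_probs("the judge ruled").round(3))
```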

April 3, 2023 · 1 min · 161 words · Sukai Huang

Xuanting_chen How Robust Is GPT 3.5 to Predecessors a Comprehensive Study on Language Understanding Tasks

[TOC] Title: How Robust Is GPT-3.5 to Predecessors? A Comprehensive Study on Language Understanding Tasks Author: Xuanting Chen et al. Publish Year: 2023 Review Date: Mon, Apr 3, 2023 url: https://arxiv.org/ftp/arxiv/papers/2303/2303.00293.pdf Summary of paper Motivation the robustness of GPT-3.5 and its ability to handle the various complexities of the open world have yet to be explored, which is especially crucial in assessing the stability of models and is a key aspect of trustworthy AI Contribution Our study yielded the following findings by comparing GPT-3....

April 3, 2023 · 2 min · 409 words · Sukai Huang

Anthony_liu a Picture Is Worth a Thousand Words Language Models Plan From Pixels 2023

[TOC] Title: A Picture Is Worth a Thousand Words: Language Models Plan From Pixels Author: Anthony Liu et al. Publish Year: 16 Mar 2023 Review Date: Mon, Apr 3, 2023 url: https://arxiv.org/pdf/2303.09031v1.pdf Summary of paper Motivation planning is an important capability of AI agents that perform long-horizon tasks in real-world environments. prior PLM-based approaches for planning either assume observations are available in the form of text, reason about plans from the instruction alone, or incorporate information about the visual environment in limited ways....

April 3, 2023 · 2 min · 359 words · Sukai Huang

Wenlong_huang Grounded Decoding Guiding Text Generation With Grounded Models for Robot Control 2023

[TOC] Title: Grounded Decoding: Guiding Text Generation With Grounded Models for Robot Control Author: Wenlong Huang et al. Publish Year: 1 Mar 2023 Review Date: Thu, Mar 30, 2023 url: https://arxiv.org/abs/2303.00855 Summary of paper Motivation Unfortunately, applying LLMs to settings with embodied agents, such as robots, is challenging due to their lack of experience with the physical world, inability to parse non-language observations, and ignorance of rewards or safety constraints that robots may require....
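
The decoding rule is easy to render as a sketch. In my simplified reading (with placeholder distributions), each step scores tokens by the product of the LLM's probability and a grounded model's probability, so the chosen token is both linguistically likely and feasible for the robot.

```python
import numpy as np

vocab = ["pick", "place", "fly", "cup"]

def llm_probs(prefix: str) -> np.ndarray:
    return np.array([0.4, 0.2, 0.3, 0.1])    # placeholder LM distribution

def grounded_probs(prefix: str) -> np.ndarray:
    # e.g. a robot affordance/safety model: "fly" is infeasible in this scene
    return np.array([0.9, 0.8, 0.01, 0.7])

def decode_step(prefix: str) -> str:
    joint = llm_probs(prefix) * grounded_probs(prefix)
    return vocab[int(np.argmax(joint / joint.sum()))]

print(decode_step("robot:"))  # "pick": likely under the LM and feasible
```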

March 30, 2023 · 2 min · 229 words · Sukai Huang

Mariana_vargas_vieyra Learning Generative Models With Goal Conditioned Reinforcement Learning 2023

[TOC] Title: Learning Generative Models With Goal Conditioned Reinforcement Learning Author: Mariana Vargas Vieyra et al. Publish Year: 26 Mar 2023 Review Date: Thu, Mar 30, 2023 url: https://arxiv.org/abs/2303.14811 Summary of paper Contribution we present a novel framework for learning generative models with goal-conditioned reinforcement learning. we define two agents, a goal-conditioned agent (GC-agent) and a supervised agent (S-agent). Given a user-input initial state, the GC-agent learns to reconstruct the training set....

March 30, 2023 · 2 min · 325 words · Sukai Huang

Itsugun_cho Deep Rl With Hierarchical Action Exploration for Dialogue Generation 2023

[TOC] Title: Deep RL With Hierarchical Action Exploration for Dialogue Generation Author: Itsugun Cho et al. Publish Year: 22 Mar 2023 Review Date: Thu, Mar 30, 2023 url: https://arxiv.org/pdf/2303.13465v1.pdf Summary of paper Motivation Approximate dynamic programming applied to dialogue generation involves policy improvement with action sampling. However, such a practice is inefficient for reinforcement learning because the eligible (high action value) responses are very sparse, and the greedy policy sustained by random sampling is flabby....

March 30, 2023 · 2 min · 358 words · Sukai Huang

Theodore_r_sumers How to Talk So Ai Will Learn 2022

[TOC] Title: How to talk so AI will learn: Instructions, descriptions, and autonomy Author: Theodore R. Sumers et al. Publish Year: NeurIPS 2022 Review Date: Wed, Mar 15, 2023 url: https://arxiv.org/pdf/2206.07870.pdf Summary of paper Motivation yet today, we lack computational models explaining such language use Contribution To address this challenge, we formalise learning from language in a contextual bandit setting and ask how a human might communicate preferences over behaviours (i.e., infer the intent or preference behind the presented behaviour). we show that instructions are better in low-autonomy settings, but descriptions are better when the agent will need to act independently....
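
A toy contextual-bandit rendering of the instruction/description contrast (my own construction, not the authors' model): an instruction names the best arm in the current context only, while a description conveys noisy reward weights that let the agent re-derive good behaviour in contexts it has never seen.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([1.0, -1.0, 0.5])        # hidden reward weights
old_arms = rng.normal(size=(5, 3))         # arm features in the old context
new_arms = rng.normal(size=(5, 3))         # arm features in a new context

# Instruction: "take arm k" pins down the old context but does not generalize.
instructed_arm = int(np.argmax(old_arms @ true_w))

# Description: "feature 0 is good, feature 1 is bad" = noisy weight estimate.
described_w = true_w + rng.normal(scale=0.3, size=3)

best_new = int(np.argmax(new_arms @ true_w))
print("instruction still optimal?", instructed_arm == best_new)  # often False
print("description still optimal?",
      int(np.argmax(new_arms @ described_w)) == best_new)        # usually True
```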

March 15, 2023 · 3 min · 591 words · Sukai Huang

Cheng_chi Diffusion Policy Visuomotor Policy Learning via Action Diffusion 2023

[TOC] Title: Diffusion Policy: Visuomotor Policy Learning via Action Diffusion Author: Cheng Chi et al. Publish Year: 2023 Review Date: Thu, Mar 9, 2023 url: https://diffusion-policy.cs.columbia.edu/diffusion_policy_2023.pdf Summary of paper Contribution introducing a new form of robot visuomotor policy that generates behaviour via a “conditional denoising diffusion process” on the robot action space Some key terms Explicit policy learning: this is like imitation learning. Implicit policy learning: aims to minimise the estimate of an energy function; this is like standard reinforcement learning. diffusion policy...
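
For intuition, here is a bare-bones DDPM-style sampling loop over the action space, conditioned on an observation. The noise-prediction network `eps_theta` is a placeholder, not the paper's architecture; the update rule is the standard denoising step.

```python
import numpy as np

T = 50
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def eps_theta(a_t, obs, t):
    # Placeholder for the trained, observation-conditioned noise predictor.
    return a_t * 0.1

def sample_action(obs, dim=2, rng=np.random.default_rng(0)):
    a = rng.normal(size=dim)                 # start from pure Gaussian noise
    for t in reversed(range(T)):             # iteratively denoise
        eps = eps_theta(a, obs, t)
        mean = (a - betas[t] / np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.normal(size=dim) if t > 0 else 0.0
        a = mean + np.sqrt(betas[t]) * noise
    return a

print(sample_action(obs=None))
```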

March 9, 2023 · 1 min · 205 words · Sukai Huang

Alan_lindsay Framer Planning Models From Natural Language Action Descriptions 2017

[TOC] Title: Framer: Planning Models From Natural Language Action Descriptions Author: Alan Lindsay et al. Publish Year: 2017 Review Date: Thu, Mar 9, 2023 url: https://core.ac.uk/download/pdf/322329049.pdf Summary of paper Motivation for model-assisting and model-generation tools, there is an underlying assumption that the user can formulate the problem using some formal language. this motivates us to generate planning domain models directly from NL descriptions. Some key terms approach we start from NL descriptions of actions and use NL analysis to construct a structured representation, from which we construct formal representations of action sequences?...

March 9, 2023 · 3 min · 482 words · Sukai Huang

Siddharth_karamcheti Language Driven Representation Learning for Robotics 2023

[TOC] Title: Language-Driven Representation Learning for Robotics Author: Siddharth Karamcheti et al. Publish Year: 24 Feb 2023 Review Date: Fri, Mar 3, 2023 url: https://arxiv.org/pdf/2302.12766.pdf Summary of paper Motivation recent work in visual representation learning for robotics demonstrates the viability of learning from large video datasets of humans performing everyday tasks. leveraging methods such as masked autoencoding and contrastive learning, these representations exhibit strong transfer to policy learning for visuomotor control. but robot learning encompasses a diverse set of problems beyond control, including grasp affordance prediction, language-conditioned imitation learning, and intent scoring for human-robot collaboration, amongst others....

March 3, 2023 · 3 min · 463 words · Sukai Huang

Tatsuki_kuribayashi Does Vision Accelerate Hierarchical Generalisation of Neural Language Learners 2023

[TOC] Title: Does Vision Accelerate Hierarchical Generalisation of Neural Language Learners Author: Tatsuki Kuribayashi Publish Year: 1 Feb 2023 Review Date: Fri, Mar 3, 2023 url: https://arxiv.org/pdf/2302.00667.pdf Summary of paper Motivation we want to know whether visual information improves the hierarchical generalisation of language models Contribution our results show that vision accelerated proper linguistic generalisation in the simplified, artificial setting, but LMs struggled with proper generalisation in the noisy, realistic setting....

March 3, 2023 · 1 min · 111 words · Sukai Huang

Jing_cheng_pang Natural Language Conditioned Reinforcement Learning With Inside Out Task Language Development and Translation 2023

[TOC] Title: Natural Language Conditioned Reinforcement Learning With Inside-Out Task Language Development and Translation Author: Jing-Cheng Pang et al. Publish Year: 18 Feb 2023 Review Date: Fri, Mar 3, 2023 url: https://arxiv.org/pdf/2302.09368.pdf Summary of paper Motivation previous approaches generally implemented language-conditioned RL by providing human instructions in natural language and training a following policy. this is an outside-in approach: the policy needs to comprehend the NL and manage the task simultaneously....

March 3, 2023 · 1 min · 173 words · Sukai Huang

Suvaansh_bhambri Multi Level Compositional Reasoning for Interactive Instruction Following 2023

[TOC] Title: Multi-Level Compositional Reasoning for Interactive Instruction Following Author: Suvaansh Bhambri et al. Publish Year: 2023 Review Date: Fri, Mar 3, 2023 url: https://ppolon.github.io/paper/aaai2023-alfred-mocha.pdf Summary of paper Motivation The tasks given to the agents are often composite and thus challenging, as completing them requires reasoning about multiple subtasks. Contribution we propose to divide and conquer by breaking the task into multiple subgoals and attending to them individually for better navigation and interaction....

March 3, 2023 · 1 min · 144 words · Sukai Huang

Tianjun_zhang the Wisdom of Hindsight Makes Language Models Better Instruction Followers 2023

[TOC] Title: The Wisdom of Hindsight Makes Language Models Better Instruction Followers Author: Tianjun Zhang et al. Publish Year: 10 Feb 2023 Review Date: Thu, Mar 2, 2023 url: https://arxiv.org/pdf/2302.05206.pdf Summary of paper Motivation Reinforcement Learning with Human Feedback (RLHF) demonstrates impressive performance on the GPT series models. However, the underlying RL pipeline is complex, requiring additional training of reward and value networks. Contribution in this paper, we consider an alternative approach: converting feedback to instruction by relabeling the original one and training the model for better alignment in a supervised manner....
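
The relabeling loop is straightforward to sketch. Everything below is a stub (generation, relabeling, and the fine-tuning step), but it shows the shape of the idea: whatever the model actually produced becomes, in hindsight, the instruction it is trained on, so failures turn into correct supervised pairs.

```python
def generate(instruction: str, prompt: str) -> str:
    # Stand-in for sampling from the current language model.
    return "4"

def relabel(instruction: str, output: str) -> str:
    # Hindsight: describe the output as if it had been the goal all along,
    # e.g. via a scripted feedback/checker function.
    return f"produce the answer {output}"

def supervised_step(instruction: str, prompt: str, target: str) -> None:
    # Stand-in for one supervised fine-tuning step on the relabeled pair.
    pass

for prompt in ["2 + 3 = ?"]:
    out = generate("produce the answer 5", prompt)       # model got it wrong
    hindsight_instruction = relabel("produce the answer 5", out)
    supervised_step(hindsight_instruction, prompt, out)  # now a correct pair
```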

March 2, 2023 · 3 min · 427 words · Sukai Huang

Ying_shen Learning by Asking for Embodied Visual Navigation and Task Completion 2023

[TOC] Title: Learning by Asking for Embodied Visual Navigation and Task Completion Author: Ying Shen et al. Publish Year: 9 Feb 2023 Review Date: Thu, Mar 2, 2023 url: https://arxiv.org/pdf/2302.04865.pdf Summary of paper Motivation despite recent progress on related vision-language benchmarks, most prior work has focused on building agents that follow instructions rather than endowing agents with the ability to ask questions to actively resolve ambiguities arising naturally in embodied environments. Contribution we introduce an Embodied Learning-by-Asking (ELBA) model that learns when to ask and what to ask for vision-dialog navigation and task completion....

March 2, 2023 · 2 min · 411 words · Sukai Huang

Ernest_davis Benchmarks for Automated Commonsense Reasoning a Survey 2023

[TOC] Title: Benchmarks for Automated Commonsense Reasoning: A Survey Author: Ernest Davis Publish Year: 9 Feb 2023 Review Date: Thu, Mar 2, 2023 url: https://arxiv.org/pdf/2302.04752.pdf Summary of paper we mainly focus on the section where the author discusses features of commonsense reasoning in general. Terms clarify what we mean by common sense: what exactly is “commonsensical”? Claims about common sense that seem true to the author Commonsense knowledge is common. In talking to another person, we do not have to explain commonsense reasoning or enumerate commonsense facts....

March 2, 2023 · 3 min · 573 words · Sukai Huang

Alexander_nikulin Anti Exploration by Random Network Distillation 2023

[TOC] Title: Anti-Exploration by Random Network Distillation Author: Alexander Nikulin et al. Publish Year: 31 Jan 2023 Review Date: Wed, Mar 1, 2023 url: https://arxiv.org/pdf/2301.13616.pdf Summary of paper Motivation despite the success of Random Network Distillation (RND) in various domains, it was shown to be insufficiently discriminative to be used as an uncertainty estimator for penalizing out-of-distribution actions in offline reinforcement learning ?? wait, why do we want to penalize out-of-distribution actions?...
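
(On the reviewer's question: in offline RL the agent can never try out-of-distribution actions, so their values are easily overestimated; penalizing them keeps the policy near the dataset.) Here is a minimal sketch of an RND-style penalty with toy linear models, not the paper's architecture: a predictor is trained to match a fixed random target on dataset state-actions, so its error is low in-distribution and high for OOD inputs, and that error is subtracted from the reward.

```python
import numpy as np

rng = np.random.default_rng(0)
W_target = rng.normal(size=(4, 8))   # fixed random target network
W_pred = np.zeros((4, 8))            # trainable predictor, different model class

def target_out(sa): return np.tanh(sa) @ W_target
def pred_out(sa):   return sa @ W_pred   # mismatched on purpose: can't fit globally

def penalty(sa):
    # Anti-exploration bonus: prediction error of the distilled network.
    return float(np.sum((pred_out(sa) - target_out(sa)) ** 2))

# Train the predictor on in-dataset state-action pairs only.
data = rng.normal(size=(256, 4))
for _ in range(500):
    grad = 2 * data.T @ (pred_out(data) - target_out(data)) / len(data)
    W_pred -= 0.05 * grad

ood = rng.normal(size=4) * 5.0
print(penalty(data[0]) < penalty(ood))  # True: OOD actions get a larger penalty
```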

March 1, 2023 · 2 min · 359 words · Sukai Huang

Edoardo_cetin Learning Pessimism for Reinforcement Learning 2023

[TOC] Title: Learning Pessimism for Reinforcement Learning Author: Edoardo Cetin et al. Publish Year: 2023 Review Date: Wed, Mar 1, 2023 url: https://kclpure.kcl.ac.uk/portal/files/196848783/10977.CetinE.pdf Summary of paper Motivation Off-policy deep RL algorithms commonly compensate for overestimation bias during temporal-difference learning by utilizing pessimistic estimates of the expected target returns Contribution we propose Generalised Pessimism Learning (GPL), a strategy employing a novel learnable penalty to enact such pessimism. In particular, we propose to learn this penalty alongside the critic with dual TD-learning, a new procedure to estimate and minimise the magnitude of the target returns bias with trivial computational cost....

March 1, 2023 · 2 min · 222 words · Sukai Huang

Timo_schick Toolformer Language Models Can Teach Themselves to Use Tools 2023

[TOC] Title: Toolformer: Language Models Can Teach Themselves to Use Tools Author: Timo Schick et al. (Meta AI Research) Publish Year: 9 Feb 2023 Review Date: Wed, Mar 1, 2023 url: https://arxiv.org/pdf/2302.04761.pdf Summary of paper Motivation LMs exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. Yet they also struggle with basic functionality, such as arithmetic or factual lookup. Contribution In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds....
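
The heart of the method is a self-supervised filter that fits in a few lines. Sketch below with the LM loss stubbed: a sampled API call is kept only if conditioning on the call *and* its result lowers the loss on the following tokens by at least a margin tau, relative to both no call and the call without its result.

```python
def lm_loss(prefix: str, continuation: str) -> float:
    # Stand-in for the LM's negative log-likelihood of `continuation`.
    return 1.0 if "2821" in prefix else 2.0

def keep_call(text_before: str, call: str, result: str, text_after: str,
              tau: float = 0.5) -> bool:
    with_result = lm_loss(text_before + f"[{call} -> {result}] ", text_after)
    without_result = lm_loss(text_before + f"[{call}] ", text_after)
    no_call = lm_loss(text_before, text_after)
    # Keep the call only if the result genuinely helps predict what follows.
    return with_result <= min(without_result, no_call) - tau

print(keep_call("403 * 7 = ", "Calculator(403 * 7)", "2821", "2821"))  # True
```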

March 1, 2023 · 3 min · 486 words · Sukai Huang

Almog_gueta Knowledge Is a Region in Weight Space for Fine Tuned Language Model 2023

[TOC] Title: Knowledge Is a Region in Weight Space for Fine-Tuned Language Models Author: Almog Gueta et al. Publish Year: 12 Feb 2023 Review Date: Wed, Mar 1, 2023 url: https://arxiv.org/pdf/2302.04863.pdf Summary of paper Motivation relatively little is known about the relationships between different models, especially those trained or tested on different datasets. Contribution we demonstrate that fine-tuned models that were optimized for high performance reside in well-defined regions in weight space, and vice versa: language models that have been fine-tuned on the same dataset form a tight cluster in weight space, while models fine-tuned on different datasets from the same underlying task form a looser cluster....
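
One practical consequence the paper points toward is that points between checkpoints inside such a region (e.g. their interpolation) also tend to perform well. A minimal PyTorch interpolation utility; the two `Linear` modules stand in for fine-tuned checkpoints of the same architecture.

```python
from collections import OrderedDict
import torch

def interpolate_state_dicts(sd_a, sd_b, alpha: float = 0.5):
    """Linearly interpolate two compatible state dicts: (1-a)*A + a*B."""
    return OrderedDict((k, (1 - alpha) * sd_a[k] + alpha * sd_b[k]) for k in sd_a)

a = torch.nn.Linear(4, 2)     # stand-in for fine-tuned checkpoint A
b = torch.nn.Linear(4, 2)     # stand-in for fine-tuned checkpoint B
mid = torch.nn.Linear(4, 2)   # the midpoint model, inside the cluster
mid.load_state_dict(interpolate_state_dicts(a.state_dict(), b.state_dict()))
```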

March 1, 2023 · 3 min · 548 words · Sukai Huang

Xiwen_liang Contrastive Instruction Trajectory Learning for Vision Language Navigation 2022

[TOC] Title: Contrastive Instruction-Trajectory Learning for Vision-Language Navigation Author: Xiwen Liang et al. Publish Year: AAAI 2022 Review Date: Fri, Feb 10, 2023 url: https://arxiv.org/abs/2112.04138 Summary of paper Motivation previous works learn to navigate step-by-step following an instruction. However, these works may fail to discriminate the similarities and discrepancies across instruction-trajectory pairs and ignore the temporal continuity of sub-instructions. These problems hinder agents from learning distinctive vision-and-language representations. Contribution we propose a coarse-grained contrastive learning objective to enhance vision-and-language representations by contrasting semantics of full trajectory observations and instructions, respectively; and a fine-grained contrastive learning objective to perceive instructions by leveraging the temporal information of the sub-instructions....
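
The coarse-grained objective is, in spirit, an InfoNCE loss over paired instruction and trajectory embeddings. A minimal sketch (encoders replaced by random features; the paper's actual loss design and negative sampling are more involved):

```python
import torch
import torch.nn.functional as F

def info_nce(instr_emb, traj_emb, temperature: float = 0.07):
    instr = F.normalize(instr_emb, dim=-1)
    traj = F.normalize(traj_emb, dim=-1)
    logits = instr @ traj.t() / temperature   # similarity of every pair
    labels = torch.arange(len(instr))         # i-th instruction matches i-th trajectory
    return F.cross_entropy(logits, labels)

instr_emb = torch.randn(8, 128)   # stand-in instruction encoder outputs
traj_emb = torch.randn(8, 128)    # stand-in trajectory encoder outputs
print(info_nce(instr_emb, traj_emb))
```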

February 10, 2023 · 2 min · 360 words · Sukai Huang

Jacob_andreas Lammp Language Models as Probabilistic Priors for Perception and Action 2023

[TOC] Title: LaMPP: Language Models as Probabilistic Priors for Perception and Action Author: Belinda Z. Li, Jacob Andreas et al. Publish Year: 3 Feb 2023 Review Date: Fri, Feb 10, 2023 url: https://arxiv.org/pdf/2302.02801.pdf Summary of paper Motivation Language models trained on large text corpora encode rich distributional information about real-world environments and action sequences. this information plays a crucial role. Contribution we describe how to leverage language models for non-linguistic perception and control tasks. Our approach casts labelling and decision-making as inference in probabilistic graphical models in which language models parameterize prior distributions over labels, decisions and parameters, making it possible to integrate uncertain observations and incomplete background knowledge in a principled way....
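
The probabilistic reading fits in a few lines. A minimal sketch with made-up numbers: the LM supplies a prior over label hypotheses, a noisy perception model supplies the likelihood, and Bayes' rule combines them, letting background knowledge override an ambiguous observation.

```python
import numpy as np

labels = ["towel in bathroom", "towel in kitchen", "towel in garage"]
lm_prior = np.array([0.70, 0.25, 0.05])      # from querying the LM (illustrative)
obs_likelihood = np.array([0.30, 0.40, 0.30])  # from a noisy vision model

posterior = lm_prior * obs_likelihood        # Bayes' rule, up to normalization
posterior /= posterior.sum()
print(labels[int(np.argmax(posterior))], posterior.round(3))
# prior knowledge wins over the near-uniform observation
```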

February 10, 2023 · 2 min · 267 words · Sukai Huang

Zhuosheng_zhang Multimodal Chain of Thought Reasoning in Language Models 2023

[TOC] Title: Multimodal Chain-of-Thought Reasoning in Language Models Author: Zhuosheng Zhang et al. Publish Year: 2023 Review Date: Wed, Feb 8, 2023 url: https://arxiv.org/pdf/2302.00923.pdf Summary of paper Motivation LLMs have shown impressive performance on complex reasoning by leveraging chain-of-thought (CoT) prompting to generate intermediate reasoning chains as the rationale to infer the answer. to elicit CoT reasoning in multimodality, a possible solution is to fine-tune small language models by fusing the vision and language features to perform CoT reasoning....
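
The proposed framework is a two-stage pipeline, easy to show as a stub (both model calls are placeholders for fine-tuned vision-language models): stage one generates a rationale from the language input fused with vision features; stage two appends that rationale to the input and predicts the answer.

```python
def rationale_model(text: str, vision_features) -> str:
    # Stage 1: placeholder for a fused vision-language model's generation.
    return "The fruit in the image is round and orange."

def answer_model(text: str, vision_features) -> str:
    # Stage 2: placeholder; conditions on the rationale-augmented input.
    return "an orange"

def multimodal_cot(question: str, vision_features) -> str:
    rationale = rationale_model(question, vision_features)
    return answer_model(question + " Rationale: " + rationale, vision_features)

print(multimodal_cot("What fruit is shown?", vision_features=None))
```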

February 8, 2023 · 3 min · 548 words · Sukai Huang

Siyuan_wang Unifying Structure Reasoning and Language Model Pre Training for Complex Reasoning 2023

[TOC] Title: Unifying Structure Reasoning and Language Model Pre-Training for Complex Reasoning Author: Siyuan Wang et al. Publish Year: 21 Jan 2023 Review Date: Wed, Feb 8, 2023 url: https://arxiv.org/pdf/2301.08913.pdf Summary of paper Motivation language models still suffer from a heterogeneous information alignment problem and a noisy knowledge injection problem. for complex reasoning, the context contains rich knowledge that typically exists in complex and sparse form. Contribution we propose to unify structure reasoning and language model pre-training: identify four types of elementary knowledge structures from contexts to construct structured queries, and utilise the box embedding method to conduct explicit structure reasoning along the query during language modeling Some key terms What is the problem...
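
To make "explicit structure reasoning along the query" concrete, here is a Query2Box-style sketch of box embeddings, which the description echoes (a simplification under my assumptions, not the paper's exact formulation): queries are axis-aligned boxes, conjunctions intersect boxes, and candidate answers score by their L1 distance to the box.

```python
import numpy as np

def box_distance(point, center, offset, alpha: float = 0.5):
    # Score of a candidate entity (point) against a query box (center +/- offset).
    lo, hi = center - offset, center + offset
    outside = np.maximum(point - hi, 0) + np.maximum(lo - point, 0)
    inside = center - np.minimum(hi, np.maximum(lo, point))
    return np.linalg.norm(outside, 1) + alpha * np.linalg.norm(inside, 1)

def intersect_boxes(c1, o1, c2, o2):
    # Conjunction of two structured queries = intersection of their boxes.
    lo = np.maximum(c1 - o1, c2 - o2)
    hi = np.minimum(c1 + o1, c2 + o2)
    return (lo + hi) / 2, np.maximum((hi - lo) / 2, 0)

c, o = intersect_boxes(np.zeros(2), np.ones(2), np.ones(2), np.ones(2))
print(box_distance(np.array([0.5, 0.5]), c, o))  # 0.0: inside the intersection
```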

February 8, 2023 · 2 min · 281 words · Sukai Huang