Danijar_hafner Mastering Diverse Domains Through World Models 2023

[TOC] Title: Mastering Diverse Domains Through World Models Author: Danijar Hafner et al. Publish Year: 10 Jan 2023 Review Date: Tue, Feb 7, 2023 url: https://www.youtube.com/watch?v=vfpZu0R1s1Y Summary of paper Motivation general intelligence requires solving tasks across many domains. Current reinforcement learning algorithms carry this potential but are held back by the resources and knowledge required to tune them for new tasks. Contribution we present DreamerV3, a general and scalable algorithm based on world models that outperforms previous approaches across a wide range of domains with fixed hyperparameters....

<span title='2023-02-07 18:18:37 +1100 AEDT'>February 7, 2023</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;291 words&nbsp;·&nbsp;Sukai Huang

Yuanhan_zhang What Makes Good Examples for Visual in Context Learning 2023

[TOC] Title: What Makes Good Examples for Visual In-Context Learning Author: Yuanhan Zhang et al. Publish Year: 1 Feb 2023 Review Date: Mon, Feb 6, 2023 url: https://arxiv.org/pdf/2301.13670.pdf Summary of paper Motivation in this paper, the main focus is on an emergent ability in large vision models, known as in-context learning; this concept has been well-known in natural language processing but has only been studied very recently for large vision models....

<span title='2023-02-06 22:38:35 +1100 AEDT'>February 6, 2023</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;427 words&nbsp;·&nbsp;Sukai Huang

Jing_yu_koh Grounding Language Models to Images for Multimodal Generation 2023

[TOC] Title: Grounding Language Models to Images for Multimodal Generation Author: Jing Yu Koh et al. Publish Year: 31 Jan 2023 Review Date: Mon, Feb 6, 2023 url: https://arxiv.org/pdf/2301.13823.pdf Summary of paper Motivation we propose an efficient method to ground pre-trained text-only language models to the visual domain How we keep the language model frozen, and finetune input and output linear layers to enable cross-modality interactions. This allows our model to process arbitrarily interleaved image-and-text inputs. Contribution our approach works with any off-the-shelf language model and paves the way towards an effective, general solution for leveraging pre-trained language models in visually grounded settings....
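A minimal sketch of the frozen-LM grounding idea described above, under my own assumptions: `lm` is any pre-trained causal language model that accepts `inputs_embeds` and returns hidden states (as Hugging Face base models do), `visual_encoder` is a CLIP-style image encoder, and only the two linear projection layers are trained; this is not the paper's actual code.

```python
import torch
import torch.nn as nn

class FrozenLMGrounding(nn.Module):
    """Sketch: ground a frozen text-only LM to images via two trainable linear layers."""

    def __init__(self, lm, visual_encoder, vis_dim, lm_dim):
        super().__init__()
        self.lm = lm                          # pre-trained causal LM, kept frozen
        self.visual_encoder = visual_encoder  # pre-trained image encoder, kept frozen
        for p in list(self.lm.parameters()) + list(self.visual_encoder.parameters()):
            p.requires_grad = False
        # only these two projections are finetuned
        self.img_to_lm = nn.Linear(vis_dim, lm_dim)  # image features -> LM input space
        self.lm_to_img = nn.Linear(lm_dim, vis_dim)  # LM hidden states -> visual space (for retrieval)

    def forward(self, image, text_embeds):
        vis = self.visual_encoder(image)               # (batch, vis_dim), assumed pooled features
        vis_token = self.img_to_lm(vis).unsqueeze(1)   # treat the projected image as one "token"
        inputs = torch.cat([vis_token, text_embeds], dim=1)
        hidden = self.lm(inputs_embeds=inputs).last_hidden_state
        return self.lm_to_img(hidden[:, -1])           # embedding used for cross-modal retrieval
```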

<span title='2023-02-06 22:37:53 +1100 AEDT'>February 6, 2023</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;239 words&nbsp;·&nbsp;Sukai Huang

Zhenfang_chen See Think Confirm Interactive Prompting Between Vision and Language Models for Knowledge Based Visual Reasoning 2023

[TOC] Title: See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge Based Visual Reasoning Author: Zhenfang Chen et al. Publish Year: 12 Jan 2023 Review Date: Mon, Feb 6, 2023 url: https://arxiv.org/pdf/2301.05226.pdf Summary of paper Motivation solving knowledge-based visual reasoning tasks remains challenging: it requires a model to comprehensively understand image content, connect external world knowledge, and perform step-by-step reasoning to answer the questions correctly. Contribution we propose a novel framework named Interactive Prompting Visual Reasoner (IPVR) for few-shot knowledge-based visual reasoning....

<span title='2023-02-06 22:36:41 +1100 AEDT'>February 6, 2023</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;405 words&nbsp;·&nbsp;Sukai Huang

Xiaotian_liu a Planning Based Neural Symbolic Approach for Embodied Instruction Following 2022

[TOC] Title: A Planning Based Neural Symbolic Approach for Embodied Instruction Following Author: Xiaotian Liu et al. Publish Year: 2022 Review Date: Thu, Feb 2, 2023 url: https://embodied-ai.org/papers/2022/15.pdf Summary of paper Motivation end-to-end deep learning methods struggle at these tasks due to long horizons and sparse rewards. Contribution Our main innovation relies on combining DL models for perception and NLP with a new egocentric planner based on successive planning problems formulated using the PDDL syntax, both for exploration and task accomplishment....

<span title='2023-02-02 13:28:19 +1100 AEDT'>February 2, 2023</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;226 words&nbsp;·&nbsp;Sukai Huang

So_yeon_min Film Following Instructions in Language With Modular Methods 2022

[TOC] Title: FILM: Following Instructions in Language With Modular Methods Author: So Yeon Min et al. Publish Year: 16 Mar 2022 Review Date: Wed, Feb 1, 2023 url: https://arxiv.org/pdf/2110.07342.pdf Summary of paper Motivation current approaches assume that neural states will integrate multimodal semantics to perform state tracking, building spatial memory, exploration, and long-term planning. in contrast, we propose a modular method with structured representations that builds a semantic map of the scene and performs exploration with a semantic search policy to achieve the natural language goal....

<span title='2023-02-01 18:32:24 +1100 AEDT'>February 1, 2023</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;430 words&nbsp;·&nbsp;Sukai Huang

Yuki_inoue Prompter Utilizing Large Language Model Prompting for a Data Efficient Embodied Instruction Following 2022

[TOC] Title: Prompter: Utilizing Large Language Model Prompting for a Data Efficient Embodied Instruction Following Author: Yuki Inoue et al. Publish Year: 7 Nov 2022 Review Date: Wed, Feb 1, 2023 url: https://arxiv.org/pdf/2211.03267.pdf Summary of paper Motivation we propose FILM++, which extends the existing work FILM with modifications that do not require extra data. furthermore, we propose Prompter, which replaces FILM++’s semantic search module with language model prompting. no training is needed for our prompting-based implementation, while achieving better or at least comparable performance....

<span title='2023-02-01 17:22:35 +1100 AEDT'>February 1, 2023</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;526 words&nbsp;·&nbsp;Sukai Huang

Kyle_mahowald Dissociating Language and Thought in Large Language Models a Cognitive Perspective 2023

[TOC] Title: Dissociating Language and Thought in Large Language Models: A Cognitive Perspective Author: Kyle Mahowald et al. Publish Year: 16 Jan 2023 Review Date: Tue, Jan 31, 2023 url: https://arxiv.org/pdf/2301.06627.pdf Summary of paper Motivation the authors challenge the “good at language $\implies$ good at thought” fallacy, as well as the second fallacy, “bad at thought $\implies$ bad at language”. Contribution the authors argue that LLMs have promise as scientific models of one piece of the human cognitive toolbox – formal language processing – but fall short of modelling human thought....

<span title='2023-01-31 18:47:45 +1100 AEDT'>January 31, 2023</span>&nbsp;·&nbsp;4 min&nbsp;·&nbsp;776 words&nbsp;·&nbsp;Sukai Huang

Michael_janner Planning With Diffusion for Flexible Behaviour Synthesis 2022

[TOC] Title: Planning With Diffusion for Flexible Behaviour Synthesis Author: Michael Janner et al. Publish Year: 21 Dec 2022 Review Date: Mon, Jan 30, 2023 Summary of paper Motivation use a diffusion model to learn the dynamics; tight coupling of modelling and planning: our goal is to break this abstraction barrier by designing a model and planning algorithm that are trained alongside one another, resulting in a non-autoregressive trajectory-level model for which sampling and planning are nearly identical....
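A rough sketch of the "sampling ≈ planning" idea, assuming a hypothetical trained `denoiser(traj, t)` that performs one reverse-diffusion step over a whole (state, action) trajectory; the return/goal guidance used by the actual method is omitted.

```python
import torch

def plan_by_diffusion(denoiser, horizon, state_dim, action_dim, n_steps=50):
    """Sample an entire trajectory by iterative denoising, then act with its first action,
    so planning is (nearly) just sampling from the trajectory-level model."""
    traj = torch.randn(1, horizon, state_dim + action_dim)  # start from pure noise
    for t in reversed(range(n_steps)):
        traj = denoiser(traj, t)  # hypothetical reverse-diffusion step toward a plausible trajectory
    first_action = traj[0, 0, state_dim:]  # execute only the first action of the sampled plan
    return traj, first_action
```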

<span title='2023-01-30 13:43:20 +1100 AEDT'>January 30, 2023</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;317 words&nbsp;·&nbsp;Sukai Huang

Shailaja_keyur_sampat Reasoning About Actions Over Visual and Linguistic Modalities a Survey 2022

[TOC] Title: Reasoning About Actions Over Visual and Linguistic Modalities: A Survey Author: Shailaja Keyur Sampat et al. Publish Year: 2022 Review Date: Fri, Jan 20, 2023 Summary of paper Motivation reasoning about actions & changes has been widely studied in the knowledge representation community; it has recently piqued the interest of NLP and computer vision researchers. Contribution Some key terms Six most frequent types of commonsense knowledge tasks that involve language-based reasoning about actions...

<span title='2023-01-20 13:59:00 +1100 AEDT'>January 20, 2023</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;524 words&nbsp;·&nbsp;Sukai Huang

Xin_wang Reinforced Cross Modal Matching and Self Supervised Imitation Learning for Vision Language Navigation 2019

[TOC] Title: Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation Author: Xin Wang et al. Publish Year: 2019 Review Date: Wed, Jan 18, 2023 Summary of paper Motivation Vision-Language Navigation (VLN) presents some unique challenges: first, reasoning over images and natural language instructions can be difficult. secondly, apart from strictly following expert demonstrations, the feedback is rather coarse, since the “Success” feedback is provided only when the agent reaches a target position (sparse reward). A good “instruction following” trajectory may end up stopping just before reaching the goal state and thus receive zero reward....

<span title='2023-01-18 09:48:14 +1100 AEDT'>January 18, 2023</span>&nbsp;·&nbsp;1 min&nbsp;·&nbsp;195 words&nbsp;·&nbsp;Sukai Huang

Alekh_agarwal PC-PG Policy Cover Directed Exploration for Provable Policy Gradient Learning 2020

[TOC] Title: PC-PG Policy Cover Directed Exploration for Provable Policy Gradient Learning Author: Alekh Agarwal et al. Publish Year: 2020 Review Date: Wed, Dec 28, 2022 Summary of paper Motivation The primary drawback of direct policy gradient methods is that, by being local in nature, they fail to adequately explore the environment. In contrast, model-based approaches and Q-learning directly handle exploration through the use of optimism. Contribution Policy Cover-Policy Gradient algorithm (PC-PG), a direct, model-free, policy optimisation approach which addresses exploration through the use of a learned ensemble of policies; the latter provides a policy cover over the state space....

<span title='2022-12-28 14:39:25 +1100 AEDT'>December 28, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;271 words&nbsp;·&nbsp;Sukai Huang

Alekh_agarwal on the Theory of Policy Gradient Methods Optimality Approximation and Distribution Shift 2020

[TOC] Title: On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift Author: Alekh Agarwal et al. Publish Year: 14 Oct 2020 Review Date: Wed, Dec 28, 2022 Summary of paper Motivation little is known about even their most basic theoretical convergence properties, including: if and how fast they converge to a globally optimal solution, and how they cope with approximation error due to using a restricted class of parametric policies....

<span title='2022-12-28 14:36:20 +1100 AEDT'>December 28, 2022</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;557 words&nbsp;·&nbsp;Sukai Huang

Chloe_ching_yun_hsu Revisiting Design Choices in Proximal Policy Optimisation 2020

[TOC] Title: Revisiting Design Choices in Proximal Policy Optimisation Author: Chloe Ching-Yun Hsu et al. Publish Year: 23 Sep 2020 Review Date: Wed, Dec 28, 2022 Summary of paper Motivation Contribution on discrete action spaces with sparse high rewards, standard PPO often gets stuck at suboptimal actions. Why we analyze the reasons for these failure modes and explain why they are not exposed by standard benchmarks. In summary, our study suggests that Beta policy parameterization and KL-regularized objectives should be reconsidered for PPO, especially since the alternatives improve PPO in all settings....
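For reference, the KL-regularized surrogate that the paper asks to be reconsidered is, in standard PPO notation (not copied from the paper), the penalized objective

$$\max_\theta \; \mathbb{E}_t\!\left[\frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_\text{old}}(a_t \mid s_t)}\,\hat{A}_t \;-\; \beta\,\mathrm{KL}\!\big[\pi_{\theta_\text{old}}(\cdot \mid s_t)\,\big\|\,\pi_\theta(\cdot \mid s_t)\big]\right],$$

which trades the ratio clipping of standard PPO for an explicit KL penalty with coefficient $\beta$.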

<span title='2022-12-28 14:32:15 +1100 AEDT'>December 28, 2022</span>&nbsp;·&nbsp;3 min&nbsp;·&nbsp;467 words&nbsp;·&nbsp;Sukai Huang

James_queeney Generalized Proximal Policy Optimisation With Sample Reuse 2021

[TOC] Title: Generalized Proximal Policy Optimisation With Sample Reuse Author: James Queeney et al. Publish Year: 29 Oct 2021 Review Date: Wed, Dec 28, 2022 Summary of paper Motivation it is critical for data-driven reinforcement learning methods to be both stable and sample efficient. On-policy methods typically generate reliable policy improvement throughout training, while off-policy methods make more efficient use of data through sample reuse. Contribution in this work, we combine the theoretically supported stability benefits of on-policy algorithms with the sample efficiency of off-policy algorithms....

<span title='2022-12-28 14:00:32 +1100 AEDT'>December 28, 2022</span>&nbsp;·&nbsp;5 min&nbsp;·&nbsp;1033 words&nbsp;·&nbsp;Sukai Huang

Lun_wang Backdoorl Backdoor Attack Against Competitive Reinforcement Learning 2021

[TOC] Title: BackdooRL: Backdoor Attack Against Competitive Reinforcement Learning Author: Lun Wang et al. Publish Year: 12 Dec 2021 Review Date: Wed, Dec 28, 2022 Summary of paper Motivation in this paper, we propose BACKDOORL, a backdoor attack targeted at two-player competitive reinforcement learning systems. first, the adversary agent has to lead the victim to take a series of wrong actions instead of only one to prevent it from winning....

<span title='2022-12-28 03:57:59 +1100 AEDT'>December 28, 2022</span>&nbsp;·&nbsp;1 min&nbsp;·&nbsp;202 words&nbsp;·&nbsp;Sukai Huang

Sandy_huang Adversarial Attacks on Neural Network Policies 2017

[TOC] Title: Adversarial Attacks on Neural Network Policies Author: Sandy Huang et al. Publish Year: 8 Feb 2017 Review Date: Wed, Dec 28, 2022 Summary of paper Motivation in this work, we show adversarial attacks are also effective when targeting neural network policies in reinforcement learning. Specifically, we show existing adversarial example crafting techniques can be used to significantly degrade test-time performance of trained policies. Contribution we characterise the degree of vulnerability across tasks and training algorithms, for a subclass of adversarial example attacks in white-box and black-box settings....
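A minimal sketch of the kind of test-time perturbation studied here, using the standard FGSM step; `policy` is assumed to be a differentiable network mapping observations to action logits, and the loss choice is an illustration rather than the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def fgsm_observation_attack(policy, obs, epsilon):
    """Perturb an observation so the policy becomes less likely to pick
    its originally preferred action (one gradient-sign step)."""
    obs = obs.clone().detach().requires_grad_(True)
    logits = policy(obs)                      # (batch, n_actions) action logits
    action = logits.argmax(dim=-1)            # the action the clean policy would take
    loss = F.cross_entropy(logits, action)    # increasing this loss lowers that action's probability
    loss.backward()
    return (obs + epsilon * obs.grad.sign()).detach()
```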

<span title='2022-12-28 00:08:22 +1100 AEDT'>December 28, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;346 words&nbsp;·&nbsp;Sukai Huang

Yinglun_xu Efficient Reward Poisoning Attacks on Online Deep Reinforcement Learning 2022

[TOC] Title: Efficient Reward Poisoning Attacks on Online Deep Reinforcement Learning Author: Yinglun Xu et al. Publish Year: 30 May 2022 Review Date: Tue, Dec 27, 2022 Summary of paper Motivation we study data poisoning attacks on online deep reinforcement learning (DRL) where the attacker is oblivious to the learning algorithm used by the agent and does not necessarily have full knowledge of the environment. we instantiate our framework to construct several attacks which only corrupt the rewards for a small fraction of the total training timesteps and make the agent learn a low-performing policy. Contribution results show that the reward attack efficiently poisons agent learning with a variety of SOTA DRL algorithms such as DQN and PPO; our attack works on model-free DRL algorithms for all popular learning paradigms, and only assumes the learning algorithm to be efficient....
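A toy illustration of the attack setting (not the paper's algorithm): an environment wrapper that corrupts the reward on a small, fixed fraction of timesteps, which is all the attacker is assumed to control; the gymnasium-style 5-tuple `step` signature is my assumption.

```python
import random

class RewardPoisoningWrapper:
    """Toy gym-style wrapper: flips the sign of the reward on a small fraction
    of timesteps, leaving observations and transitions untouched."""

    def __init__(self, env, corrupt_prob=0.05, seed=0):
        self.env = env
        self.corrupt_prob = corrupt_prob
        self.rng = random.Random(seed)

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if self.rng.random() < self.corrupt_prob:
            reward = -reward  # naive corruption; the paper constructs far more efficient attacks
        return obs, reward, terminated, truncated, info
```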

<span title='2022-12-27 23:14:19 +1100 AEDT'>December 27, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;302 words&nbsp;·&nbsp;Sukai Huang

Young_wu Reward Poisoning Attacks on Offline Multi Agent Reinforcement Learning 2022

[TOC] Title: Reward Poisoning Attacks on Offline Multi Agent Reinforcement Learning Author: Young Wu et al. Publish Year: 1 Dec 2022 Review Date: Tue, Dec 27, 2022 Summary of paper Motivation Contribution unlike attacks on single-agent RL, we show that the attacker can install the target policy as a Markov Perfect Dominant Strategy Equilibrium (MPDSE), which rational agents are guaranteed to follow. This attack can be significantly cheaper than separate single-agent attacks....

<span title='2022-12-27 22:50:14 +1100 AEDT'>December 27, 2022</span>&nbsp;·&nbsp;1 min&nbsp;·&nbsp;146 words&nbsp;·&nbsp;Sukai Huang

Xuezhou_zhang Robust Policy Gradient Against Strong Data Corruption 2021

[TOC] Title: Robust Policy Gradient Against Strong Data Corruption Author: Xuezhou Zhang et al. Publish Year: 2021 Review Date: Tue, Dec 27, 2022 Summary of paper Abstract Contribution the authors utilised an SVD-denoising technique to identify and remove possible reward perturbations; this approach gives a robust RL algorithm. Limitation this approach only handles attack perturbations that are not consistent (i.e. not stealthy). Some key terms Policy gradient methods...

<span title='2022-12-27 20:35:10 +1100 AEDT'>December 27, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;317 words&nbsp;·&nbsp;Sukai Huang

Kiarash_banihashem Defense Against Reward Poisoning Attacks in Reinforcement Learning 2021

[TOC] Title: Defense Against Reward Poisoning Attacks in Reinforcement Learning Author: Kiarash Banihashem et al. Publish Year: 20 Jun 2021 Review Date: Tue, Dec 27, 2022 Summary of paper Motivation our goal is to design agents that are robust against such attacks in terms of the worst-case utility w.r.t. the true unpoisoned rewards while computing their policies under the poisoned rewards. Contribution we formalise this reasoning and characterize the utility of our novel framework for designing defense policies....

<span title='2022-12-27 18:27:17 +1100 AEDT'>December 27, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;303 words&nbsp;·&nbsp;Sukai Huang

Amin_rakhsha Reward Poisoning in Reinforcement Learning Attacks Against Unknown Learners in Unknown Environments 2021

[TOC] Title: Reward Poisoning in Reinforcement Learning Attacks Against Unknown Learners in Unknown Environments Author: Amin Rakhsha et al. Publish Year: 16 Feb 2021 Review Date: Tue, Dec 27, 2022 Summary of paper Motivation Our attack makes minimum assumptions on the prior knowledge of the environment or the learner’s learning algorithm. most of the prior work makes strong assumptions on the knowledge of the adversary – it is often assumed that the adversary has full knowledge of the environment or the agent’s learning algorithm, or both....

<span title='2022-12-27 15:50:22 +1100 AEDT'>December 27, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;233 words&nbsp;·&nbsp;Sukai Huang

Xuezhou_zhang Adaptive Reward Poisoning Attacks Against Reinforcement Learning 2020

[TOC] Title: Adaptive Reward Poisoning Attacks Against Reinforcement Learning Author: Xuezhou Zhang et al. Publish Year: 22 Jun 2020 Review Date: Tue, Dec 27, 2022 Summary of paper Motivation Non-adaptive attacks have been the focus of prior works. However, we show that under mild conditions, adaptive attacks can achieve the nefarious policy in steps polynomial in state-space size $|S|$, whereas non-adaptive attacks require exponential steps. Contribution we provide a lower threshold below which reward-poisoning attack is infeasible and RL is certified to be safe....

<span title='2022-12-27 00:21:15 +1100 AEDT'>December 27, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;283 words&nbsp;·&nbsp;Sukai Huang

Anindya_sarkar Reward Delay Attacks on Deep Reinforcement Learning 2022

[TOC] Title: Reward Delay Attacks on Deep Reinforcement Learning Author: Anindya Sarkar et al. Publish Year: 8 Sep 2022 Review Date: Mon, Dec 26, 2022 Summary of paper Motivation we present novel attacks targeting Q-learning that exploit a vulnerability entailed by this assumption by delaying the reward signal for a limited time period. We evaluate the efficacy of the proposed attacks through a series of experiments. Contribution our first observation is that reward-delay attacks are extremely effective when the goal of the adversary is simply to minimise reward....

<span title='2022-12-26 21:07:03 +1100 AEDT'>December 26, 2022</span>&nbsp;·&nbsp;2 min&nbsp;·&nbsp;374 words&nbsp;·&nbsp;Sukai Huang

Tom_everitt Reinforcement Learning With a Corrupted Reward Channel 2017

[TOC] Title: Reinforcement Learning With a Corrupted Reward Channel Author: Tom Everitt Publish Year: August 22, 2017 Review Date: Mon, Dec 26, 2022 Summary of paper Motivation we formalise this problem as a generalised Markov Decision Problem called a Corrupt Reward MDP (CRMDP). Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards. Contribution two ways around the problem are investigated....
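A rough sketch of the formalism (notation is mine and may differ from the paper's): a CRMDP augments a standard MDP with a state-dependent corruption function, so the agent observes a corrupted reward while its performance is judged on the true one,

$$\text{CRMDP} = \langle S, A, T, \dot{R}, C \rangle, \qquad \hat{R}(s) = C\big(s, \dot{R}(s)\big),$$

where $\dot{R}$ is the true reward, $\hat{R}$ the observed (possibly corrupted) reward, and $C$ the corruption function.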

<span title='2022-12-26 01:11:23 +1100 AEDT'>December 26, 2022</span>&nbsp;·&nbsp;4 min&nbsp;·&nbsp;757 words&nbsp;·&nbsp;Sukai Huang