Danijar_hafner Mastering Diverse Domains Through World Models 2023

[TOC] Title: Mastering Diverse Domains Through World Models Author: Danijar Hafner et al. Publish Year: 10 Jan 2023 Review Date: Tue, Feb 7, 2023 url: https://www.youtube.com/watch?v=vfpZu0R1s1Y Summary of paper Motivation general intelligence requires solving tasks across many domains. Current reinforcement learning algorithms carry this potential but are held back by the resources and knowledge required to tune them for new tasks. Contribution we present DreamerV3, a general and scalable algorithm based on world models that outperforms previous approaches across a wide range of domains with fixed hyperparameters. we observe favourable scaling properties of DreamerV3, with larger models directly translating to higher data-efficiency and final performance. Some key terms World Model learning ...
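
As a rough illustration of the world-model idea (this is not DreamerV3 itself: no RSSM, no symlog predictions, no discrete latents, and all dimensions are invented), a minimal sketch of learning a latent dynamics model from replayed transitions and then training an actor-critic on imagined rollouts might look like this:

```python
# Minimal sketch of the "learn a world model, train the policy in imagination"
# idea behind Dreamer-style agents. This is NOT DreamerV3 (no RSSM, no symlog,
# no discrete latents); dimensions and update rules are invented for illustration.
import torch
import torch.nn as nn

obs_dim, act_dim, latent_dim = 64, 4, 32

encoder = nn.Linear(obs_dim, latent_dim)                 # o_t -> z_t
dynamics = nn.Linear(latent_dim + act_dim, latent_dim)   # (z_t, a_t) -> z_{t+1}
reward_head = nn.Linear(latent_dim, 1)                   # z_t -> r_t
actor = nn.Sequential(nn.Linear(latent_dim, act_dim), nn.Softmax(dim=-1))
critic = nn.Linear(latent_dim, 1)

wm_opt = torch.optim.Adam(
    [*encoder.parameters(), *dynamics.parameters(), *reward_head.parameters()], lr=3e-4)
ac_opt = torch.optim.Adam([*actor.parameters(), *critic.parameters()], lr=3e-4)

def train_world_model(obs, act, next_obs, rew):
    """One gradient step on a batch of replayed real transitions."""
    z, z_next = encoder(obs), encoder(next_obs)
    pred_next = dynamics(torch.cat([z, act], dim=-1))
    loss = ((pred_next - z_next.detach()) ** 2).mean() \
         + ((reward_head(z).squeeze(-1) - rew) ** 2).mean()
    wm_opt.zero_grad(); loss.backward(); wm_opt.step()

def train_actor_critic(start_obs, horizon=15, gamma=0.99):
    """Train actor and critic purely on imagined latent rollouts (REINFORCE-style)."""
    z = encoder(start_obs).detach()
    start_z = z
    ret = torch.zeros(z.shape[0])
    logp_sum = torch.zeros(z.shape[0])
    for t in range(horizon):
        dist = torch.distributions.Categorical(actor(z))
        a = dist.sample()
        logp_sum = logp_sum + dist.log_prob(a)
        a_onehot = nn.functional.one_hot(a, act_dim).float()
        z = dynamics(torch.cat([z, a_onehot], dim=-1)).detach()   # imagine next latent
        ret = ret + (gamma ** t) * reward_head(z).squeeze(-1).detach()
    actor_loss = -(logp_sum * ret).mean()                          # maximise imagined return
    critic_loss = ((critic(start_z).squeeze(-1) - ret) ** 2).mean()
    ac_opt.zero_grad(); (actor_loss + critic_loss).backward(); ac_opt.step()

# usage with random dummy data
obs = torch.randn(16, obs_dim)
act = nn.functional.one_hot(torch.randint(0, act_dim, (16,)), act_dim).float()
train_world_model(obs, act, torch.randn(16, obs_dim), torch.randn(16))
train_actor_critic(obs)
```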

February 7, 2023 · 2 min · 291 words · Sukai Huang

Yuanhan_zhang What Makes Good Examples for Visual in Context Learning 2023

[TOC] Title: What Makes Good Examples for Visual in Context Learning Author: Yuanhan Zhang et al. Publish Year: 1 Feb 2023 Review Date: Mon, Feb 6, 2023 url: https://arxiv.org/pdf/2301.13670.pdf Summary of paper Motivation in this paper, the main focus is on an emergent ability in large vision models known as in-context learning; this concept has been well known in natural language processing but has only been studied very recently for large vision models. Contribution we for the first time provide a comprehensive investigation into the impact of in-context examples in computer vision, and find that performance is highly sensitive to the choice of in-context examples, exposing a critical issue: different in-context examples can lead to drastically different results. Our methods obtain significant improvements over random selection under various problem settings, showing the potential of using prompt retrieval in vision applications with a Model-as-a-Service (MaaS) business structure. we show that a good in-context example should be semantically similar to the query and closer in spatial context. A model that can better balance spatial and semantic closeness in feature space would be more ideal for visual in-context learning. this is likely because the model is not smart enough to infer the semantics directly, regardless of what the spatial structure looks like. Some key terms existing issue of using LLM ...
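
As a minimal sketch of the prompt-retrieval idea (pick the support example whose features are closest to the query), the following uses a placeholder random-projection encoder instead of a learned vision model, and a single feature space stands in for the paper's balance of spatial and semantic closeness:

```python
# Toy sketch of retrieval-based selection of an in-context example: pick the
# support (image, label) pair whose features are closest to the query. The
# "encoder" is a fixed random projection standing in for a real vision model.
import numpy as np

rng = np.random.default_rng(0)
IMG_SHAPE = (32, 32)
_PROJ = rng.standard_normal((IMG_SHAPE[0] * IMG_SHAPE[1], 128))   # placeholder encoder

def extract_features(image: np.ndarray) -> np.ndarray:
    feat = image.reshape(-1) @ _PROJ
    return feat / (np.linalg.norm(feat) + 1e-8)          # unit-normalise for cosine similarity

def select_in_context_example(query, candidates):
    """Return the candidate with the highest cosine similarity to the query."""
    q = extract_features(query)
    sims = np.array([q @ extract_features(img) for img, _ in candidates])
    return candidates[int(sims.argmax())], sims

# usage: 5 dummy support pairs, choose the one to prepend to the visual prompt
support = [(rng.standard_normal(IMG_SHAPE), f"mask_{i}") for i in range(5)]
query_img = rng.standard_normal(IMG_SHAPE)
(best_img, best_label), sims = select_in_context_example(query_img, support)
print(best_label, np.round(sims, 3))
```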

February 6, 2023 · 3 min · 427 words · Sukai Huang

Jing_yu_koh Grounding Language Models to Images for Multimodal Generation 2023

[TOC] Title: Grounding Language Models to Images for Multimodal Generation Author: Jing Yu Koh et al. Publish Year: 31 Jan 2023 Review Date: Mon, Feb 6, 2023 url: https://arxiv.org/pdf/2301.13823.pdf Summary of paper Motivation we propose an efficient method to ground pre-trained text-only language models to the visual domain How we keep the language model frozen, and finetune input and output linear layers to enable cross-modality interactions. This allows our model to process arbitrarily interleaved image-and-text inputs. Contribution our approach works with any off-the-shelf language model and paves the way towards an effective, general solution for leveraging pre-trained language models in visually grounded settings. Related work LLMs for vision-and-language ...
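
A minimal sketch of the frozen-LM-plus-trainable-linear-layers recipe, with a tiny stand-in language model rather than an actual pretrained LM; only the input projection is shown, whereas the paper also finetunes an output projection:

```python
# Minimal sketch of the "frozen LM + trainable linear mappings" idea: visual
# features are linearly projected into the language model's input embedding
# space; only the projection trains. The tiny LM below is a stand-in, not the
# model used in the paper.
import torch
import torch.nn as nn

d_img, d_lm, vocab = 512, 256, 1000

class TinyLM(nn.Module):            # stand-in for a pretrained decoder-only LM
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab, d_lm)
        self.backbone = nn.GRU(d_lm, d_lm, batch_first=True)
        self.lm_head = nn.Linear(d_lm, vocab)
    def forward(self, inp_embs):                   # [B, T, d_lm] -> next-token logits
        h, _ = self.backbone(inp_embs)
        return self.lm_head(h)

lm = TinyLM()
for p in lm.parameters():
    p.requires_grad_(False)                        # language model stays frozen

img_to_lm = nn.Linear(d_img, d_lm)                 # trainable input projection
opt = torch.optim.Adam(img_to_lm.parameters(), lr=1e-4)

def caption_loss(img_feats, token_ids):
    """Prepend the projected image embedding, predict the caption tokens."""
    vis = img_to_lm(img_feats).unsqueeze(1)        # [B, 1, d_lm] visual "token"
    txt = lm.tok_emb(token_ids)                    # [B, T, d_lm]
    logits = lm(torch.cat([vis, txt], dim=1))      # [B, 1+T, vocab]
    # logits at position t predict token t+1; drop the final position
    return nn.functional.cross_entropy(
        logits[:, :-1].reshape(-1, vocab), token_ids.reshape(-1))

img_feats = torch.randn(2, d_img)                  # e.g. features from a frozen visual encoder
tokens = torch.randint(0, vocab, (2, 8))
loss = caption_loss(img_feats, tokens)
opt.zero_grad(); loss.backward(); opt.step()
```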

February 6, 2023 · 2 min · 239 words · Sukai Huang

Zhenfang_chen See Think Confirm Interactive Prompting Between Vision and Language Models for Knowledge Based Visual Reasoning 2023

[TOC] Title: See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge Based Visual Reasoning Author: Zhenfang Chen et al. Publish Year: 12 Jan 2023 Review Date: Mon, Feb 6, 2023 url: https://arxiv.org/pdf/2301.05226.pdf Summary of paper Motivation Solving knowledge-based visual reasoning tasks remains challenging: it requires a model to comprehensively understand image content, connect it to external world knowledge, and perform step-by-step reasoning to answer the questions correctly. Contribution We propose a novel framework named Interactive Prompting Visual Reasoner (IPVR) for few-shot knowledge-based visual reasoning. IPVR contains three stages: see, think, and confirm. The see stage scans the image and grounds the visual concept candidates with a visual perception model. The think stage adopts a pre-trained large language model (LLM) to attend to the key concepts from the candidates adaptively. It then transforms them into text context for prompting with a visual captioning model and adopts the LLM to generate the answer. The confirm stage further uses the LLM to generate the supporting rationale for the answer, verify the generated rationale with a cross-modality classifier, and ensure that the rationale can infer the predicted output consistently. Some key terms human process to handle knowledge-based visual reasoning ...
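
A sketch of the see-think-confirm control flow with every component stubbed out (the detector, the LLM call and the cross-modal verifier below are placeholders, not the paper's actual models):

```python
# Sketch of the three-stage IPVR-style loop described above. All models are
# placeholder stubs; the point is only the see -> think -> confirm control
# flow, including retrying when the generated rationale fails verification.
from typing import List

def detect_concepts(image) -> List[str]:              # "see": visual perception model
    return ["person", "umbrella", "wet street"]

def llm(prompt: str) -> str:                          # stand-in for a real LLM call
    if prompt.startswith("Explain"):
        return "people use umbrellas when it rains"
    return "it is raining"

def verify_rationale(image, rationale: str) -> bool:  # cross-modality classifier stub
    return "rain" in rationale

def answer_question(image, question: str, max_tries: int = 3) -> str:
    concepts = detect_concepts(image)                         # --- see ---
    for _ in range(max_tries):
        context = ", ".join(concepts)                         # --- think ---
        answer = llm(f"Concepts: {context}. Question: {question}. Answer:")
        rationale = llm(f"Explain why '{answer}' answers '{question}'.")
        if verify_rationale(image, rationale):                # --- confirm ---
            return answer
    return answer                                             # fall back to the last attempt

print(answer_question(image=None, question="Why are people holding umbrellas?"))
```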

February 6, 2023 · 2 min · 405 words · Sukai Huang

Xiaotian_liu a Planning Based Neural Symbolic Approach for Embodied Instruction Following 2022

[TOC] Title: A Planning Based Neural Symbolic Approach for Embodied Instruction Following Author: Xiaotian Liu et al. Publish Year: 2022 Review Date: Thu, Feb 2, 2023 url: https://embodied-ai.org/papers/2022/15.pdf Summary of paper Motivation end-to-end deep learning methods struggle at these tasks due to long horizons and sparse rewards. Contribution Our main innovation relies on combining DL models for perception and NLP with a new egocentric planner based on successive planning problems formulated using PDDL syntax, both for exploration and task accomplishment. our planning framework can naturally recover from action failures at any stage of the planned trajectory. Some key terms Embodied Instruction Following ...

February 2, 2023 · 2 min · 226 words · Sukai Huang

So_yeon_min Film Following Instructions in Language With Modular Methods 2022

[TOC] Title: FILM: Following Instructions in Language With Modular Methods Author: So Yeon Min et al. Publish Year: 16 Mar 2022 Review Date: Wed, Feb 1, 2023 url: https://arxiv.org/pdf/2110.07342.pdf Summary of paper Motivation current approaches assume that neural states will integrate multimodal semantics to perform state tracking, build spatial memory, explore, and plan long-term. in contrast, we propose a modular method with structured representations that builds a semantic map of the scene and performs exploration with a semantic search policy, to achieve a natural language goal. Contribution FILM consists of several modular components: a Language Processing module that converts language instructions into structured forms, a Semantic Mapping module that converts egocentric visual input into a semantic metric map, a Semantic Search Policy that predicts a search goal location (the subgoal is plotted as a dot on the semantic top-down map), and a Deterministic Policy that outputs the subsequent navigation/interaction actions. A rough sketch of this decomposition is given below. Some key terms embodied instruction following ...
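
The sketch below stubs every module; real FILM uses learned models for each component, and the map size, goal coordinates and actions here are invented purely for illustration:

```python
# Rough sketch of FILM's modular decomposition as a per-timestep loop.
# Every module below is a stub standing in for the learned components.
import numpy as np

def language_processing(instruction: str):
    # instruction -> structured subtask list, e.g. [(object, interaction), ...]
    return [("mug", "PickUp"), ("sink", "Put")]

def update_semantic_map(sem_map: np.ndarray, egocentric_obs) -> np.ndarray:
    return sem_map                                   # project obs into the top-down map

def semantic_search_policy(sem_map: np.ndarray, target: str):
    return (12, 7)                                   # predicted (x, y) search goal

def deterministic_policy(sem_map: np.ndarray, goal) -> str:
    return "MoveAhead"                               # path-plan toward the goal dot

sem_map = np.zeros((240, 240), dtype=np.int8)
subtasks = language_processing("put the mug in the sink")
for target, interaction in subtasks:
    for _ in range(5):                               # navigation steps (truncated)
        sem_map = update_semantic_map(sem_map, egocentric_obs=None)
        goal = semantic_search_policy(sem_map, target)
        action = deterministic_policy(sem_map, goal)
    print(f"{interaction} {target} after navigating via {goal}")
```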

February 1, 2023 · 3 min · 430 words · Sukai Huang

Yuki_inoue Prompter Utilizing Large Language Model Prompting for a Data Efficient Embodied Instruction Following 2022

[TOC] Title: Prompter: Utilizing Large Language Model Prompting for a Data Efficient Embodied Instruction Following Author: Yuki Inoue et al. Publish Year: 7 Nov 2022 Review Date: Wed, Feb 1, 2023 url: https://arxiv.org/pdf/2211.03267.pdf Summary of paper Motivation we propose FILM++, which extends the existing work FILM with modifications that do not require extra data. furthermore, we propose Prompter, which replaces FILM++’s semantic search module with language model prompting. no training is needed for our prompting-based implementation, while achieving better or at least comparable performance. Contribution FILM++ fills the role of the data-efficient baseline. we propose Prompter, which replaces the semantic search module of FILM++ with language prompting, making it even more data efficient. Some key terms Difficulty in converting language into robot controls ...

February 1, 2023 · 3 min · 526 words · Sukai Huang

Kyle_mahowald Dissociating Language and Thought in Large Language Models a Cognitive Perspective 2023

[TOC] Title: Dissociating Language and Thought in Large Language Models: A Cognitive Perspective Author: Kyle Mahowald et al. Publish Year: 16 Jan 2023 Review Date: Tue, Jan 31, 2023 url: https://arxiv.org/pdf/2301.06627.pdf Summary of paper Motivation the authors challenge the “good at language $\implies$ good at thought” fallacy; the second fallacy is “bad at thought $\implies$ bad at language”. Contribution the authors argue that LLMs have promise as scientific models of one piece of the human cognitive toolbox – formal language processing – but fall short of modelling human thought. in section 4, we consider several domains required for functional linguistic competence – formal reasoning, world knowledge, situation modelling and social cognitive abilities. Some key terms deep learning models in linguistics ...

January 31, 2023 · 4 min · 776 words · Sukai Huang

Michael_janner Planning With Diffusion for Flexible Behaviour Synthesis 2022

[TOC] Title: Planning With Diffusion for Flexible Behaviour Synthesis Author: Michael Janner et al. Publish Year: 21 Dec 2022 Review Date: Mon, Jan 30, 2023 Summary of paper Motivation use a diffusion model to learn the dynamics, tightly coupling the modelling and the planning. our goal is to break this abstraction barrier by designing a model and planning algorithm that are trained alongside one another, resulting in a non-autoregressive trajectory-level model for which sampling and planning are nearly identical. Some key terms ideal model-based RL ...
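
A heavily simplified sketch of planning-as-sampling: iteratively denoise a whole (state, action) trajectory, clamp the first state to the current observation, and execute the first planned action. The denoiser and the update rule below are crude stand-ins for Diffuser's trained temporal U-Net and proper diffusion sampling:

```python
# Sketch of trajectory-level planning by denoising. The denoiser is an
# untrained stand-in and the update rule is a crude simplification of
# diffusion sampling; shapes are illustrative only.
import torch
import torch.nn as nn

horizon, state_dim, act_dim, n_steps = 16, 4, 2, 50
denoiser = nn.Linear(horizon * (state_dim + act_dim), horizon * (state_dim + act_dim))

@torch.no_grad()
def plan(current_state: torch.Tensor) -> torch.Tensor:
    traj = torch.randn(horizon, state_dim + act_dim)          # start from pure noise
    for _ in range(n_steps):                                  # iterative denoising
        traj[0, :state_dim] = current_state                   # condition on s_0
        noise_pred = denoiser(traj.reshape(-1)).reshape(horizon, -1)
        traj = traj - 0.1 * noise_pred                        # crude update rule
    return traj[0, state_dim:]                                # first planned action

action = plan(torch.zeros(state_dim))                         # execute, then replan next step
print(action)
```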

January 30, 2023 · 2 min · 317 words · Sukai Huang

Shailaja_keyur_sampat Reasoning About Actions Over Visual and Linguistic Modalities a Survey 2022

[TOC] Title: Reasoning About Actions Over Visual and Linguistic Modalities: A Survey Author: Shailaja Keyur Sampat et al. Publish Year: 2022 Review Date: Fri, Jan 20, 2023 Summary of paper Motivation reasoning about actions & changes has been widely studied in the knowledge representation community, and it has recently piqued the interest of NLP and computer vision researchers. Contribution Some key terms Six most frequent types of commonsense knowledge tasks that involve language-based reasoning about actions ...

January 20, 2023 · 3 min · 524 words · Sukai Huang

Xin_wang Reinforced Cross Modal Matching and Self Supervised Imitation Learning for Vision Language Navigation 2019

[TOC] Title: Reinforced Cross Modal Matching and Self Supervised Imitation Learning for Vision Language Navigation 2019 Author: Xin Wang et al. Publish Year: 2019 Review Date: Wed, Jan 18, 2023 Summary of paper Motivation Vision-Language Navigation (VLN) presents some unique challenges. first, reasoning over images and natural language instructions can be difficult. secondly, apart from strictly following expert demonstrations, the feedback is rather coarse, since the “success” feedback is provided only when the agent reaches the target position (sparse reward). A good “instruction following” trajectory may end up stopping just before reaching the goal state and thus receive zero reward. existing work suffers from a generalisation problem (the agent needs to be retrained in new environments). Implementation the agent can infer which sub-instruction to focus on and where to look (automatically splitting a long instruction), with a matching critic that evaluates an executed path by the probability of reconstructing the original instruction from it: P(original instruction | past trajectory). cycle reconstruction: we have P(target trajectory | the instruction) = 1, and we want to measure P(original instruction | past trajectory); this enhances interpretability, as we can now understand how the robot interpreted the instruction.
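
A minimal sketch of the matching-critic reward (score an executed path by the log-probability of reconstructing the instruction from it), using an untrained stand-in speaker model:

```python
# Sketch of the matching-critic idea: use log P(instruction | trajectory) as an
# intrinsic reward. The encoder/decoder here are untrained stubs; RCM learns
# the matching critic jointly with the navigator.
import torch
import torch.nn as nn

vocab, d = 100, 32
traj_encoder = nn.GRU(d, d, batch_first=True)      # encodes the executed path
decoder = nn.GRU(d, d, batch_first=True)           # "speaker" that reconstructs the instruction
word_emb = nn.Embedding(vocab, d)
word_head = nn.Linear(d, vocab)

def matching_reward(traj_feats: torch.Tensor, instruction_ids: torch.Tensor) -> float:
    """Intrinsic reward = mean log P(instruction token | trajectory, previous tokens)."""
    _, h = traj_encoder(traj_feats)                                 # [1, 1, d] path summary
    bos = torch.zeros(1, 1, d)                                      # start-of-sequence embedding
    inputs = torch.cat([bos, word_emb(instruction_ids)[None, :-1]], dim=1)   # teacher forcing
    out, _ = decoder(inputs, h)                                     # [1, T, d]
    logp = torch.log_softmax(word_head(out[0]), dim=-1)             # [T, vocab]
    token_logp = logp.gather(1, instruction_ids.unsqueeze(1)).squeeze(1)
    return token_logp.mean().item()

traj_feats = torch.randn(1, 10, d)                  # 10 steps of path features
instruction = torch.randint(0, vocab, (6,))         # tokenised instruction
print(matching_reward(traj_feats, instruction))
```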

January 18, 2023 · 1 min · 195 words · Sukai Huang

Alekh_agarwal PC-PG Policy Cover Directed Exploration for Provable Policy Gradient Learning 2020

[TOC] Title: PC-PG Policy Cover Directed Exploration for Provable Policy Gradient Learning Author: Alekh Agarwal et al. Publish Year: 2020 Review Date: Wed, Dec 28, 2022 Summary of paper Motivation The primary drawback of direct policy gradient methods is that, by being local in nature, they fail to adequately explore the environment. In contrast, model-based approaches and Q-learning directly handle exploration through the use of optimism. Contribution Policy Cover-Policy Gradient algorithm (PC-PG), a direct, model-free, policy optimisation approach which addresses exploration through the use of a learned ensemble of policies, the latter providing a policy cover over the state space. the use of a learned policy cover addresses exploration, and also addresses the catastrophic forgetting problem in policy gradient approaches which use reward bonuses, as well as the issue in on-policy algorithms where approximation errors due to model misspecification amplify (see [Lu et al., 2018] for discussion). Some key terms suffering from sparse reward ...

December 28, 2022 · 2 min · 271 words · Sukai Huang

Alekh_agarwal on the Theory of Policy Gradient Methods Optimality Approximation and Distribution Shift 2020

[TOC] Title: On the Theory of Policy Gradient Methods Optimality Approximation and Distribution Shift 2020 Author: Alekh Agarwal et al. Publish Year: 14 Oct 2020 Review Date: Wed, Dec 28, 2022 Summary of paper Motivation little is known about even their most basic theoretical convergence properties, including: if and how fast they converge to a globally optimal solution, and how they cope with approximation error due to using a restricted class of parametric policies. Contribution One central contribution of this work is in providing approximation guarantees that are average-case – which avoid explicit worst-case dependencies on the size of the state space – by making a formal connection to supervised learning under distribution shift. This characterisation shows an important interplay between estimation error, approximation error and exploration (as characterised through a precisely defined condition number). Some key terms basic theoretical convergence questions ...

December 28, 2022 · 3 min · 557 words · Sukai Huang

Chloe_ching_yun_hsu Revisiting Design Choices in Proximal Policy Optimisation 2020

[TOC] Title: Revisiting Design Choices in Proximal Policy Optimisation Author: Chloe Ching-Yun Hsu et al. Publish Year: 23 Sep 2020 Review Date: Wed, Dec 28, 2022 Summary of paper Motivation Contribution on discrete action spaces with sparse high rewards, standard PPO often gets stuck at suboptimal actions. Why we analyse the reasons for these failure modes and explain why they are not exposed by standard benchmarks. In summary, our study suggests that Beta policy parameterisation and KL-regularised objectives should be reconsidered for PPO, especially when these alternatives improve PPO in all settings. The authors prove a convergence guarantee for the KL-penalty version of PPO, as it inherits the convergence guarantees of mirror descent for policy families that are closed under mixture. Some key terms design choices ...
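
For reference, a sketch of the KL-penalty form of the PPO objective discussed here (as opposed to the clipping form), with made-up batch data:

```python
# Sketch of the KL-penalty PPO loss for a discrete policy. In a real
# implementation old_probs come from the behaviour policy (detached) and
# new_probs from the current policy; everything here is dummy data.
import torch

def ppo_kl_loss(new_probs, old_probs, actions, advantages, beta=1.0):
    """-E[ratio * A] + beta * KL(pi_old || pi_new), the penalty form of PPO."""
    new_p = new_probs.gather(1, actions[:, None]).squeeze(1)
    old_p = old_probs.gather(1, actions[:, None]).squeeze(1)
    ratio = new_p / old_p                                  # pi_new(a|s) / pi_old(a|s)
    surrogate = -(ratio * advantages).mean()
    kl = (old_probs * (old_probs.clamp_min(1e-8).log()
                       - new_probs.clamp_min(1e-8).log())).sum(-1).mean()
    return surrogate + beta * kl

# dummy batch: 5 states, 3 actions
old_probs = torch.softmax(torch.randn(5, 3), dim=-1)
new_probs = torch.softmax(torch.randn(5, 3), dim=-1)
actions = torch.randint(0, 3, (5,))
advantages = torch.randn(5)
print(ppo_kl_loss(new_probs, old_probs, actions, advantages))
```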

December 28, 2022 · 3 min · 467 words · Sukai Huang

James_queeney Generalized Proximal Policy Optimisation With Sample Reuse 2021

[TOC] Title: Generalized Proximal Policy Optimisation With Sample Reuse 2021 Author: James Queeney et al. Publish Year: 29 Oct 2021 Review Date: Wed, Dec 28, 2022 Summary of paper Motivation it is critical for data-driven reinforcement learning methods to be both stable and sample efficient. On-policy methods typically generate reliable policy improvement throughout training, while off-policy methods make more efficient use of data through sample reuse. Contribution in this work, we combine the theoretically supported stability benefits of on-policy algorithms with the sample efficiency of off-policy algorithms. We develop policy improvement guarantees that are suitable for the off-policy setting, and connect these bounds to the clipping mechanism used in PPO; this motivates an off-policy version of the popular algorithm, which we call GePPO. we demonstrate both theoretically and empirically that our algorithm delivers improved performance by effectively balancing the competing goals of stability and sample efficiency. Some key terms sample complexity ...

December 28, 2022 · 5 min · 1033 words · Sukai Huang

Lun_wang Backdoorl Backdoor Attack Against Competitive Reinforcement Learning 2021

[TOC] Title: BackdooRL: Backdoor Attack Against Competitive Reinforcement Learning 2021 Author: Lun Wang et al. Publish Year: 12 Dec 2021 Review Date: Wed, Dec 28, 2022 Summary of paper Motivation in this paper, we propose BACKDOORL, a backdoor attack targeted at two-player competitive reinforcement learning systems. first, the adversarial agent has to lead the victim to take a series of wrong actions, instead of only one, to prevent it from winning. Additionally, the adversary wants to exhibit the trigger action in as few steps as possible to avoid detection. Contribution we propose BACKDOORL, the first backdoor attack targeted at competitive reinforcement learning systems. The trigger is the action of another agent in the environment. We propose a unified method to design fast-failing agents for different environments. We prototype BACKDOORL and evaluate it in four environments. The results validate the feasibility of backdoor attacks in competitive environments. We study possible defenses against BACKDOORL. The results show that fine-tuning cannot completely remove the backdoor. Some key terms backdoorl workflow ...

December 28, 2022 · 1 min · 202 words · Sukai Huang

Sandy_huang Adversarial Attacks on Neural Network Policies 2017

[TOC] Title: Adversarial Attacks on Neural Network Policies Author: Sandy Huang et al. Publish Year: 8 Feb 2017 Review Date: Wed, Dec 28, 2022 Summary of paper Motivation in this work, we show adversarial attacks are also effective when targeting neural network policies in reinforcement learning. Specifically, we show existing adversarial example crafting techniques can be used to significantly degrade the test-time performance of trained policies. Contribution we characterise the degree of vulnerability across tasks and training algorithms, for a subclass of adversarial-example attacks in white-box and black-box settings. ...
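
A minimal sketch of the kind of attack studied: an FGSM-style perturbation of the observation that degrades the action the policy would otherwise pick; the policy below is an untrained stub:

```python
# FGSM-style attack on a neural network policy: perturb the observation in the
# direction that increases the loss on the action the policy originally
# preferred, which tends to flip the agent's decision. Policy is a random stub.
import torch
import torch.nn as nn

obs_dim, n_actions, eps = 16, 4, 0.01
policy = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, n_actions))

def fgsm_attack(obs: torch.Tensor) -> torch.Tensor:
    obs = obs.clone().requires_grad_(True)
    logits = policy(obs)
    target = logits.argmax(dim=-1)                        # the action the agent would take
    loss = nn.functional.cross_entropy(logits, target)    # "confidence" in that action
    loss.backward()
    # step in the direction that increases the loss, degrading the chosen action
    return (obs + eps * obs.grad.sign()).detach()

clean = torch.randn(1, obs_dim)
adv = fgsm_attack(clean)
print(policy(clean).argmax(-1).item(), policy(adv).argmax(-1).item())
```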

December 28, 2022 · 2 min · 346 words · Sukai Huang

Yinglun_xu Efficient Reward Poisoning Attacks on Online Deep Reinforcement Learning 2022

[TOC] Title: Efficient Reward Poisoning Attacks on Online Deep Reinforcement Learning Author: Yinglun Xu et al. Publish Year: 30 May 2022 Review Date: Tue, Dec 27, 2022 Summary of paper Motivation we study data poisoning attacks on online deep reinforcement learning (DRL) where the attacker is oblivious to the learning algorithm used by the agent and does not necessarily have full knowledge of the environment. we instantiate our framework to construct several attacks which only corrupt the rewards for a small fraction of the total training timesteps and make the agent learn a low-performing policy. Contribution results show that the reward attack efficiently poisons agent learning with a variety of SOTA DRL algorithms such as DQN and PPO. our attack can work on model-free DRL algorithms for all popular learning paradigms, and only assumes the learning algorithm to be efficient. a large enough reward-poisoning attack in the right direction is able to disrupt the DRL algorithm. limitation ...

December 27, 2022 · 2 min · 302 words · Sukai Huang

Young_wu Reward Poisoning Attacks on Offline Multi Agent Reinforcement Learning 2022

[TOC] Title: Reward Poisoning Attacks on Offline Multi Agent Reinforcement Learning Author: Young Wu et al. Publish Year: 1 Dec 2022 Review Date: Tue, Dec 27, 2022 Summary of paper Motivation Contribution unlike attacks on single-agent RL, we show that the attacker can install the target policy as a Markov Perfect Dominant Strategy Equilibrium (MPDSE), which rational agents are guaranteed to follow. This attack can be significantly cheaper than separate single-agent attacks. Limitation ...

December 27, 2022 · 1 min · 146 words · Sukai Huang

Xuezhou_zhang Robust Policy Gradient Against Strong Data Corruption 2021

[TOC] Title: Robust Policy Gradient Against Strong Data Corruption Author: Xuezhou Zhang et al. Publish Year: 2021 Review Date: Tue, Dec 27, 2022 Summary of paper Abstract Contribution the authors utilise an SVD-denoising technique to identify and remove possible reward perturbations; this approach gives a robust RL algorithm. Limitation This approach only handles attack perturbations that are not consistent (i.e., not stealthy). Some key terms Policy gradient methods ...
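
A toy sketch of the SVD-denoising idea: reconstruct a low-rank version of the observed rewards and flag large residuals as likely perturbations. The matrix arrangement and rank are illustrative, not the paper's exact construction:

```python
# Toy illustration of SVD denoising against sparse reward corruption: if the
# clean reward structure is (close to) low-rank, a rank-k reconstruction
# smooths out sparse perturbations and large residuals flag corrupted entries.
import numpy as np

rng = np.random.default_rng(0)
true_R = np.outer(rng.standard_normal(50), rng.standard_normal(20))   # rank-1 "clean" rewards
corrupt = true_R.copy()
idx = rng.choice(corrupt.size, size=30, replace=False)
corrupt.flat[idx] += rng.standard_normal(30) * 5.0                    # sparse poisoning

U, s, Vt = np.linalg.svd(corrupt, full_matrices=False)
k = 1
denoised = (U[:, :k] * s[:k]) @ Vt[:k]                                # rank-k reconstruction

residual = np.abs(corrupt - denoised)
flagged = residual > 3 * residual.std()                               # suspected corrupted entries
print("recon error vs truth:", float(np.abs(denoised - true_R).mean()))
print("flagged entries:", int(flagged.sum()))
```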

December 27, 2022 · 2 min · 317 words · Sukai Huang

Kiarash_banihashem Defense Against Reward Poisoning Attacks in Reinforcement Learning 2021

[TOC] Title: Defense Against Reward Poisoning Attacks in Reinforcement Learning Author: Kiarash Banihashem et al. Publish Year: 20 Jun 2021 Review Date: Tue, Dec 27, 2022 Summary of paper Motivation our goal is to design agents that are robust against such attacks in terms of the worst-case utility w.r.t. the true, unpoisoned rewards while computing their policies under the poisoned rewards. Contribution we formalise this reasoning and characterise the utility of our novel framework for designing defense policies. In summary, the key contributions include ...

December 27, 2022 · 2 min · 303 words · Sukai Huang

Amin_rakhsha Reward Poisoning in Reinforcement Learning Attacks Against Unknown Learners in Unknown Environments 2021

[TOC] Title: Reward Poisoning in Reinforcement Learning Attacks Against Unknown Learners in Unknown Environments Author: Amin Rakhsha et al. Publish Year: 16 Feb 2021 Review Date: Tue, Dec 27, 2022 Summary of paper Motivation Our attack makes minimal assumptions on the prior knowledge of the environment or the learner’s learning algorithm. most of the prior work makes strong assumptions on the knowledge of the adversary – it is often assumed that the adversary has full knowledge of the environment, the agent’s learning algorithm, or both. under such assumptions, attack strategies have been proposed that can mislead the agent into learning a nefarious policy with minimal perturbation to the rewards. Contribution We design a novel black-box attack, U2, that can provably achieve a near-matching performance to the SOTA white-box attack, demonstrating the feasibility of reward poisoning even in the most challenging black-box setting. limitation ...

December 27, 2022 · 2 min · 233 words · Sukai Huang

Xuezhou_zhang Adaptive Reward Poisoning Attacks Against Reinforcement Learning 2020

[TOC] Title: Adaptive Reward Poisoning Attacks Against Reinforcement Learning Author: Xuezhou Zhang et al. Publish Year: 22 Jun 2020 Review Date: Tue, Dec 27, 2022 Summary of paper Motivation Non-adaptive attacks have been the focus of prior works. However, we show that under mild conditions, adaptive attacks can achieve the nefarious policy in a number of steps polynomial in the state-space size $|S|$, whereas non-adaptive attacks require exponentially many steps. Contribution we provide a lower threshold below which reward-poisoning attacks are infeasible and RL is certified to be safe. similar to this paper, it shows that reward attacks have their limits. we provide a corresponding upper threshold above which the attack is feasible. we characterise conditions under which such attacks are guaranteed to fail (thus RL is safe), and vice versa. in the case where the attack is feasible, we provide upper bounds on the attack cost in the process of achieving the bad policy. we show that effective attacks can be found empirically using deep RL techniques. Some key terms feasible attack category ...

December 27, 2022 · 2 min · 283 words · Sukai Huang

Anindya_sarkar Reward Delay Attacks on Deep Reinforcement Learning 2022

[TOC] Title: Reward Delay Attacks on Deep Reinforcement Learning Author: Anindya Sarkar et al. Publish Year: 8 Sep 2022 Review Date: Mon, Dec 26, 2022 Summary of paper Motivation we present novel attacks targeting Q-learning that exploit a vulnerability entailed by this assumption by delaying the reward signal for a limited time period. We evaluate the efficacy of the proposed attacks through a series of experiments. Contribution our first observation is that reward-delay attacks are extremely effective when the goal of the adversary is simply to minimise reward. we find that some mitigation methods remain insufficient to ensure robustness against attacks that delay, but preserve the order of, rewards. Conclusion ...
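
A sketch of an order-preserving reward-delay attack as an environment wrapper (a Gym-style step interface is assumed; episode-boundary handling is omitted):

```python
# Sketch of an order-preserving reward-delay attack: each reward is withheld
# and released a fixed number of steps later, so the learner credits it to the
# wrong transition. Gym-style 4-tuple interface assumed for illustration.
from collections import deque

class RewardDelayWrapper:
    def __init__(self, env, delay: int = 3):
        self.env, self.delay = env, delay
        self.buffer = deque()

    def reset(self):
        self.buffer.clear()
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.buffer.append(reward)
        # release the reward from `delay` steps ago; emit 0 until the buffer fills
        delayed = self.buffer.popleft() if len(self.buffer) > self.delay else 0.0
        return obs, delayed, done, info
```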

December 26, 2022 · 2 min · 374 words · Sukai Huang

Tom_everitt Reinforcement Learning With a Corrupted Reward Channel 2017

[TOC] Title: Reinforcement Learning With a Corrupted Reward Channel Author: Tom Everitt Publish Year: August 22, 2017 Review Date: Mon, Dec 26, 2022 Summary of paper Motivation we formalise this problem as a generalised Markov Decision Problem called the Corrupt Reward MDP. Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and when trying to compensate for the possibly corrupt rewards. Contribution two ways around the problem are investigated. First, by giving the agent richer data, such as in inverse reinforcement learning and semi-supervised reinforcement learning, reward corruption stemming from systematic sensory errors may sometimes be completely managed. Second, by using randomisation to blunt the agent’s optimisation, reward corruption can be partially managed under some assumptions. Limitation ...

December 26, 2022 · 4 min · 757 words · Sukai Huang