[TOC]
- Title: Reward Poisoning in Reinforcement Learning: Attacks Against Unknown Learners in Unknown Environments
- Author: Amin Rakhsha et al.
- Publish Date: 16 Feb 2021
- Review Date: Tue, Dec 27, 2022
Summary of paper
Motivation
- Our attack makes minimal assumptions about prior knowledge of the environment or the learner’s learning algorithm.
- Most prior work makes strong assumptions about the adversary’s knowledge: it is often assumed that the adversary has full knowledge of the environment, the agent’s learning algorithm, or both.
- Under such assumptions, attack strategies have been proposed that can mislead the agent into learning a nefarious policy with minimal perturbation to the rewards.
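A common way to formalize this white-box objective in the reward-poisoning literature (a sketch of the standard setup, not a formulation taken verbatim from this paper) is: the adversary replaces the true reward $r$ with a poisoned reward $\hat{r}$ that makes a target policy $\pi_\dagger$ optimal by some margin, while keeping the perturbation small:

```latex
\begin{aligned}
\min_{\hat{r}} \quad & \lVert \hat{r} - r \rVert_p \\
\text{s.t.} \quad & \rho^{\pi_\dagger}(\hat{r}) \;\ge\; \rho^{\pi}(\hat{r}) + \epsilon
\qquad \forall\, \pi \neq \pi_\dagger,
\end{aligned}
```

where $\rho^{\pi}(\hat{r})$ denotes the value of policy $\pi$ under rewards $\hat{r}$ and $\epsilon > 0$ is an optimality margin. The black-box setting studied here is harder precisely because the adversary cannot evaluate this constraint directly.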
Contribution
- We design a novel black-box attack, U2, that can provably achieve a near-matching performance to the SOTA white-box attack, demonstrating the feasibility of reward poisoning even in the most challenging black-box setting.
Limitation
- This work focuses on forcing the learner to adopt a nefarious policy rather than on slowing down the learning process.
Some key terms
online reward poisoning
- In online settings, reward poisoning was first introduced and studied in multi-armed bandits (ref: Data poisoning attacks in contextual bandits), where the authors show that adversarially perturbed rewards can mislead a standard bandit algorithm into pulling a suboptimal arm or suffering large regret.
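The bandit case above can be sketched in a few lines. This is a minimal illustration, not the attack from the referenced paper: a hypothetical adversary zeroes out the reward of the truly optimal arm whenever it is pulled, so an epsilon-greedy learner's empirical means end up favoring the adversary's target arm.

```python
import random

def run_bandit(T=1000, epsilon=0.1, poison=True, seed=0):
    """Epsilon-greedy learner on a 2-armed bandit with deterministic payoffs.

    Arm 0 is truly optimal (reward 1.0); arm 1 pays 0.2. The (hypothetical)
    adversary's target is arm 1: when poisoning, it subtracts 1.0 from arm 0's
    observed reward, so the learner comes to believe arm 1 is better.
    Returns the pull counts [pulls of arm 0, pulls of arm 1].
    """
    rng = random.Random(seed)
    true_reward = [1.0, 0.2]
    counts = [0, 0]          # times each arm was pulled
    means = [0.0, 0.0]       # empirical mean observed reward per arm
    for _ in range(T):
        if rng.random() < epsilon:
            arm = rng.randrange(2)               # explore
        else:
            arm = 0 if means[0] >= means[1] else 1  # exploit
        r = true_reward[arm]
        if poison and arm == 0:
            r -= 1.0  # adversarial perturbation: optimal arm looks worthless
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean
    return counts

clean = run_bandit(poison=False)     # learner mostly pulls the optimal arm 0
attacked = run_bandit(poison=True)   # learner is steered to suboptimal arm 1
```

Note the attack only ever modifies rewards the learner observes; the environment's true dynamics are untouched, which is what makes reward poisoning hard to detect from the learner's side.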
poisoning attacks and teaching
- Poisoning attacks are mathematically equivalent to the formulation of machine teaching, with the teacher playing the role of the adversary.
- However, these works only consider supervised learning settings.