[TOC]

  1. Title: Reward Poisoning in Reinforcement Learning: Attacks Against Unknown Learners in Unknown Environments
  2. Author: Amin Rakhsha et al.
  3. Publish Date: 16 Feb 2021
  4. Review Date: Tue, Dec 27, 2022

Summary of paper

Motivation

  • Our attack makes minimal assumptions about prior knowledge of the environment or the learner’s learning algorithm.
  • Most prior work makes strong assumptions about the adversary’s knowledge: it is often assumed that the adversary has full knowledge of the environment, the agent’s learning algorithm, or both.
  • Under such assumptions, attack strategies have been proposed that can mislead the agent into learning a nefarious policy with minimal perturbation to the rewards.

Contribution

  • We design a novel black-box attack, U2, that can provably achieve a near-matching performance to the SOTA white-box attack, demonstrating the feasibility of reward poisoning even in the most challenging black-box setting.

Limitation

  • This work focuses on making the agent learn a nefarious policy rather than on slowing down the learning process.

Some key terms

online reward poisoning

  • In online settings, reward poisoning was first introduced and studied in multi-armed bandits (ref: Data poisoning attacks in contextual bandits), where the authors show that adversarially perturbed rewards can mislead standard bandit algorithms into pulling a suboptimal arm or suffering large regret.
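A toy sketch of the bandit setting described above (not from the paper; the arm means, noise level, and perturbation size are all assumed for illustration): a standard UCB1 learner on two arms, where an adversary shifts the optimal arm's observed rewards downward so the learner is misled into pulling the suboptimal arm.

```python
import math
import random

def ucb_pulls(reward_fn, T=2000, seed=0):
    """Run UCB1 on a 2-armed bandit; return per-arm pull counts."""
    rng = random.Random(seed)
    counts = [0, 0]
    sums = [0.0, 0.0]
    for t in range(1, T + 1):
        if t <= 2:
            a = t - 1  # pull each arm once to initialize
        else:
            a = max(range(2),
                    key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t) / counts[i]))
        counts[a] += 1
        sums[a] += reward_fn(a, rng)
    return counts

# True means (assumed): arm 0 is optimal, 0.9 vs 0.5.
def clean(a, rng):
    return rng.gauss(0.9 if a == 0 else 0.5, 0.1)

# Adversary subtracts 0.6 from arm 0's observed reward, so the
# learner sees means of roughly 0.3 vs 0.5 and favors arm 1.
def poisoned(a, rng):
    r = clean(a, rng)
    return r - 0.6 if a == 0 else r

clean_counts = ucb_pulls(clean)        # learner mostly pulls arm 0
poisoned_counts = ucb_pulls(poisoned)  # learner mostly pulls arm 1
```

The same idea extends to regret-based attacks: by controlling which arm looks best to the learner, the adversary controls which arm accumulates pulls.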

poisoning attacks and teaching

  • Poisoning attacks are mathematically equivalent to the formulation of machine teaching, with the teacher playing the role of the adversary.
  • However, these works only consider supervised learning settings.
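The attacker-as-teacher equivalence can be made concrete with a toy supervised example (not from the paper; the learner, the data format, and the teaching set are all assumed): the teacher crafts a minimal training set so that a simple ERM threshold learner outputs exactly the target classifier, just as a poisoning adversary crafts data to force a target model.

```python
def erm_threshold(data):
    """Learner: fit a 1-D threshold classifier by placing the boundary
    midway between the largest x labeled 0 and the smallest x labeled 1
    (assumes the data is separable)."""
    neg = max(x for x, y in data if y == 0)
    pos = min(x for x, y in data if y == 1)
    return (neg + pos) / 2

def teach(target, eps=1e-6):
    """Teacher/adversary: a two-example training set that forces the
    learner above to output (approximately) the target threshold."""
    return [(target - eps, 0), (target + eps, 1)]

learned = erm_threshold(teach(0.7))  # close to the target 0.7
```

Viewed as an attack, `teach` is the adversary's data-poisoning strategy; viewed as teaching, it is the minimal curriculum that steers the learner to the desired hypothesis.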