[TOC]
- Title: Reward Poisoning in Reinforcement Learning: Attacks Against Unknown Learners in Unknown Environments
- Author: Amin Rakhsha et al.
- Publish Date: 16 Feb 2021
- Review Date: Tue, Dec 27, 2022
Summary of paper
Motivation
- Our attack makes minimal assumptions about prior knowledge of the environment or the learner’s learning algorithm.
- Most prior work makes strong assumptions about the adversary’s knowledge: it is often assumed that the adversary has full knowledge of the environment, the agent’s learning algorithm, or both.
- Under such assumptions, attack strategies have been proposed that can mislead the agent into learning a nefarious policy with minimal perturbation to the rewards.
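A common way to formalize this white-box objective in the reward-poisoning literature (a sketch of the standard setup, not a formulation taken verbatim from this paper) is: the adversary replaces the true reward $r$ with a poisoned reward $\hat{r}$ that makes a target policy $\pi_\dagger$ optimal by some margin, while keeping the perturbation small:

```latex
\begin{aligned}
\min_{\hat{r}} \quad & \lVert \hat{r} - r \rVert_p \\
\text{s.t.} \quad & \rho^{\pi_\dagger}(\hat{r}) \;\ge\; \rho^{\pi}(\hat{r}) + \epsilon
\qquad \forall\, \pi \neq \pi_\dagger,
\end{aligned}
```

where $\rho^{\pi}(\hat{r})$ denotes the value of policy $\pi$ under rewards $\hat{r}$ and $\epsilon > 0$ is an optimality margin. The black-box setting studied here is harder precisely because the adversary cannot evaluate this constraint directly.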
Contribution
- We design a novel black-box attack, U2, that can provably achieve a near-matching performance to the SOTA white-box attack, demonstrating the feasibility of reward poisoning even in the most challenging black-box setting.
Limitation
- This work focuses on forcing the learner to adopt a nefarious policy rather than on slowing down the learning process.
Some key terms
online reward poisoning
- In online settings, reward poisoning was first introduced and studied in multi-armed bandits (ref: Data poisoning attacks in contextual bandits), where the authors show that adversarially perturbed rewards can mislead a standard bandit algorithm into pulling a suboptimal arm or suffering large regret.
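The bandit case above can be sketched in a few lines. This is a minimal illustration, not the attack from the referenced paper: a hypothetical adversary zeroes out the reward of the truly optimal arm whenever it is pulled, so an epsilon-greedy learner's empirical means end up favoring the adversary's target arm.

```python
import random

def run_bandit(T=1000, epsilon=0.1, poison=True, seed=0):
    """Epsilon-greedy learner on a 2-armed bandit with deterministic payoffs.

    Arm 0 is truly optimal (reward 1.0); arm 1 pays 0.2. The (hypothetical)
    adversary's target is arm 1: when poisoning, it subtracts 1.0 from arm 0's
    observed reward, so the learner comes to believe arm 1 is better.
    Returns the pull counts [pulls of arm 0, pulls of arm 1].
    """
    rng = random.Random(seed)
    true_reward = [1.0, 0.2]
    counts = [0, 0]          # times each arm was pulled
    means = [0.0, 0.0]       # empirical mean observed reward per arm
    for _ in range(T):
        if rng.random() < epsilon:
            arm = rng.randrange(2)               # explore
        else:
            arm = 0 if means[0] >= means[1] else 1  # exploit
        r = true_reward[arm]
        if poison and arm == 0:
            r -= 1.0  # adversarial perturbation: optimal arm looks worthless
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean
    return counts

clean = run_bandit(poison=False)     # learner mostly pulls the optimal arm 0
attacked = run_bandit(poison=True)   # learner is steered to suboptimal arm 1
```

Note the attack only ever modifies rewards the learner observes; the environment's true dynamics are untouched, which is what makes reward poisoning hard to detect from the learner's side.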
poisoning attacks and teaching
- Poisoning attacks are mathematically equivalent to the formulation of machine teaching, with the teacher playing the role of the adversary.
- However, these works only consider supervised learning settings.