[TOC]

  1. Title: BackdooRL Backdoor Attack Against Competitive Reinforcement Learning 2021
  2. Author: Lun Wang et. al
  3. Publish Year: 12 Dec 2021
  4. Review Date: Wed, Dec 28, 2022

Summary of paper

Motivation

  • in this paper, we propose BACKDOORL, a backdoor attack targeted at two player competitive reinforcement learning systems.
  • first the adversary agent has to lead the victim to take a series of wrong actions instead of only one to prevent it from winning.
  • Additionally, the adversary wants to exhibit the trigger action in as few steps as possible to avoid detection.

Contribution

  • we propose backdoorl, the first backdoor attack targeted at competitive reinforcement learning systems. The trigger is the action of another agent in the environment.
  • We propose a unified method to design fast-failing agent for different environment
  • We prototype BACKDOORL and evaluate it in four environments. The results validate the feasibility of backdoor attacks in competitive environment
  • We study the possible defenses for backdoorl. The results show that fine-tuning cannot completely remove the backdoor.

Some key terms

backdoorl workflow

image-20221228041516138

Defense

  1. one possible defense is to fine-tune (or un-learn) the victim network by retraining with additional normal episodes.
  2. Additionally, we notice that even fine-tuning for more epochs cannot further improve the winning rate.