[TOC]
- Title: BackdooRL: Backdoor Attack Against Competitive Reinforcement Learning (2021)
- Author: Lun Wang et al.
- Publish Year: 12 Dec 2021
- Review Date: Wed, Dec 28, 2022
Summary of paper
Motivation
- In this paper, we propose BackdooRL, a backdoor attack targeted at two-player competitive reinforcement learning systems.
- First, the adversary agent has to lead the victim to take a series of wrong actions, rather than only one, to prevent it from winning.
- Additionally, the adversary wants to exhibit the trigger action in as few steps as possible to avoid detection (a conceptual sketch of the trigger-conditioned victim behaviour follows this list).
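To make the trigger mechanism concrete, here is a minimal conceptual sketch (not the paper's code) of a backdoored victim policy: it plays normally until it detects the opponent's trigger behaviour, then switches to a fast-failing sub-policy. All names (`BackdooredVictimPolicy`, `trigger_detector`, `toy_trigger_detector`) are hypothetical illustrations.

```python
# Conceptual sketch, not the paper's implementation: a backdoored victim policy
# that behaves normally until the opponent's trigger behaviour is observed,
# then permanently switches to a fast-failing sub-policy.
import numpy as np


class BackdooredVictimPolicy:
    def __init__(self, normal_policy, fast_fail_policy, trigger_detector):
        self.normal_policy = normal_policy        # wins under normal play
        self.fast_fail_policy = fast_fail_policy  # tries to lose as quickly as possible
        self.trigger_detector = trigger_detector  # flags the opponent's trigger actions
        self.triggered = False

    def act(self, observation):
        # Once the trigger is seen, the victim stays in fail-fast mode.
        if not self.triggered and self.trigger_detector(observation):
            self.triggered = True
        policy = self.fast_fail_policy if self.triggered else self.normal_policy
        return policy(observation)


# Hypothetical detector: in practice it would inspect the part of the
# observation that encodes the opponent's pose/behaviour (environment-specific).
def toy_trigger_detector(observation):
    return np.linalg.norm(observation[-4:]) > 5.0  # placeholder heuristic
```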
Contribution
- We propose BackdooRL, the first backdoor attack targeted at competitive reinforcement learning systems. The trigger is the action of another agent in the environment.
- We propose a unified method to design fast-failing agents for different environments (a minimal sketch of this idea follows this list).
- We prototype BackdooRL and evaluate it in four environments. The results validate the feasibility of backdoor attacks in competitive environments.
- We study possible defenses for BackdooRL. The results show that fine-tuning cannot completely remove the backdoor.
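As a rough illustration of what a fast-failing agent could look like, the sketch below assumes it is trained with a standard RL algorithm against a reshaped reward that favours losing quickly; the paper's actual reward design and training procedure may differ. `FastFailRewardWrapper`, `make_competitive_env`, and `train_ppo` are placeholder names.

```python
# A minimal sketch, assuming the fast-failing agent is obtained by training a
# standard RL algorithm against a reshaped reward that rewards losing quickly.
# Uses gym's classic 4-tuple step API; adapt for gymnasium's 5-tuple API.
import gym


class FastFailRewardWrapper(gym.Wrapper):
    """Flips the objective so that losing sooner yields a higher return."""

    def __init__(self, env, step_penalty=0.01):
        super().__init__(env)
        self.step_penalty = step_penalty

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Negate the original reward and penalize every surviving step, so the
        # optimal behaviour is to end the episode (lose) as fast as possible.
        shaped_reward = -reward - self.step_penalty
        return obs, shaped_reward, done, info


# Hypothetical usage with any competitive environment and any off-the-shelf trainer:
#   env = FastFailRewardWrapper(make_competitive_env())  # make_competitive_env is a placeholder
#   fast_fail_policy = train_ppo(env)                    # train_ppo is a placeholder
```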
Some key terms
BackdooRL workflow
Defense
- One possible defense is to fine-tune (or unlearn) the victim network by retraining it with additional normal episodes (see the sketch below).
- Additionally, we notice that even fine-tuning for more epochs cannot further improve the winning rate.
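A rough sketch of the fine-tuning defense, assuming the defender can collect additional trigger-free episodes and update the victim policy with an ordinary policy-gradient (REINFORCE-style) step. `collect_normal_episode` and the episode format are placeholders, not from the paper.

```python
# Sketch of the fine-tuning defense: retrain the victim on clean episodes
# (opponent never shows the trigger action) to try to unlearn the backdoor.
import torch


def fine_tune(victim_policy, optimizer, collect_normal_episode, num_episodes=100):
    """Update the victim with policy-gradient steps on trigger-free episodes."""
    for _ in range(num_episodes):
        # Placeholder: each episode is a list of (log_prob, return) pairs
        # gathered from clean play, where log_prob is a torch tensor.
        episode = collect_normal_episode(victim_policy)
        loss = -sum(log_prob * ret for log_prob, ret in episode)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return victim_policy
```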