[TOC]
- Title: On the Robustness of Safe Reinforcement Learning Under Observational Perturbations
- Author: Zuxin Liu et al.
- Publish Date: 3 Oct 2022
- Review Date: Thu, Dec 22, 2022
Summary of paper
Motivation
- While many recent safe RL methods with deep policies achieve outstanding constraint satisfaction in noise-free simulation environments, their vulnerability under adversarial observational perturbations has not been studied in the safe RL setting.
Contribution
- we are the first to formally analyze the unique vulnerability of the optimal policy in safe RL under observational corruptions. We define the state-adversarial safe RL problem and investigate its fundamental properties. We show that optimal solutions of safe RL problems are theoretically vulnerable under observational adversarial attacks
- we show that existing adversarial attack algorithms that focus on minimizing agent rewards do not always work, and propose two effective attack algorithms with theoretical justifications: one directly maximises the constraint violation cost, and one maximises the task reward to induce a tempting but risky policy (see the sketch after this list).
- Surprisingly, the maximum reward attack is very strong at inducing unsafe behaviors, both in theory and in practice.
- we propose an adversarial training algorithm with the proposed attackers and show contraction properties of their Bellman operators. Extensive experiments in continuous control tasks show that our method is more robust against adversarial perturbations in terms of constraint satisfaction.
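Not the paper's exact algorithms, but a minimal sketch of how such observation attacks are commonly implemented: a PGD-style perturbation of the observation within an l-infinity ball, ascending the gradient of a critic. The interfaces `policy` and `critic` and the hyperparameters are hypothetical stand-ins, and the policy is assumed to output a differentiable action. Passing a learned cost critic gives a Maximum Cost-style attack; passing the reward critic gives the stealthier Maximum Reward-style attack.

```python
import torch

def observation_attack(obs, policy, critic, epsilon=0.05, steps=10, step_size=0.01):
    """PGD-style observation attack (a sketch, not the paper's implementation).

    Searches for a perturbed observation within an l_inf ball of radius
    `epsilon` that maximizes critic(true_state, policy(perturbed_obs)).
    With a cost critic this pushes the agent toward constraint violations
    (Maximum Cost); with a reward critic it lures the agent toward tempting
    but risky actions (Maximum Reward).
    """
    obs = obs.detach()
    delta = torch.zeros_like(obs, requires_grad=True)
    for _ in range(steps):
        action = policy(obs + delta)           # action the agent would take at the perturbed obs
        value = critic(obs, action).mean()     # critic evaluated at the true state
        value.backward()                       # gradient of the critic w.r.t. the perturbation
        with torch.no_grad():
            delta += step_size * delta.grad.sign()  # gradient ascent step
            delta.clamp_(-epsilon, epsilon)         # project back into the l_inf ball
        delta.grad.zero_()
    return (obs + delta).detach()
```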
Some key terms
Safe reinforcement learning definition
- safe RL tackles the problem by solving a constrained optimisation that maximises the task reward while satisfying certain constraints (see the formulation sketch after this list).
- this is usually done under the constrained MDP (CMDP) framework, which has been shown to be effective in learning constraint-satisfying policies in many tasks.
- safe RL has an additional metric that characterises the cost of constraint violations.
- there are some cases where sacrificing some reward cannot be traded off against violating the constraint, because the latter may cause catastrophic consequences.
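A minimal sketch of the constrained optimisation referred to above, in assumed notation (the symbols are mine, not necessarily the paper's): $J_r$ and $J_c$ are the expected discounted reward and cost returns, and $\kappa$ is the cost threshold.

```latex
\max_{\pi}\; J_r(\pi) \triangleq \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r(s_t, a_t)\right]
\quad \text{s.t.} \quad
J_c(\pi) \triangleq \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} c(s_t, a_t)\right] \le \kappa
```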
make the attack stealthy
- the attacker keeps the reward as high as possible while aiming to generate more constraint violations.
- in contrast, existing adversaries for standard RL aim to reduce the overall reward or lead to incorrect decision-making.
reward stealthiness
- is defined as the increase in reward value under the adversary. A state adversary ν is stealthy if the reward it can obtain is higher than the original one.
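A rough formalisation of the bullet above, in assumed notation: $V_r^{\pi}(\mu_0)$ is the reward return of policy $\pi$ from the initial state distribution $\mu_0$, and $\pi \circ \nu$ is the policy acting on observations perturbed by the adversary $\nu$.

```latex
\text{Reward stealthiness of } \nu:\quad V_r^{\pi \circ \nu}(\mu_0) - V_r^{\pi}(\mu_0),
\qquad
\nu \text{ is stealthy} \iff V_r^{\pi \circ \nu}(\mu_0) > V_r^{\pi}(\mu_0)
```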
reward effectiveness
- the effectiveness metric measures an adversary's capability of driving the safe RL agent to violate constraints, i.e., the increase in cost value under the adversary.
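In the same assumed notation, with $V_c^{\pi}(\mu_0)$ the cost return:

```latex
\text{Effectiveness of } \nu:\quad V_c^{\pi \circ \nu}(\mu_0) - V_c^{\pi}(\mu_0),
\qquad
\nu \text{ is effective} \iff V_c^{\pi \circ \nu}(\mu_0) > V_c^{\pi}(\mu_0)
```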
adversary
- it perturbs the agent's observation of the state within a bounded set, distorting the agent's perceived state and hence its value estimates and decisions.
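This appears to follow the standard state-adversary model (an assumption on my part, consistent with the state-adversarial MDP literature): the adversary maps the true state to a perturbed observation inside a bounded set, the agent acts on the perturbed observation, and the environment still transitions on the true state.

```latex
\nu: \mathcal{S} \to \mathcal{S}, \qquad
\tilde{s}_t = \nu(s_t) \in B_\epsilon(s_t) \triangleq \{\, s' : \|s' - s_t\|_p \le \epsilon \,\}, \qquad
a_t \sim \pi(\cdot \mid \tilde{s}_t)
```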
Good things about the paper (one paragraph)
- Proposition 1
- for an optimal policy π*, the Maximum Reward attacker is guaranteed to be reward-stealthy and effective, given a sufficiently large perturbation set (a rough restatement follows below).
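A rough restatement in the notation used above; this is my paraphrase of the review's summary rather than the paper's exact statement, with $\nu_{\mathrm{MR}}$ denoting the Maximum Reward attacker.

```latex
\text{For an optimal safe policy } \pi^{*} \text{ and a sufficiently large } B_\epsilon(\cdot):\quad
V_r^{\pi^{*} \circ \nu_{\mathrm{MR}}}(\mu_0) \ge V_r^{\pi^{*}}(\mu_0)
\quad \text{and} \quad
V_c^{\pi^{*} \circ \nu_{\mathrm{MR}}}(\mu_0) > V_c^{\pi^{*}}(\mu_0)
```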
Major comments
Citation
- it has been shown that neural networks are vulnerable to adversarial attacks: a small perturbation of the input data may lead to a large change in the output.
- ref: Gabriel Resende Machado, Eugênio Silva, and Ronaldo Ribeiro Goldschmidt. Adversarial machine learning in image classification: A survey toward the defender’s perspective. ACM Computing Surveys (CSUR), 55(1):1–38, 2021
- Nikolaos Pitropakis, Emmanouil Panaousis, Thanassis Giannetsos, Eleftherios Anastasiadis, and George Loukas. A taxonomy and survey of attacks against machine learning. Computer Science Review, 34:100199, 2019.
limitation
- this paper only considers observation perturbations, not reward perturbations.