[TOC]
- Title: Adversarial Attacks on Neural Network Policies
- Author: Sandy Huang et al.
- Publish Date: 8 Feb 2017
- Review Date: Wed, Dec 28, 2022
Summary of paper
Motivation
- in this work, we show that adversarial attacks are also effective when targeting neural network policies in reinforcement learning. Specifically, we show that existing adversarial example crafting techniques can be used to significantly degrade the test-time performance of trained policies.
Contribution
- we characterise the degree of vulnerability across tasks and training algorithms, for a subclass of adversarial-example attacks, in both white-box and black-box settings.
- regardless of the learned task or training algorithm, we observed a significant drop in performance, even with small adversarial perturbations that do not interfere with human perception.
- the main contribution is to characterise how the effectiveness of adversarial examples is affected by two factors: the deep RL algorithm used to learn the policy, and whether the adversary has access to the policy network.
Limitation
- the only perturbation type considered is the gradient-sign (FGSM-style) perturbation of the input signal
Some key terms
adversarial example crafting with the fast gradient sign method
- it is common to use a computationally cheap method of generating adversarial perturbations, even if this somewhat reduces the attack success rate.
- Fast Gradient Sign Method (FGSM)
- focuses on adversarial perturbations where each pixel of the input image is changed by no more than $\epsilon$; for a linear function $g(x) = w^\top x$, the optimal adversarial perturbation $\eta$ subject to $\|\eta\|_\infty \le \epsilon$ is
- $\eta = \epsilon \,\text{sign}(w)$, which maximises the change in output: $w^\top \eta = \epsilon \|w\|_1$
- In essence, FGSM adds a non-random perturbation whose direction follows the gradient of the cost function with respect to the input.
- sign gives only the direction of the gradient, not its magnitude
Applying FGSM to Policies
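A minimal sketch of this attack on a policy network, assuming a PyTorch model that maps observations to action logits and using cross-entropy against the policy's own most likely action as the cost $J$ (the function name and interface here are illustrative, not from the paper's code):

```python
import torch
import torch.nn.functional as F

def fgsm_perturb_observation(policy_net, obs, epsilon):
    """Return an adversarially perturbed observation (FGSM sketch).

    policy_net: assumed to map a batch of observations to action logits
    obs:        batch of observations, shape (N, ...)
    epsilon:    maximum per-dimension perturbation (L-infinity bound)
    """
    obs = obs.clone().detach().requires_grad_(True)
    logits = policy_net(obs)
    # Use the policy's own most likely action as the "label",
    # so the perturbation pushes the policy away from that action.
    target = logits.argmax(dim=-1)
    loss = F.cross_entropy(logits, target)
    loss.backward()
    # eta = epsilon * sign(grad_x J): direction of the gradient, fixed magnitude
    eta = epsilon * obs.grad.sign()
    return (obs + eta).detach()
```

In an Atari-style setting one would presumably also clip the perturbed observation back to the valid pixel range before passing it to the agent.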
Results
Vulnerability to white-box attacks
- we find that regardless of which game the policy is trained for or how it is trained, it is indeed possible to significantly decrease the policy’s performance by introducing relatively small perturbations.
- we see that policies trained with A3C, TRPO, and DQN are all susceptible to adversarial inputs. Interestingly, policies trained with DQN are the most susceptible.