Investigating Vulnerabilities of Deep Neural Policies
- URL: http://arxiv.org/abs/2108.13093v1
- Date: Mon, 30 Aug 2021 10:04:50 GMT
- Title: Investigating Vulnerabilities of Deep Neural Policies
- Authors: Ezgi Korkmaz
- Abstract summary: Reinforcement learning policies based on deep neural networks are vulnerable to imperceptible adversarial perturbations to their inputs.
Recent work has proposed several methods to improve the robustness of deep reinforcement learning agents to adversarial perturbations.
We study the effects of adversarial training on the neural policy learned by the agent.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning policies based on deep neural networks are vulnerable
to imperceptible adversarial perturbations to their inputs, in much the same
way as neural network image classifiers. Recent work has proposed several
methods to improve the robustness of deep reinforcement learning agents to
adversarial perturbations based on training in the presence of these
imperceptible perturbations (i.e. adversarial training). In this paper, we
study the effects of adversarial training on the neural policy learned by the
agent. In particular, we follow two distinct parallel approaches to investigate
the outcomes of adversarial training on deep neural policies based on
worst-case distributional shift and feature sensitivity. For the first
approach, we compare the Fourier spectrum of minimal perturbations computed for
both adversarially trained and vanilla trained neural policies. Via experiments
in the OpenAI Atari environments we show that minimal perturbations computed
for adversarially trained policies are more focused on lower frequencies in the
Fourier domain, indicating a higher sensitivity of these policies to low
frequency perturbations. For the second approach, we propose a novel method to
measure the feature sensitivities of deep neural policies and we compare these
feature sensitivity differences in state-of-the-art adversarially trained deep
neural policies and vanilla trained deep neural policies. We believe our
results can be an initial step towards understanding the relationship between
adversarial training and different notions of robustness for neural policies.
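The Fourier-spectrum comparison described in the abstract is, at its core, a frequency-domain energy measurement on the minimal perturbation computed for each policy. Below is a minimal sketch of such a measurement, assuming NumPy and a 2-D state perturbation (e.g. the difference between a perturbed and a clean Atari frame); the function name, the radial cutoff, and the random example input are illustrative assumptions, not the paper's actual code. The feature-sensitivity method for the second approach is the paper's own novel construction and is not reproduced here.

```python
# Illustrative sketch (not the paper's code): measure how much of a
# perturbation's spectral energy lies in low Fourier frequencies.
import numpy as np

def low_frequency_energy_fraction(perturbation: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of the perturbation's spectral energy within a radial frequency cutoff.

    `perturbation` is a 2-D array; `cutoff` is a fraction of the maximum radius
    in the shifted 2-D spectrum (both are assumptions for illustration).
    """
    spectrum = np.fft.fftshift(np.fft.fft2(perturbation))  # DC component moved to the center
    energy = np.abs(spectrum) ** 2

    h, w = perturbation.shape
    yy, xx = np.meshgrid(np.arange(h) - h / 2, np.arange(w) - w / 2, indexing="ij")
    radius = np.sqrt(yy ** 2 + xx ** 2)                    # distance from the DC component
    max_radius = np.sqrt((h / 2) ** 2 + (w / 2) ** 2)

    low_mask = radius <= cutoff * max_radius
    return float(energy[low_mask].sum() / energy.sum())

# Usage sketch: white noise stands in for a computed minimal perturbation.
# Per the paper's finding, perturbations computed against adversarially trained
# policies would concentrate more energy in the low-frequency region,
# yielding a larger fraction than this roughly flat-spectrum example.
rng = np.random.default_rng(0)
perturbation = rng.normal(size=(84, 84))
print(low_frequency_energy_fraction(perturbation))
```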
Related papers
- Understanding and Diagnosing Deep Reinforcement Learning [14.141453107129403]
Deep neural policies have recently been installed in a diverse range of settings, from biotechnology to automated financial systems.
We introduce a theoretically founded technique that provides a systematic analysis of the directions in the deep neural policy decision boundary across both time and space.
arXiv Detail & Related papers (2024-06-23T18:10:16Z) - Compositional Curvature Bounds for Deep Neural Networks [7.373617024876726]
A key challenge that threatens the widespread use of neural networks in safety-critical applications is their vulnerability to adversarial attacks.
We study the second-order behavior of continuously differentiable deep neural networks, focusing on robustness against adversarial perturbations.
We introduce a novel algorithm to analytically compute provable upper bounds on the second derivative of neural networks.
arXiv Detail & Related papers (2024-06-07T17:50:15Z) - Adversarial Training Should Be Cast as a Non-Zero-Sum Game [121.95628660889628]
The two-player zero-sum paradigm of adversarial training has not engendered sufficient levels of robustness.
We show that the surrogate-based relaxation commonly used in adversarial training algorithms voids all guarantees on robustness.
A novel non-zero-sum bilevel formulation of adversarial training yields a framework that matches and in some cases outperforms state-of-the-art attacks.
arXiv Detail & Related papers (2023-06-19T16:00:48Z) - Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z) - Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training.
We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods.
arXiv Detail & Related papers (2022-06-05T08:49:16Z) - Searching for the Essence of Adversarial Perturbations [73.96215665913797]
We show that adversarial perturbations contain human-recognizable information, which is the key conspirator responsible for a neural network's erroneous prediction.
This concept of human-recognizable information allows us to explain key features related to adversarial perturbations.
arXiv Detail & Related papers (2022-05-30T18:04:57Z) - Adversarial Robustness in Deep Learning: Attacks on Fragile Neurons [0.6899744489931016]
We identify fragile and robust neurons of deep learning architectures using nodal dropouts of the first convolutional layer.
We correlate these neurons with the distribution of adversarial attacks on the network.
arXiv Detail & Related papers (2022-01-31T14:34:07Z) - Deep Reinforcement Learning Policies Learn Shared Adversarial Features
Across MDPs [0.0]
We propose a framework to investigate the decision boundary and loss landscape similarities across states and across MDPs.
We conduct experiments in various games from the Arcade Learning Environment, and discover that high sensitivity directions for neural policies are correlated across MDPs.
arXiv Detail & Related papers (2021-12-16T17:10:41Z) - Feature Purification: How Adversarial Training Performs Robust Deep
Learning [66.05472746340142]
We show a principle that we call Feature Purification, where we show one of the causes of the existence of adversarial examples is the accumulation of certain small dense mixtures in the hidden weights during the training process of a neural network.
We present both experiments on the CIFAR-10 dataset to illustrate this principle, and a theoretical result proving that for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle.
arXiv Detail & Related papers (2020-05-20T16:56:08Z) - Learn2Perturb: an End-to-end Feature Perturbation Learning to Improve
Adversarial Robustness [79.47619798416194]
Learn2Perturb is an end-to-end feature perturbation learning approach for improving the adversarial robustness of deep neural networks.
Inspired by Expectation-Maximization, an alternating back-propagation training algorithm is introduced to train the network and noise parameters consecutively.
arXiv Detail & Related papers (2020-03-02T18:27:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.