Policy Smoothing for Provably Robust Reinforcement Learning
- URL: http://arxiv.org/abs/2106.11420v1
- Date: Mon, 21 Jun 2021 21:42:08 GMT
- Title: Policy Smoothing for Provably Robust Reinforcement Learning
- Authors: Aounon Kumar, Alexander Levine and Soheil Feizi
- Abstract summary: We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
- Score: 109.90239627115336
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The study of provable adversarial robustness for deep neural network (DNN)
models has mainly focused on static supervised learning tasks such as image
classification. However, DNNs have been used extensively in real-world adaptive
tasks such as reinforcement learning (RL), making RL systems vulnerable to
adversarial attacks. The key challenge in adversarial RL is that the attacker
can adapt itself to the defense strategy used by the agent in previous
time-steps to strengthen its attack in future steps. In this work, we study the
provable robustness of RL against norm-bounded adversarial perturbations of the
inputs. We focus on smoothing-based provable defenses and propose policy
smoothing where the agent adds a Gaussian noise to its observation at each
time-step before applying the policy network to make itself less sensitive to
adversarial perturbations of its inputs. Our main theoretical contribution is
to prove an adaptive version of the Neyman-Pearson Lemma where the adversarial
perturbation at a particular time can be a stochastic function of current and
previous observations and states as well as previously observed actions. Using
this lemma, we adapt the robustness certificates produced by randomized
smoothing in the static setting of image classification to the dynamic setting
of RL. We generate certificates that guarantee that the total reward obtained
by the smoothed policy will not fall below a certain threshold under a
norm-bounded adversarial perturbation of the input. We show that our
certificates are tight by constructing a worst-case setting that achieves the
bounds derived in our analysis. In our experiments, we show that this method
can yield meaningful certificates in complex environments demonstrating its
effectiveness against adversarial attacks.
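As a rough illustration of the mechanism described in the abstract, the sketch below wraps an arbitrary base policy so that i.i.d. Gaussian noise is added to each observation before the policy network chooses an action. The class and helper names, the noise level `sigma`, and the Gymnasium-style `env.reset()`/`env.step()` interface are illustrative assumptions, not the authors' released implementation.
```python
import numpy as np

class SmoothedPolicy:
    """Policy smoothing (sketch): add i.i.d. Gaussian noise to every
    observation before the base policy picks an action."""

    def __init__(self, base_policy, sigma, rng=None):
        self.base_policy = base_policy   # callable: observation -> action
        self.sigma = sigma               # noise standard deviation (assumed hyperparameter)
        self.rng = rng or np.random.default_rng()

    def act(self, observation):
        # Smoothing noise is drawn independently at every time-step.
        noisy_obs = observation + self.rng.normal(0.0, self.sigma, size=observation.shape)
        return self.base_policy(noisy_obs)


def rollout(env, policy, max_steps=1000):
    """Run one episode with the smoothed policy and return its total reward
    (assumes a Gymnasium-style environment interface)."""
    obs, _ = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy.act(np.asarray(obs, dtype=np.float64))
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward
```
Per the abstract, the robustness certificate is then obtained from many such noisy rollouts: statistics of the total reward of the smoothed policy, combined with the adaptive Neyman-Pearson argument, yield a lower bound on the reward under any norm-bounded perturbation of the observations.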
Related papers
- The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks [90.52808174102157]
In safety-critical applications such as medical imaging and autonomous driving, it is imperative to maintain both high adversarial robustness to protect against potential adversarial attacks and reliable uncertainty quantification.
A notable knowledge gap remains concerning the uncertainty inherent in adversarially trained models.
This study investigates the uncertainty of deep learning models by examining the performance of conformal prediction (CP) in the context of standard adversarial attacks.
arXiv Detail & Related papers (2024-05-14T18:05:19Z) - Certifying Safety in Reinforcement Learning under Adversarial Perturbation Attacks [23.907977144668838]
We propose a partially-supervised reinforcement learning (PSRL) framework that takes advantage of an additional assumption that the true state of the POMDP is known at training time.
We present the first approach for certifying safety of PSRL policies under adversarial input perturbations, and two adversarial training approaches that make direct use of PSRL.
arXiv Detail & Related papers (2022-12-28T22:33:38Z) - CROP: Certifying Robust Policies for Reinforcement Learning through Functional Smoothing [41.093241772796475]
We present the first framework of Certifying Robust Policies for reinforcement learning (CROP) against adversarial state perturbations.
We propose two types of robustness certification criteria: robustness of per-state actions and lower bound of cumulative rewards.
arXiv Detail & Related papers (2021-06-17T07:58:32Z) - Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
Our method is trained to automatically align features of arbitrary attacking strength.
arXiv Detail & Related papers (2021-05-31T17:01:05Z) - Learning and Certification under Instance-targeted Poisoning [49.55596073963654]
We study PAC learnability and certification under instance-targeted poisoning attacks.
We show that when the budget of the adversary scales sublinearly with the sample complexity, PAC learnability and certification are achievable.
We empirically study the robustness of K nearest neighbour, logistic regression, multi-layer perceptron, and convolutional neural network on real data sets.
arXiv Detail & Related papers (2021-05-18T17:48:15Z) - Robust Reinforcement Learning on State Observations with Learned Optimal Adversary [86.0846119254031]
We study the robustness of reinforcement learning with adversarially perturbed state observations.
With a fixed agent policy, we demonstrate that an optimal adversary to perturb state observations can be found.
For DRL settings, this leads to a novel empirical adversarial attack to RL agents via a learned adversary that is much stronger than previous ones.
arXiv Detail & Related papers (2021-01-21T05:38:52Z) - Robust Reinforcement Learning using Adversarial Populations [118.73193330231163]
Reinforcement Learning (RL) is an effective tool for controller design but can struggle with issues of robustness.
We show that using a single adversary does not consistently yield robustness to dynamics variations under standard parametrizations of the adversary.
We propose a population-based augmentation to the Robust RL formulation in which we randomly initialize a population of adversaries and sample from the population uniformly during training.
arXiv Detail & Related papers (2020-08-04T20:57:32Z)