Certifying Safety in Reinforcement Learning under Adversarial
Perturbation Attacks
- URL: http://arxiv.org/abs/2212.14115v1
- Date: Wed, 28 Dec 2022 22:33:38 GMT
- Title: Certifying Safety in Reinforcement Learning under Adversarial
Perturbation Attacks
- Authors: Junlin Wu, Hussein Sibai and Yevgeniy Vorobeychik
- Abstract summary: We propose a partially-supervised reinforcement learning (PSRL) framework that takes advantage of an additional assumption that the true state of the POMDP is known at training time.
We present the first approach for certifying safety of PSRL policies under adversarial input perturbations, and two adversarial training approaches that make direct use of PSRL.
- Score: 23.907977144668838
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Function approximation has enabled remarkable advances in applying
reinforcement learning (RL) techniques in environments with high-dimensional
inputs, such as images, in an end-to-end fashion, mapping such inputs directly
to low-level control. Nevertheless, these have proved vulnerable to small
adversarial input perturbations. A number of approaches for improving or
certifying robustness of end-to-end RL to adversarial perturbations have
emerged as a result, focusing on cumulative reward. However, what is often at
stake in adversarial scenarios is the violation of fundamental properties, such
as safety, rather than the overall reward that combines safety with efficiency.
Moreover, properties such as safety can only be defined with respect to true
state, rather than the high-dimensional raw inputs to end-to-end policies. To
disentangle nominal efficiency and adversarial safety, we situate RL in
deterministic partially-observable Markov decision processes (POMDPs) with the
goal of maximizing cumulative reward subject to safety constraints. We then
propose a partially-supervised reinforcement learning (PSRL) framework that
takes advantage of an additional assumption that the true state of the POMDP is
known at training time. We present the first approach for certifying safety of
PSRL policies under adversarial input perturbations, and two adversarial
training approaches that make direct use of PSRL. Our experiments demonstrate
both the efficacy of the proposed approach for certifying safety in adversarial
environments, and the value of the PSRL framework coupled with adversarial
training in improving certified safety while preserving high nominal reward and
high-quality predictions of true state.
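To make the PSRL idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of how true-state supervision available at training time can be combined with a standard policy-gradient objective. The architecture, loss weighting, and all names are illustrative assumptions.

```python
# Illustrative sketch of partially-supervised RL (PSRL) as described in the abstract:
# the true state is available at training time, so the policy network is trained with
# a supervised state-prediction loss in addition to the usual RL objective.
# All names, losses, and hyperparameters are assumptions, not the paper's code.
import torch
import torch.nn as nn

class PSRLPolicy(nn.Module):
    def __init__(self, obs_dim, state_dim, n_actions, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.state_head = nn.Linear(hidden, state_dim)    # predicts the true state
        self.policy_head = nn.Linear(hidden, n_actions)   # action logits

    def forward(self, obs):
        z = self.encoder(obs)
        return self.state_head(z), self.policy_head(z)

def psrl_loss(model, obs, true_state, action, advantage, lam=1.0):
    """Policy-gradient surrogate plus supervised state prediction.
    `true_state` is only needed at training time, matching the PSRL assumption."""
    state_pred, logits = model(obs)
    logp = torch.log_softmax(logits, dim=-1).gather(1, action.unsqueeze(1)).squeeze(1)
    rl_loss = -(logp * advantage).mean()                   # REINFORCE-style surrogate
    sup_loss = nn.functional.mse_loss(state_pred, true_state)
    return rl_loss + lam * sup_loss
```

At deployment only the observation is used; the state head produces the "predictions of true state" mentioned in the abstract.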
Related papers
- Embedding Safety into RL: A New Take on Trust Region Methods [1.5733417396701983]
Reinforcement Learning (RL) agents are able to solve a wide variety of tasks but are prone to unsafe behaviors.
We propose Constrained Trust Region Policy Optimization (C-TRPO), a novel approach that modifies the geometry of the policy space based on the safety constraints.
arXiv Detail & Related papers (2024-11-05T09:55:50Z)
- Practical and Robust Safety Guarantees for Advanced Counterfactual Learning to Rank [64.44255178199846]
We generalize the existing safe counterfactual learning to rank (CLTR) approach to make it applicable to state-of-the-art doubly robust CLTR.
We also propose a novel approach, proximal ranking policy optimization (PRPO), that provides safety in deployment without assumptions about user behavior.
PRPO is the first method with unconditional safety in deployment that translates to robust safety for real-world applications.
arXiv Detail & Related papers (2024-07-29T12:23:59Z)
- Feasibility Consistent Representation Learning for Safe Reinforcement Learning [25.258227763316228]
We introduce a novel framework named Feasibility Consistent Safe Reinforcement Learning (FCSRL).
This framework combines representation learning with feasibility-oriented objectives to identify and extract safety-related information from the raw state for safe RL.
Our method is capable of learning a better safety-aware embedding and achieving superior performance than previous representation learning baselines.
arXiv Detail & Related papers (2024-05-20T01:37:21Z)
- The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks [90.52808174102157]
In safety-critical applications such as medical imaging and autonomous driving, it is imperative to maintain both high adversarial robustness, to protect against potential adversarial attacks, and reliable uncertainty estimation.
A notable knowledge gap remains concerning the uncertainty inherent in adversarially trained models.
This study investigates the uncertainty of deep learning models by examining the performance of conformal prediction (CP) in the context of standard adversarial attacks.
arXiv Detail & Related papers (2024-05-14T18:05:19Z)
- Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques.
We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-04T11:43:23Z)
- Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
arXiv Detail & Related papers (2021-06-21T21:42:08Z)
- CROP: Certifying Robust Policies for Reinforcement Learning through Functional Smoothing [41.093241772796475]
We present the first framework of Certifying Robust Policies for reinforcement learning (CROP) against adversarial state perturbations.
We propose two types of robustness certification criteria: robustness of per-state actions and lower bounds on cumulative reward (a minimal sketch of this style of per-state certification appears after this list).
arXiv Detail & Related papers (2021-06-17T07:58:32Z)
- Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning [63.53407136812255]
Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration.
Existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states.
We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly.
arXiv Detail & Related papers (2021-05-17T20:16:46Z)
- Conservative Safety Critics for Exploration [120.73241848565449]
We study the problem of safe exploration in reinforcement learning (RL).
We learn a conservative safety estimate of environment states through a critic.
We show that the proposed approach can achieve competitive task performance while incurring significantly lower catastrophic failure rates.
arXiv Detail & Related papers (2020-10-27T17:54:25Z)
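As a companion to the smoothing-based certification entries above (Policy Smoothing, CROP), here is a minimal illustrative sketch of randomized-smoothing-style per-state action certification. It is not taken from either paper's code; the policy interface, noise level, and the choice of a simple Hoeffding confidence bound are assumptions.

```python
# Hypothetical sketch: certify that the majority-vote action of a Gaussian-smoothed
# policy cannot change under small l2 perturbations of the observation
# (per-state action certification in the spirit of CROP / Policy Smoothing).
import numpy as np
from scipy.stats import norm

def certify_action(policy, obs, sigma=0.1, n_samples=1000, alpha=0.001):
    """policy: maps a 1-D observation to a discrete action (non-negative int).
    Returns the smoothed action and a certified l2 radius (0.0 means no certificate)."""
    noisy = obs[None, :] + sigma * np.random.randn(n_samples, obs.shape[0])
    actions = np.array([policy(x) for x in noisy])
    counts = np.bincount(actions)
    a_hat = counts.argmax()
    # Hoeffding lower confidence bound on the top action's probability
    # (an exact binomial bound would be tighter).
    p_lower = counts[a_hat] / n_samples - np.sqrt(np.log(1 / alpha) / (2 * n_samples))
    if p_lower <= 0.5:
        return a_hat, 0.0                         # abstain from certifying
    radius = sigma * norm.ppf(p_lower)            # standard randomized-smoothing radius
    return a_hat, radius
```

The returned radius means no l2 observation perturbation smaller than it can change the smoothed policy's action at this state, which is the "robustness of per-state actions" flavor of criterion.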