Certifying Safety in Reinforcement Learning under Adversarial
Perturbation Attacks
- URL: http://arxiv.org/abs/2212.14115v1
- Date: Wed, 28 Dec 2022 22:33:38 GMT
- Title: Certifying Safety in Reinforcement Learning under Adversarial
Perturbation Attacks
- Authors: Junlin Wu, Hussein Sibai and Yevgeniy Vorobeychik
- Abstract summary: We propose a partially-supervised reinforcement learning (PSRL) framework that takes advantage of an additional assumption that the true state of the POMDP is known at training time.
We present the first approach for certifying safety of PSRL policies under adversarial input perturbations, and two adversarial training approaches that make direct use of PSRL.
- Score: 23.907977144668838
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Function approximation has enabled remarkable advances in applying
reinforcement learning (RL) techniques in environments with high-dimensional
inputs, such as images, in an end-to-end fashion, mapping such inputs directly
to low-level control. Nevertheless, these have proved vulnerable to small
adversarial input perturbations. A number of approaches for improving or
certifying robustness of end-to-end RL to adversarial perturbations have
emerged as a result, focusing on cumulative reward. However, what is often at
stake in adversarial scenarios is the violation of fundamental properties, such
as safety, rather than the overall reward that combines safety with efficiency.
Moreover, properties such as safety can only be defined with respect to true
state, rather than the high-dimensional raw inputs to end-to-end policies. To
disentangle nominal efficiency and adversarial safety, we situate RL in
deterministic partially-observable Markov decision processes (POMDPs) with the
goal of maximizing cumulative reward subject to safety constraints. We then
propose a partially-supervised reinforcement learning (PSRL) framework that
takes advantage of an additional assumption that the true state of the POMDP is
known at training time. We present the first approach for certifying safety of
PSRL policies under adversarial input perturbations, and two adversarial
training approaches that make direct use of PSRL. Our experiments demonstrate
both the efficacy of the proposed approach for certifying safety in adversarial
environments, and the value of the PSRL framework coupled with adversarial
training in improving certified safety while preserving high nominal reward and
high-quality predictions of true state.
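A minimal sketch of the partially-supervised idea described in the abstract, assuming a PyTorch policy whose shared encoder feeds both an auxiliary head that predicts the true POMDP state (supervised only at training time) and an action head. The network shapes, the REINFORCE-style return term, and the loss weight beta are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a PSRL-style training objective (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PSRLPolicy(nn.Module):
    """Shared encoder with an auxiliary true-state head and a policy head."""
    def __init__(self, obs_dim: int, state_dim: int, n_actions: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.state_head = nn.Linear(128, state_dim)   # predicts the true POMDP state
        self.policy_head = nn.Linear(128, n_actions)  # action logits

    def forward(self, obs: torch.Tensor):
        z = self.encoder(obs)
        return self.state_head(z), self.policy_head(z)

def psrl_loss(model, obs, true_state, actions, returns, beta=1.0):
    """RL loss plus supervised state prediction; the supervised term is
    usable only while the true state is available, i.e., at training time."""
    pred_state, logits = model(obs)
    log_pi = F.log_softmax(logits, dim=-1)
    chosen = log_pi.gather(1, actions.unsqueeze(1)).squeeze(1)
    rl_loss = -(chosen * returns).mean()              # REINFORCE-style return term
    sup_loss = F.mse_loss(pred_state, true_state)     # true-state supervision
    return rl_loss + beta * sup_loss

# Usage with dummy data (batch of 4, 16-dim observations, 3-dim true state, 2 actions):
model = PSRLPolicy(obs_dim=16, state_dim=3, n_actions=2)
obs = torch.randn(4, 16)
true_state = torch.randn(4, 3)
actions = torch.randint(0, 2, (4,))
returns = torch.randn(4)
loss = psrl_loss(model, obs, true_state, actions, returns)
loss.backward()
```

The supervised term is what distinguishes this sketch from standard end-to-end RL: it exploits the true state only while it is observable at training time, while at test time the policy acts from raw observations alone.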
Related papers
- Tilted Quantile Gradient Updates for Quantile-Constrained Reinforcement Learning [12.721239079824622]
We propose a safe reinforcement learning (RL) paradigm that enables a higher level of safety without any expectation-form approximations.
A tilted update strategy for quantile gradients is implemented to compensate for the asymmetric distributional density.
Experiments demonstrate that the proposed model fully meets safety requirements (quantile constraints) while outperforming state-of-the-art benchmarks with higher return.
arXiv Detail & Related papers (2024-12-17T18:58:00Z)
- Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning [7.888219789657414]
In safe offline reinforcement learning (RL), the objective is to develop a policy that maximizes cumulative rewards while strictly adhering to safety constraints.
We address this challenge with a novel approach that begins by learning a conservatively safe policy using Conditional Variational Autoencoders.
We frame this as a Constrained Reward-Return Maximization problem, wherein the policy aims to optimize rewards while complying with the inferred latent safety constraints.
arXiv Detail & Related papers (2024-12-11T22:00:07Z)
- Practical and Robust Safety Guarantees for Advanced Counterfactual Learning to Rank [64.44255178199846]
We generalize the existing safe CLTR approach to make it applicable to state-of-the-art doubly robust CLTR.
We also propose a novel approach, proximal ranking policy optimization (PRPO), that provides safety in deployment without assumptions about user behavior.
PRPO is the first method with unconditional safety in deployment that translates to robust safety for real-world applications.
arXiv Detail & Related papers (2024-07-29T12:23:59Z)
- Feasibility Consistent Representation Learning for Safe Reinforcement Learning [25.258227763316228]
We introduce a novel framework named Feasibility Consistent Safe Reinforcement Learning (FCSRL).
This framework combines representation learning with feasibility-oriented objectives to identify and extract safety-related information from the raw state for safe RL.
Our method is capable of learning a better safety-aware embedding and achieves superior performance compared to previous representation learning baselines.
arXiv Detail & Related papers (2024-05-20T01:37:21Z)
- The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks [90.52808174102157]
In safety-critical applications such as medical imaging and autonomous driving, it is imperative to maintain both high adversarial robustness against potential attacks and reliable uncertainty quantification.
A notable knowledge gap remains concerning the uncertainty inherent in adversarially trained models.
This study investigates the uncertainty of deep learning models by examining the performance of conformal prediction (CP) in the context of standard adversarial attacks.
arXiv Detail & Related papers (2024-05-14T18:05:19Z)
- Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques.
We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-04T11:43:23Z)
- Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
arXiv Detail & Related papers (2021-06-21T21:42:08Z)
- Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning [63.53407136812255]
Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration.
Existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states.
We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly.
arXiv Detail & Related papers (2021-05-17T20:16:46Z)
- Conservative Safety Critics for Exploration [120.73241848565449]
We study the problem of safe exploration in reinforcement learning (RL).
We learn a conservative safety estimate of environment states through a critic.
We show that the proposed approach can achieve competitive task performance while incurring significantly lower catastrophic failure rates.
arXiv Detail & Related papers (2020-10-27T17:54:25Z)
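The Conservative Safety Critics entry above describes learning a conservative safety estimate through a critic and using it to limit risky exploration. A minimal sketch of that filtering idea, assuming a critic that maps a state-action pair to an estimated failure probability; the critic architecture, the acceptance threshold, and the fallback action are illustrative assumptions rather than details from that paper.

```python
# Minimal sketch of filtering exploratory actions with a learned safety critic
# (illustrative only; the critic, threshold, and fallback are placeholder assumptions).
import torch
import torch.nn as nn

class SafetyCritic(nn.Module):
    """Estimates the probability that taking action a in state s leads to failure."""
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def filtered_action(critic, state, candidate_actions, fallback_action, threshold=0.1):
    """Return the first candidate whose estimated failure probability is below
    the threshold; otherwise fall back to a known-safe action."""
    for action in candidate_actions:
        risk = critic(state.unsqueeze(0), action.unsqueeze(0)).item()
        if risk < threshold:
            return action
    return fallback_action

# Usage with dummy tensors (4-dim state, 2-dim continuous actions):
critic = SafetyCritic(state_dim=4, action_dim=2)
state = torch.randn(4)
candidates = [torch.randn(2) for _ in range(5)]
safe_fallback = torch.zeros(2)
action = filtered_action(critic, state, candidates, safe_fallback)
```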