Certifying Safety in Reinforcement Learning under Adversarial
Perturbation Attacks
- URL: http://arxiv.org/abs/2212.14115v1
- Date: Wed, 28 Dec 2022 22:33:38 GMT
- Title: Certifying Safety in Reinforcement Learning under Adversarial
Perturbation Attacks
- Authors: Junlin Wu, Hussein Sibai and Yevgeniy Vorobeychik
- Abstract summary: We propose a partially-supervised reinforcement learning (PSRL) framework that takes advantage of an additional assumption that the true state of the POMDP is known at training time.
We present the first approach for certifying safety of PSRL policies under adversarial input perturbations, and two adversarial training approaches that make direct use of PSRL.
- Score: 23.907977144668838
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Function approximation has enabled remarkable advances in applying
reinforcement learning (RL) techniques in environments with high-dimensional
inputs, such as images, in an end-to-end fashion, mapping such inputs directly
to low-level control. Nevertheless, these have proved vulnerable to small
adversarial input perturbations. A number of approaches for improving or
certifying robustness of end-to-end RL to adversarial perturbations have
emerged as a result, focusing on cumulative reward. However, what is often at
stake in adversarial scenarios is the violation of fundamental properties, such
as safety, rather than the overall reward that combines safety with efficiency.
Moreover, properties such as safety can only be defined with respect to true
state, rather than the high-dimensional raw inputs to end-to-end policies. To
disentangle nominal efficiency and adversarial safety, we situate RL in
deterministic partially-observable Markov decision processes (POMDPs) with the
goal of maximizing cumulative reward subject to safety constraints. We then
propose a partially-supervised reinforcement learning (PSRL) framework that
takes advantage of an additional assumption that the true state of the POMDP is
known at training time. We present the first approach for certifying safety of
PSRL policies under adversarial input perturbations, and two adversarial
training approaches that make direct use of PSRL. Our experiments demonstrate
both the efficacy of the proposed approach for certifying safety in adversarial
environments, and the value of the PSRL framework coupled with adversarial
training in improving certified safety while preserving high nominal reward and
high-quality predictions of true state.
Related papers
- Feasibility Consistent Representation Learning for Safe Reinforcement Learning [25.258227763316228]
We introduce a novel framework named Feasibility Consistent Safe Reinforcement Learning (FCSRL)
This framework combines representation learning with feasibility-oriented objectives to identify and extract safety-related information from the raw state for safe RL.
Our method is capable of learning a better safety-aware embedding and achieving superior performance than previous representation learning baselines.
arXiv Detail & Related papers (2024-05-20T01:37:21Z) - The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks [90.52808174102157]
In safety-critical applications such as medical imaging and autonomous driving, it is imperative to maintain both high adversarial robustness to protect against potential adversarial attacks.
A notable knowledge gap remains concerning the uncertainty inherent in adversarially trained models.
This study investigates the uncertainty of deep learning models by examining the performance of conformal prediction (CP) in the context of standard adversarial attacks.
arXiv Detail & Related papers (2024-05-14T18:05:19Z) - Safety Correction from Baseline: Towards the Risk-aware Policy in
Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z) - Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques.
We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-04T11:43:23Z) - Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial of perturbation the input.
arXiv Detail & Related papers (2021-06-21T21:42:08Z) - CROP: Certifying Robust Policies for Reinforcement Learning through
Functional Smoothing [41.093241772796475]
We present the first framework of Certifying Robust Policies for reinforcement learning (CROP) against adversarial state perturbations.
We propose two types of robustness certification criteria: robustness of per-state actions and lower bound of cumulative rewards.
arXiv Detail & Related papers (2021-06-17T07:58:32Z) - Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning [63.53407136812255]
Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration.
Existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states.
We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly.
arXiv Detail & Related papers (2021-05-17T20:16:46Z) - Assured Learning-enabled Autonomy: A Metacognitive Reinforcement
Learning Framework [4.427447378048202]
Reinforcement learning (RL) agents with pre-specified reward functions cannot provide guaranteed safety across variety of circumstances.
An assured autonomous control framework is presented in this paper by empowering RL algorithms with metacognitive learning capabilities.
arXiv Detail & Related papers (2021-03-23T14:01:35Z) - Conservative Safety Critics for Exploration [120.73241848565449]
We study the problem of safe exploration in reinforcement learning (RL)
We learn a conservative safety estimate of environment states through a critic.
We show that the proposed approach can achieve competitive task performance while incurring significantly lower catastrophic failure rates.
arXiv Detail & Related papers (2020-10-27T17:54:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.