Certifying Safety in Reinforcement Learning under Adversarial
Perturbation Attacks
- URL: http://arxiv.org/abs/2212.14115v1
- Date: Wed, 28 Dec 2022 22:33:38 GMT
- Title: Certifying Safety in Reinforcement Learning under Adversarial
Perturbation Attacks
- Authors: Junlin Wu, Hussein Sibai and Yevgeniy Vorobeychik
- Abstract summary: We propose a partially-supervised reinforcement learning (PSRL) framework that takes advantage of an additional assumption that the true state of the POMDP is known at training time.
We present the first approach for certifying safety of PSRL policies under adversarial input perturbations, and two adversarial training approaches that make direct use of PSRL.
- Score: 23.907977144668838
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Function approximation has enabled remarkable advances in applying
reinforcement learning (RL) techniques in environments with high-dimensional
inputs, such as images, in an end-to-end fashion, mapping such inputs directly
to low-level control. Nevertheless, these have proved vulnerable to small
adversarial input perturbations. A number of approaches for improving or
certifying robustness of end-to-end RL to adversarial perturbations have
emerged as a result, focusing on cumulative reward. However, what is often at
stake in adversarial scenarios is the violation of fundamental properties, such
as safety, rather than the overall reward that combines safety with efficiency.
Moreover, properties such as safety can only be defined with respect to true
state, rather than the high-dimensional raw inputs to end-to-end policies. To
disentangle nominal efficiency and adversarial safety, we situate RL in
deterministic partially-observable Markov decision processes (POMDPs) with the
goal of maximizing cumulative reward subject to safety constraints. We then
propose a partially-supervised reinforcement learning (PSRL) framework that
takes advantage of an additional assumption that the true state of the POMDP is
known at training time. We present the first approach for certifying safety of
PSRL policies under adversarial input perturbations, and two adversarial
training approaches that make direct use of PSRL. Our experiments demonstrate
both the efficacy of the proposed approach for certifying safety in adversarial
environments, and the value of the PSRL framework coupled with adversarial
training in improving certified safety while preserving high nominal reward and
high-quality predictions of true state.
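To make the PSRL idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of how true-state supervision available at training time can be combined with a standard policy-gradient objective. The architecture, loss weighting, and all names are illustrative assumptions.

```python
# Illustrative sketch of partially-supervised RL (PSRL) as described in the abstract:
# the true state is available at training time, so the policy network is trained with
# a supervised state-prediction loss in addition to the usual RL objective.
# All names, losses, and hyperparameters are assumptions, not the paper's code.
import torch
import torch.nn as nn

class PSRLPolicy(nn.Module):
    def __init__(self, obs_dim, state_dim, n_actions, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.state_head = nn.Linear(hidden, state_dim)    # predicts the true state
        self.policy_head = nn.Linear(hidden, n_actions)   # action logits

    def forward(self, obs):
        z = self.encoder(obs)
        return self.state_head(z), self.policy_head(z)

def psrl_loss(model, obs, true_state, action, advantage, lam=1.0):
    """Policy-gradient surrogate plus supervised state prediction.
    `true_state` is only needed at training time, matching the PSRL assumption."""
    state_pred, logits = model(obs)
    logp = torch.log_softmax(logits, dim=-1).gather(1, action.unsqueeze(1)).squeeze(1)
    rl_loss = -(logp * advantage).mean()                   # REINFORCE-style surrogate
    sup_loss = nn.functional.mse_loss(state_pred, true_state)
    return rl_loss + lam * sup_loss
```

At deployment only the observation is used; the state head produces the "predictions of true state" mentioned in the abstract.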
Related papers
- Embedding Safety into RL: A New Take on Trust Region Methods [1.5733417396701983]
Reinforcement Learning (RL) agents are able to solve a wide variety of tasks but are prone to unsafe behaviors.
We propose Constrained Trust Region Policy Optimization (C-TRPO), a novel approach that modifies the geometry of the policy space based on the safety constraints.
arXiv Detail & Related papers (2024-11-05T09:55:50Z)
- Practical and Robust Safety Guarantees for Advanced Counterfactual Learning to Rank [64.44255178199846]
We generalize the existing safe counterfactual learning to rank (CLTR) approach to make it applicable to state-of-the-art doubly robust CLTR.
We also propose a novel approach, proximal ranking policy optimization (PRPO), that provides safety in deployment without assumptions about user behavior.
PRPO is the first method with unconditional safety in deployment that translates to robust safety for real-world applications.
arXiv Detail & Related papers (2024-07-29T12:23:59Z)
- Feasibility Consistent Representation Learning for Safe Reinforcement Learning [25.258227763316228]
We introduce a novel framework named Feasibility Consistent Safe Reinforcement Learning (FCSRL).
This framework combines representation learning with feasibility-oriented objectives to identify and extract safety-related information from the raw state for safe RL.
Our method is capable of learning a better safety-aware embedding and achieving superior performance than previous representation learning baselines.
arXiv Detail & Related papers (2024-05-20T01:37:21Z)
- The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks [90.52808174102157]
In safety-critical applications such as medical imaging and autonomous driving, it is imperative to maintain both high adversarial robustness, to protect against potential adversarial attacks, and reliable uncertainty estimation.
A notable knowledge gap remains concerning the uncertainty inherent in adversarially trained models.
This study investigates the uncertainty of deep learning models by examining the performance of conformal prediction (CP) in the context of standard adversarial attacks.
arXiv Detail & Related papers (2024-05-14T18:05:19Z)
- Safe Reinforcement Learning via Confidence-Based Filters [78.39359694273575]
We develop a control-theoretic approach for certifying state safety constraints for nominal policies learned via standard reinforcement learning techniques.
We provide formal safety guarantees, and empirically demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-07-04T11:43:23Z)
- Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
arXiv Detail & Related papers (2021-06-21T21:42:08Z)
- CROP: Certifying Robust Policies for Reinforcement Learning through Functional Smoothing [41.093241772796475]
We present the first framework of Certifying Robust Policies for reinforcement learning (CROP) against adversarial state perturbations.
We propose two types of robustness certification criteria: robustness of per-state actions and lower bounds on cumulative reward (a minimal sketch of this style of per-state certification appears after this list).
arXiv Detail & Related papers (2021-06-17T07:58:32Z)
- Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning [63.53407136812255]
Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration.
Existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states.
We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly.
arXiv Detail & Related papers (2021-05-17T20:16:46Z)
- Conservative Safety Critics for Exploration [120.73241848565449]
We study the problem of safe exploration in reinforcement learning (RL).
We learn a conservative safety estimate of environment states through a critic.
We show that the proposed approach can achieve competitive task performance while incurring significantly lower catastrophic failure rates.
arXiv Detail & Related papers (2020-10-27T17:54:25Z)
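As a companion to the smoothing-based certification entries above (Policy Smoothing, CROP), here is a minimal illustrative sketch of randomized-smoothing-style per-state action certification. It is not taken from either paper's code; the policy interface, noise level, and the choice of a simple Hoeffding confidence bound are assumptions.

```python
# Hypothetical sketch: certify that the majority-vote action of a Gaussian-smoothed
# policy cannot change under small l2 perturbations of the observation
# (per-state action certification in the spirit of CROP / Policy Smoothing).
import numpy as np
from scipy.stats import norm

def certify_action(policy, obs, sigma=0.1, n_samples=1000, alpha=0.001):
    """policy: maps a 1-D observation to a discrete action (non-negative int).
    Returns the smoothed action and a certified l2 radius (0.0 means no certificate)."""
    noisy = obs[None, :] + sigma * np.random.randn(n_samples, obs.shape[0])
    actions = np.array([policy(x) for x in noisy])
    counts = np.bincount(actions)
    a_hat = counts.argmax()
    # Hoeffding lower confidence bound on the top action's probability
    # (an exact binomial bound would be tighter).
    p_lower = counts[a_hat] / n_samples - np.sqrt(np.log(1 / alpha) / (2 * n_samples))
    if p_lower <= 0.5:
        return a_hat, 0.0                         # abstain from certifying
    radius = sigma * norm.ppf(p_lower)            # standard randomized-smoothing radius
    return a_hat, radius
```

The returned radius means no l2 observation perturbation smaller than it can change the smoothed policy's action at this state, which is the "robustness of per-state actions" flavor of criterion.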