Price of Safety in Linear Best Arm Identification
- URL: http://arxiv.org/abs/2309.08709v1
- Date: Fri, 15 Sep 2023 19:01:21 GMT
- Title: Price of Safety in Linear Best Arm Identification
- Authors: Xuedong Shang and Igor Colin and Merwan Barlier and Hamza Cherkaoui
- Abstract summary: We introduce the safe best-arm identification framework with linear feedback.
The agent is subject to some stage-wise safety constraint that linearly depends on an unknown parameter vector.
We propose a gap-based algorithm that achieves meaningful sample complexity while ensuring stage-wise safety.
- Score: 6.82469220191368
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce the safe best-arm identification framework with linear feedback,
where the agent is subject to some stage-wise safety constraint that linearly
depends on an unknown parameter vector. The agent must take actions in a
conservative way so as to ensure that the safety constraint is not violated
with high probability at each round. Ways of leveraging the linear structure
to ensure safety have been studied for regret minimization but, to the best of
our knowledge, not for best-arm identification. We propose a gap-based
algorithm that achieves meaningful sample complexity while ensuring stage-wise
safety. We show that we pay an extra term in the sample complexity
due to the forced exploration phase incurred by the additional safety
constraint. Experimental illustrations are provided to justify the design of
our algorithm.
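To make the stage-wise constraint concrete, here is a minimal sketch of the standard pessimistic safety test used in safe linear bandits. It is not the paper's gap-based algorithm; the function name, `tau`, and `beta` are illustrative assumptions.

```python
# Minimal sketch of a stage-wise safety filter for a linear bandit (generic,
# not the paper's algorithm). The safety cost of arm x is <x, mu> for an
# unknown vector mu; an arm is eligible only if a high-probability upper
# confidence bound on its cost stays below the threshold tau.
import numpy as np

def safe_arms(arms, V, b, tau, beta):
    """Return the arms whose safety-cost UCB is at most tau.

    arms : (K, d) array of candidate arms
    V    : (d, d) Gram matrix sum_s x_s x_s^T + lambda * I (ridge-regularized)
    b    : (d,) vector sum_s x_s c_s built from observed safety costs c_s
    beta : confidence radius (an assumed function of the failure prob. delta)
    """
    mu_hat = np.linalg.solve(V, b)             # regularized least-squares estimate
    V_inv = np.linalg.inv(V)
    keep = []
    for x in arms:
        width = beta * np.sqrt(x @ V_inv @ x)  # confidence width ||x||_{V^{-1}}
        if x @ mu_hat + width <= tau:          # pessimistic safety test
            keep.append(x)
    return np.array(keep)
```

A conservative learner restricted to this set at every round illustrates where the extra sample-complexity term comes from: arms near the constraint boundary can only be certified safe after enough data has shrunk the confidence width.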
Related papers
- Revisiting Safe Exploration in Safe Reinforcement Learning [0.098314893665023]
We introduce a new metric, expected maximum consecutive cost steps (EMCC), which addresses safety during training.
EMCC is particularly effective for distinguishing between prolonged and occasional safety violations.
We propose a new lightweight benchmark task, which allows fast evaluation for algorithm design.
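The paper defines EMCC precisely; as one plausible reading of the name (an assumption, not the paper's formula), the sketch below takes the longest run of consecutive cost-incurring steps per episode and averages it over episodes.

```python
# Hedged sketch: one plausible reading of "expected maximum consecutive cost
# steps" (the paper's exact definition may differ). Per episode, find the
# longest run of consecutive steps incurring a safety cost, then average.
def max_consecutive_cost_steps(costs, threshold=0.0):
    """Length of the longest run of steps with cost above `threshold`."""
    longest = current = 0
    for c in costs:
        current = current + 1 if c > threshold else 0
        longest = max(longest, current)
    return longest

def emcc(episodes, threshold=0.0):
    """Empirical EMCC: mean per-episode maximum consecutive cost steps."""
    return sum(max_consecutive_cost_steps(ep, threshold)
               for ep in episodes) / len(episodes)
```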
arXiv Detail & Related papers (2024-09-02T13:29:29Z) - Nothing in Excess: Mitigating the Exaggerated Safety for LLMs via Safety-Conscious Activation Steering [56.92068213969036]
Safety alignment is indispensable for Large language models (LLMs) to defend threats from malicious instructions.
Recent research reveals that safety-aligned LLMs are prone to rejecting benign queries due to exaggerated safety.
We propose a Safety-Conscious Activation Steering (SCANS) method to mitigate the exaggerated safety concerns.
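SCANS itself is specified in the paper; the sketch below only illustrates the generic activation-steering primitive such methods build on, shifting a layer's hidden states along a precomputed steering vector. The hook mechanics are standard PyTorch; `v`, `alpha`, and the layer access pattern are assumptions.

```python
# Generic activation-steering sketch (illustrative, not the SCANS method):
# shift a transformer layer's hidden states along a precomputed steering
# vector v with strength alpha during the forward pass.
import torch

def add_steering_hook(layer, v, alpha):
    """Register a forward hook applying h <- h + alpha * v to `layer`."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * v.to(hidden.dtype)
        if isinstance(output, tuple):
            return (steered,) + output[1:]
        return steered
    return layer.register_forward_hook(hook)
```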
arXiv Detail & Related papers (2024-08-21T10:01:34Z) - Safeguarded Progress in Reinforcement Learning: Safe Bayesian
Exploration for Control Policy Synthesis [63.532413807686524]
This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL).
We propose a new architecture that handles the trade-off between efficient progress and safety during exploration.
arXiv Detail & Related papers (2023-12-18T16:09:43Z) - Distributionally Safe Reinforcement Learning under Model Uncertainty: A
Single-Level Approach by Differentiable Convex Programming [4.825619788907192]
We present a tractable distributionally safe reinforcement learning framework to enforce safety under a distributional shift measured by a Wasserstein metric.
To improve tractability, we first use duality theory to transform the lower-level optimization from an infinite-dimensional probability space to a finite-dimensional parametric space.
By differentiable convex programming, the bi-level safe learning problem is further reduced to a single-level one with two sequential computationally efficient modules.
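The duality step this summary refers to is plausibly of the same family as the standard Wasserstein DRO duality shown below (an assumption; the paper's exact formulation may differ): the worst case over an infinite-dimensional ball of distributions collapses to a scalar search over a multiplier.

```latex
% Standard Wasserstein DRO duality (the paper's formulation may differ):
% the supremum over the ball of radius epsilon around P under transport
% cost c reduces to a one-dimensional problem in the multiplier lambda.
\sup_{Q:\, W_c(Q, P) \le \varepsilon} \mathbb{E}_{Q}[\ell(\xi)]
  \;=\; \inf_{\lambda \ge 0} \Big\{ \lambda \varepsilon
  + \mathbb{E}_{P}\big[\, \sup_{\xi'} \big( \ell(\xi')
  - \lambda\, c(\xi, \xi') \big) \big] \Big\}
```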
arXiv Detail & Related papers (2023-10-03T22:05:05Z) - Safety Margins for Reinforcement Learning [53.10194953873209]
We show how to leverage proxy criticality metrics to generate safety margins.
We evaluate our approach on learned policies from APE-X and A3C within an Atari environment.
arXiv Detail & Related papers (2023-07-25T16:49:54Z) - Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement
Learning in Unknown Stochastic Environments [84.3830478851369]
We propose a safe reinforcement learning approach that can jointly learn the environment and optimize the control policy.
Our approach can effectively enforce hard safety constraints and significantly outperform CMDP-based baseline methods in terms of system safety rate, as measured in simulations.
arXiv Detail & Related papers (2022-09-29T20:49:25Z) - Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach at minimizing constraint violations in safe reinforcement learning policy tasks.
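As a rough illustration of the log-barrier idea (not the paper's exact update or adaptive step-size rule; all names are placeholders), one gradient step on the barrier objective looks like this:

```python
# Log-barrier gradient sketch in the spirit of LBSGD (illustrative only):
# minimize f(x) subject to g_i(x) <= 0 by descending the barrier objective
#   B(x) = f(x) - eta * sum_i log(-g_i(x)),
# whose gradient blows up near the boundary and repels iterates from it.
import numpy as np

def log_barrier_step(x, grad_f, gs, grad_gs, eta=0.1, lr=1e-2):
    """One gradient step on the log-barrier objective.

    grad_f  : callable, gradient of the objective at x
    gs      : list of constraint callables, feasible iff g_i(x) < 0
    grad_gs : list of callables, gradients of the constraints at x
    """
    g = grad_f(x).astype(float)
    for g_i, dg_i in zip(gs, grad_gs):
        g += eta * dg_i(x) / (-g_i(x))   # d/dx of -eta * log(-g_i(x))
    return x - lr * g
```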
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - Best Arm Identification with Safety Constraints [3.7783523378336112]
The best arm identification problem in the multi-armed bandit setting is an excellent model of many real-world decision-making problems.
We study the question of best-arm identification in safety-critical settings, where the goal of the agent is to find the best safe option out of many.
We propose an algorithm in this setting which is guaranteed to learn safely.
arXiv Detail & Related papers (2021-11-23T20:53:12Z) - Learning to Act Safely with Limited Exposure and Almost Sure Certainty [1.0323063834827415]
This paper puts forward the concept that learning to take safe actions in unknown environments, even with probability-one guarantees, can be achieved without exploratory trials.
We first focus on the canonical multi-armed bandit problem and seek to study the intrinsic trade-offs of learning safety in the presence of uncertainty.
arXiv Detail & Related papers (2021-05-18T18:05:12Z) - Context-Aware Safe Reinforcement Learning for Non-Stationary
Environments [24.75527261989899]
Safety is a critical concern when deploying reinforcement learning agents for realistic tasks.
We propose the context-aware safe reinforcement learning (CASRL) method to realize safe adaptation in non-stationary environments.
Results show that the proposed algorithm significantly outperforms existing baselines in terms of safety and robustness.
arXiv Detail & Related papers (2021-01-02T23:52:22Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sample complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z)