Price of Safety in Linear Best Arm Identification
- URL: http://arxiv.org/abs/2309.08709v1
- Date: Fri, 15 Sep 2023 19:01:21 GMT
- Title: Price of Safety in Linear Best Arm Identification
- Authors: Xuedong Shang and Igor Colin and Merwan Barlier and Hamza Cherkaoui
- Abstract summary: We introduce the safe best-arm identification framework with linear feedback.
The agent is subject to some stage-wise safety constraint that linearly depends on an unknown parameter vector.
We propose a gap-based algorithm that achieves meaningful sample complexity while ensuring stage-wise safety.
- Score: 6.82469220191368
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce the safe best-arm identification framework with linear feedback,
where the agent is subject to some stage-wise safety constraint that linearly
depends on an unknown parameter vector. The agent must take actions in a
conservative way so as to ensure that the safety constraint is not violated
with high probability at each round. Ways of leveraging the linear structure
to ensure safety have been studied for regret minimization but, to the best of
our knowledge, not for best-arm identification. We propose a gap-based
algorithm that achieves meaningful sample complexity while ensuring stage-wise
safety. We show that we pay an extra term in the sample complexity
due to the forced exploration phase incurred by the additional safety
constraint. Experimental illustrations are provided to justify the design of
our algorithm.
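To make the stage-wise constraint concrete, here is a minimal sketch of the standard pessimistic safety test used in safe linear bandits. It is not the paper's gap-based algorithm; the function name, `tau`, and `beta` are illustrative assumptions.

```python
# Minimal sketch of a stage-wise safety filter for a linear bandit (generic,
# not the paper's algorithm). The safety cost of arm x is <x, mu> for an
# unknown vector mu; an arm is eligible only if a high-probability upper
# confidence bound on its cost stays below the threshold tau.
import numpy as np

def safe_arms(arms, V, b, tau, beta):
    """Return the arms whose safety-cost UCB is at most tau.

    arms : (K, d) array of candidate arms
    V    : (d, d) Gram matrix sum_s x_s x_s^T + lambda * I (ridge-regularized)
    b    : (d,) vector sum_s x_s c_s built from observed safety costs c_s
    beta : confidence radius (an assumed function of the failure prob. delta)
    """
    mu_hat = np.linalg.solve(V, b)             # regularized least-squares estimate
    V_inv = np.linalg.inv(V)
    keep = []
    for x in arms:
        width = beta * np.sqrt(x @ V_inv @ x)  # confidence width ||x||_{V^{-1}}
        if x @ mu_hat + width <= tau:          # pessimistic safety test
            keep.append(x)
    return np.array(keep)
```

A conservative learner restricted to this set at every round illustrates where the extra sample-complexity term comes from: arms near the constraint boundary can only be certified safe after enough data has shrunk the confidence width.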
Related papers
- Revisiting Safe Exploration in Safe Reinforcement Learning [0.098314893665023]
We introduce a new metric, expected maximum consecutive cost steps (EMCC), which addresses safety during training.
EMCC is particularly effective for distinguishing between prolonged and occasional safety violations.
We propose a new lightweight benchmark task, which allows fast evaluation for algorithm design.
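The paper defines EMCC precisely; as one plausible reading of the name (an assumption, not the paper's formula), the sketch below takes the longest run of consecutive cost-incurring steps per episode and averages it over episodes.

```python
# Hedged sketch: one plausible reading of "expected maximum consecutive cost
# steps" (the paper's exact definition may differ). Per episode, find the
# longest run of consecutive steps incurring a safety cost, then average.
def max_consecutive_cost_steps(costs, threshold=0.0):
    """Length of the longest run of steps with cost above `threshold`."""
    longest = current = 0
    for c in costs:
        current = current + 1 if c > threshold else 0
        longest = max(longest, current)
    return longest

def emcc(episodes, threshold=0.0):
    """Empirical EMCC: mean per-episode maximum consecutive cost steps."""
    return sum(max_consecutive_cost_steps(ep, threshold)
               for ep in episodes) / len(episodes)
```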
arXiv Detail & Related papers (2024-09-02T13:29:29Z) - Nothing in Excess: Mitigating the Exaggerated Safety for LLMs via Safety-Conscious Activation Steering [56.92068213969036]
Safety alignment is indispensable for Large language models (LLMs) to defend threats from malicious instructions.
Recent research reveals that safety-aligned LLMs are prone to rejecting benign queries due to exaggerated safety.
We propose a Safety-Conscious Activation Steering (SCANS) method to mitigate the exaggerated safety concerns.
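SCANS itself is specified in the paper; the sketch below only illustrates the generic activation-steering primitive such methods build on, shifting a layer's hidden states along a precomputed steering vector. The hook mechanics are standard PyTorch; `v`, `alpha`, and the layer access pattern are assumptions.

```python
# Generic activation-steering sketch (illustrative, not the SCANS method):
# shift a transformer layer's hidden states along a precomputed steering
# vector v with strength alpha during the forward pass.
import torch

def add_steering_hook(layer, v, alpha):
    """Register a forward hook applying h <- h + alpha * v to `layer`."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * v.to(hidden.dtype)
        if isinstance(output, tuple):
            return (steered,) + output[1:]
        return steered
    return layer.register_forward_hook(hook)
```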
arXiv Detail & Related papers (2024-08-21T10:01:34Z) - Safeguarded Progress in Reinforcement Learning: Safe Bayesian
Exploration for Control Policy Synthesis [63.532413807686524]
This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL).
We propose a new architecture that handles the trade-off between efficient progress and safety during exploration.
arXiv Detail & Related papers (2023-12-18T16:09:43Z) - Distributionally Safe Reinforcement Learning under Model Uncertainty: A
Single-Level Approach by Differentiable Convex Programming [4.825619788907192]
We present a tractable distributionally safe reinforcement learning framework to enforce safety under a distributional shift measured by a Wasserstein metric.
To improve tractability, we first use duality theory to transform the lower-level optimization from an infinite-dimensional probability space to a finite-dimensional parametric space.
By differentiable convex programming, the bi-level safe learning problem is further reduced to a single-level one with two sequential computationally efficient modules.
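The duality step this summary refers to is plausibly of the same family as the standard Wasserstein DRO duality shown below (an assumption; the paper's exact formulation may differ): the worst case over an infinite-dimensional ball of distributions collapses to a scalar search over a multiplier.

```latex
% Standard Wasserstein DRO duality (the paper's formulation may differ):
% the supremum over the ball of radius epsilon around P under transport
% cost c reduces to a one-dimensional problem in the multiplier lambda.
\sup_{Q:\, W_c(Q, P) \le \varepsilon} \mathbb{E}_{Q}[\ell(\xi)]
  \;=\; \inf_{\lambda \ge 0} \Big\{ \lambda \varepsilon
  + \mathbb{E}_{P}\big[\, \sup_{\xi'} \big( \ell(\xi')
  - \lambda\, c(\xi, \xi') \big) \big] \Big\}
```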
arXiv Detail & Related papers (2023-10-03T22:05:05Z) - Safety Margins for Reinforcement Learning [53.10194953873209]
We show how to leverage proxy criticality metrics to generate safety margins.
We evaluate our approach on learned policies from APE-X and A3C within an Atari environment.
arXiv Detail & Related papers (2023-07-25T16:49:54Z) - Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement
Learning in Unknown Stochastic Environments [84.3830478851369]
We propose a safe reinforcement learning approach that can jointly learn the environment and optimize the control policy.
Our approach can effectively enforce hard safety constraints and significantly outperform CMDP-based baseline methods in terms of system safety rate, as measured in simulations.
arXiv Detail & Related papers (2022-09-29T20:49:25Z) - Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach at minimizing constraint violations in safe reinforcement learning policy tasks.
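As a rough illustration of the log-barrier idea (not the paper's exact update or adaptive step-size rule; all names are placeholders), one gradient step on the barrier objective looks like this:

```python
# Log-barrier gradient sketch in the spirit of LBSGD (illustrative only):
# minimize f(x) subject to g_i(x) <= 0 by descending the barrier objective
#   B(x) = f(x) - eta * sum_i log(-g_i(x)),
# whose gradient blows up near the boundary and repels iterates from it.
import numpy as np

def log_barrier_step(x, grad_f, gs, grad_gs, eta=0.1, lr=1e-2):
    """One gradient step on the log-barrier objective.

    grad_f  : callable, gradient of the objective at x
    gs      : list of constraint callables, feasible iff g_i(x) < 0
    grad_gs : list of callables, gradients of the constraints at x
    """
    g = grad_f(x).astype(float)
    for g_i, dg_i in zip(gs, grad_gs):
        g += eta * dg_i(x) / (-g_i(x))   # d/dx of -eta * log(-g_i(x))
    return x - lr * g
```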
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - Best Arm Identification with Safety Constraints [3.7783523378336112]
The best arm identification problem in the multi-armed bandit setting is an excellent model of many real-world decision-making problems.
We study the question of best-arm identification in safety-critical settings, where the goal of the agent is to find the best safe option out of many.
We propose an algorithm in this setting which is guaranteed to learn safely.
arXiv Detail & Related papers (2021-11-23T20:53:12Z) - Learning to Act Safely with Limited Exposure and Almost Sure Certainty [1.0323063834827415]
This paper puts forward the concept that learning to take safe actions in unknown environments, even with probability-one guarantees, can be achieved without exploratory trials.
We first focus on the canonical multi-armed bandit problem and seek to study the intrinsic trade-offs of learning safety in the presence of uncertainty.
arXiv Detail & Related papers (2021-05-18T18:05:12Z) - Context-Aware Safe Reinforcement Learning for Non-Stationary
Environments [24.75527261989899]
Safety is a critical concern when deploying reinforcement learning agents for realistic tasks.
We propose the context-aware safe reinforcement learning (CASRL) method to realize safe adaptation in non-stationary environments.
Results show that the proposed algorithm significantly outperforms existing baselines in terms of safety and robustness.
arXiv Detail & Related papers (2021-01-02T23:52:22Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sample complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z)