Best Arm Identification with Safety Constraints
- URL: http://arxiv.org/abs/2111.12151v1
- Date: Tue, 23 Nov 2021 20:53:12 GMT
- Title: Best Arm Identification with Safety Constraints
- Authors: Zhenlin Wang, Andrew Wagenmaker, Kevin Jamieson
- Abstract summary: The best arm identification problem in the multi-armed bandit setting is an excellent model of many real-world decision-making problems.
We study the question of best-arm identification in safety-critical settings, where the goal of the agent is to find the best safe option out of many.
We propose an algorithm in this setting which is guaranteed to learn safely.
- Score: 3.7783523378336112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The best arm identification problem in the multi-armed bandit setting is an
excellent model of many real-world decision-making problems, yet it fails to
capture the fact that in the real-world, safety constraints often must be met
while learning. In this work we study the question of best-arm identification
in safety-critical settings, where the goal of the agent is to find the best
safe option out of many, while exploring in a way that guarantees certain,
initially unknown safety constraints are met. We first analyze this problem in
the setting where the reward and safety constraint take a linear structure,
and show nearly matching upper and lower bounds. We then analyze a much more
general version of the problem where we only assume the reward and safety
constraint can be modeled by monotonic functions, and propose an algorithm in
this setting which is guaranteed to learn safely. We conclude with experimental
results demonstrating the effectiveness of our approaches in scenarios such as
safely identifying the best drug out of many in order to treat an illness.
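To make the linear setting concrete, the following is a minimal sketch, with illustrative constants and a generic elimination rule rather than the paper's exact algorithm, of the safe-exploration pattern studied here: an arm may be pulled only while a high-probability upper confidence bound on its unknown linear safety cost stays below the threshold, and arms leave contention once their reward upper bound falls behind a certified-safe competitor.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, tau = 3, 2000, 0.5          # dimension, round budget, safety threshold
arms = rng.normal(size=(8, d))    # feature vectors of the candidate arms
theta_true = rng.normal(size=d)   # unknown reward parameter (hidden from the learner)
mu_true = rng.normal(size=d)      # unknown safety parameter (hidden from the learner)
known_safe = 0                    # an arm assumed safe a priori, so exploration can start

A = np.eye(d)                     # regularized design matrix
b_reward, b_safety = np.zeros(d), np.zeros(d)
active = set(range(len(arms)))    # arms not yet ruled out as suboptimal

for t in range(1, T + 1):
    A_inv = np.linalg.inv(A)
    theta_hat, mu_hat = A_inv @ b_reward, A_inv @ b_safety
    beta = 2.0 * np.sqrt(np.log(1 + t))                    # illustrative confidence radius
    width = lambda v: beta * np.sqrt(v @ A_inv @ v)
    # An arm is certified safe when the upper bound on its safety cost is below tau.
    safe = [i for i in sorted(active) if arms[i] @ mu_hat + width(arms[i]) <= tau]
    safe = safe or [known_safe]                            # fall back to the known safe arm
    # Eliminate arms whose reward upper bound falls below the best safe lower bound.
    best_lcb = max(arms[i] @ theta_hat - width(arms[i]) for i in safe)
    active = {i for i in active
              if arms[i] @ theta_hat + width(arms[i]) >= best_lcb} or {known_safe}
    x = arms[safe[t % len(safe)]]                          # round-robin over certified arms
    A += np.outer(x, x)
    b_reward += x * (x @ theta_true + rng.normal(scale=0.1))
    b_safety += x * (x @ mu_true + rng.normal(scale=0.1))

theta_hat = np.linalg.inv(A) @ b_reward
print("candidate best safe arm:", max(active, key=lambda i: arms[i] @ theta_hat))
```

The fallback to a known safe arm mirrors the standard assumption that at least one safe action is available before learning begins.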
Related papers
- Cross-Modality Safety Alignment [73.8765529028288]
We introduce a novel safety alignment challenge called Safe Inputs but Unsafe Output (SIUO) to evaluate cross-modality safety alignment.
To empirically investigate this problem, we developed SIUO, a cross-modality benchmark encompassing 9 critical safety domains, such as self-harm, illegal activities, and privacy violations.
Our findings reveal substantial safety vulnerabilities in both closed- and open-source LVLMs, underscoring the inadequacy of current models to reliably interpret and respond to complex, real-world scenarios.
arXiv Detail & Related papers (2024-06-21T16:14:15Z)
- Distributionally Safe Reinforcement Learning under Model Uncertainty: A Single-Level Approach by Differentiable Convex Programming [4.825619788907192]
We present a tractable distributionally safe reinforcement learning framework to enforce safety under a distributional shift measured by a Wasserstein metric.
To improve tractability, we first use duality theory to transform the lower-level optimization from an infinite-dimensional probability space to a finite-dimensional parametric space.
By differentiable convex programming, the bi-level safe learning problem is further reduced to a single-level one with two sequential computationally efficient modules.
arXiv Detail & Related papers (2023-10-03T22:05:05Z)
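The duality step described in the entry above replaces an infinite-dimensional worst case over distributions with a one-dimensional search. As a hedged illustration of the generic Wasserstein-duality identity involved (not this paper's bi-level reduction), the worst-case expected loss over a Wasserstein ball of radius eps around an empirical distribution equals the infimum over lambda >= 0 of lambda * eps plus the sample average of max over x' of [loss(x') - lambda * cost(x, x')], which is easy to evaluate on a discrete support:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Empirical samples and a discrete candidate support (illustrative numbers).
samples = np.array([0.0, 0.5, 1.0, 1.5])
support = np.linspace(-2.0, 4.0, 121)
loss = lambda x: (x - 1.0) ** 2          # loss we want to be robust about
cost = lambda x, y: np.abs(x - y)        # transport cost for the Wasserstein metric
eps = 0.3                                # radius of the Wasserstein ball

def dual_objective(lam):
    # Inner sup of the dual: each sample is adversarially relocated to the
    # support point that maximizes loss minus the transport penalty.
    inner = np.max(loss(support)[None, :]
                   - lam * cost(samples[:, None], support[None, :]), axis=1)
    return lam * eps + inner.mean()

res = minimize_scalar(dual_objective, bounds=(0.0, 50.0), method="bounded")
print("worst-case expected loss over the Wasserstein ball:", res.fun)
print("empirical expected loss:", loss(samples).mean())
```

The outer minimization over lambda is one-dimensional and convex, which is what makes a single-level reformulation tractable.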
- Price of Safety in Linear Best Arm Identification [6.82469220191368]
We introduce the safe best-arm identification framework with linear feedback.
The agent is subject to some stage-wise safety constraint that linearly depends on an unknown parameter vector.
We propose a gap-based algorithm that achieves meaningful sample complexity while ensuring stage-wise safety.
arXiv Detail & Related papers (2023-09-15T19:01:21Z)
- Risk-Averse Model Uncertainty for Distributionally Robust Safe Reinforcement Learning [3.9821399546174825]
We introduce a deep reinforcement learning framework for safe decision making in uncertain environments.
We provide robustness guarantees for this framework by showing it is equivalent to a specific class of distributionally robust safe reinforcement learning problems.
In experiments on continuous control tasks with safety constraints, we demonstrate that our framework produces robust performance and safety at deployment time across a range of perturbed test environments.
arXiv Detail & Related papers (2023-01-30T00:37:06Z)
- Meta-Learning Priors for Safe Bayesian Optimization [72.8349503901712]
We build on a meta-learning algorithm, F-PACOH, capable of providing reliable uncertainty quantification in settings of data scarcity.
As our core contribution, we develop a novel framework for choosing safety-compliant priors in a data-driven manner.
On benchmark functions and a high-precision motion system, we demonstrate that our meta-learned priors accelerate the convergence of safe BO approaches.
arXiv Detail & Related papers (2022-10-03T08:38:38Z)
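For context on where priors enter safe Bayesian optimization at all, here is a generic SafeOpt-style loop (a hedged skeleton, not F-PACOH itself): only points whose safety lower confidence bound clears the threshold are candidates, so the prior, represented below simply by the kernel and its length scale, directly controls how fast the certified-safe set can grow.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

f = lambda x: np.sin(3 * x)              # unknown objective (illustrative)
g = lambda x: np.cos(3 * x)              # unknown safety value; safe when g(x) >= 0
grid = np.linspace(0.0, 2.0, 200).reshape(-1, 1)

X = np.array([[0.1]])                    # a seed point known to be safe
yf, yg = f(X.ravel()), g(X.ravel())

for _ in range(15):
    # The kernel plays the role of the prior; meta-learning would tune it from related tasks.
    gp_f = GaussianProcessRegressor(RBF(0.3), alpha=1e-4).fit(X, yf)
    gp_g = GaussianProcessRegressor(RBF(0.3), alpha=1e-4).fit(X, yg)
    mf, sf = gp_f.predict(grid, return_std=True)
    mg, sg = gp_g.predict(grid, return_std=True)
    safe = mg - 2.0 * sg >= 0.0          # high-probability safe set
    if not safe.any():
        break
    ucb = np.where(safe, mf + 2.0 * sf, -np.inf)
    x_next = grid[np.argmax(ucb)]        # most promising point among certified-safe ones
    X = np.vstack([X, [x_next]])
    yf = np.append(yf, f(x_next[0]))
    yg = np.append(yg, g(x_next[0]))

print("best safe point found:", X[np.argmax(yf)], "value:", yf.max())
```

With a well-calibrated prior the confidence bands are tight and the safe set expands quickly; providing such priors is precisely what the meta-learning in this paper targets.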
- Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments [84.3830478851369]
We propose a safe reinforcement learning approach that can jointly learn the environment and optimize the control policy.
Our approach effectively enforces hard safety constraints and significantly outperforms CMDP-based baseline methods in the system safety rate measured via simulations.
arXiv Detail & Related papers (2022-09-29T20:49:25Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violations in safe reinforcement learning policy tasks.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
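The core move in LBSGD as described above is simple to state: replace each hard constraint g(x) <= 0 with a term -eta * log(-g(x)) added to the objective, and cap the step size so no iterate can jump across the barrier. A toy one-dimensional sketch under those assumptions (not the authors' implementation, which additionally adapts to estimated smoothness):

```python
import numpy as np

f = lambda x: (x - 3.0) ** 2              # objective to minimize (toy)
g = lambda x: x - 1.0                     # constraint g(x) <= 0, i.e. x <= 1
df = lambda x: 2.0 * (x - 3.0)            # gradient of the objective
dg = lambda x: 1.0                        # gradient of the constraint

eta = 0.1                                 # barrier weight
x = 0.0                                   # strictly feasible start: g(0) = -1 < 0

for _ in range(200):
    # Gradient of the log-barrier surrogate f(x) - eta * log(-g(x)).
    grad = df(x) - eta * dg(x) / g(x)
    # Cap the step so the move consumes at most half the slack -g(x),
    # guaranteeing the iterate cannot cross the constraint boundary.
    slack = -g(x)
    step = min(0.01, 0.5 * slack / (abs(grad) + 1e-12))
    x -= step * grad
    assert g(x) < 0.0                     # every iterate stays strictly feasible

print("converged near the constrained optimum x = 1:", x)
```

Because the barrier term blows up as g(x) approaches zero from below, the capped steps keep every iterate strictly feasible, which is the property the safe RL application relies on.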
- Active Learning with Safety Constraints [25.258564629480063]
We investigate the complexity of learning the best safe decision in interactive environments.
We propose an adaptive experimental-design-based algorithm, which we show efficiently trades off between the difficulty of showing that an arm is unsafe and of showing that it is suboptimal.
arXiv Detail & Related papers (2022-06-22T15:45:38Z)
- Towards Safe Policy Improvement for Non-Stationary MDPs [48.9966576179679]
Many real-world problems of interest exhibit non-stationarity, and when stakes are high, the cost associated with a false stationarity assumption may be unacceptable.
We take the first steps towards ensuring safety, with high confidence, for smoothly-varying non-stationary decision problems.
Our proposed method extends a type of safe algorithm, called a Seldonian algorithm, through a synthesis of model-free reinforcement learning with time-series analysis.
arXiv Detail & Related papers (2020-10-23T20:13:51Z)
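The Seldonian pattern this paper extends reduces to one test: return a candidate policy only if a high-confidence lower bound on its performance, estimated on held-out data, beats the baseline; otherwise report that no solution was found. A hedged sketch using a one-sided t-interval over importance-weighted returns (the paper additionally forecasts this bound forward in time for the non-stationary case):

```python
import numpy as np
from scipy import stats

def seldonian_safety_test(is_returns, baseline, delta=0.05):
    """Return True only if a (1 - delta) lower confidence bound on the
    candidate policy's expected return exceeds the baseline's performance."""
    n = len(is_returns)
    mean = np.mean(is_returns)
    sem = np.std(is_returns, ddof=1) / np.sqrt(n)
    lower = mean - stats.t.ppf(1.0 - delta, df=n - 1) * sem  # one-sided t bound
    return lower > baseline

rng = np.random.default_rng(2)
# Importance-weighted returns of the candidate policy on held-out trajectories
# (synthetic numbers; in practice these come from off-policy evaluation).
returns = rng.normal(loc=1.2, scale=0.8, size=400)
baseline_performance = 1.0

if seldonian_safety_test(returns, baseline_performance):
    print("candidate policy certified: deploy it")
else:
    print("no solution found: keep the baseline policy")
```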
- Verifiably Safe Exploration for End-to-End Reinforcement Learning [17.401496872603943]
This paper contributes a first approach toward enforcing formal safety constraints on end-to-end policies with visual inputs.
It is evaluated on a novel benchmark that emphasizes the challenge of safely exploring in the presence of hard constraints.
arXiv Detail & Related papers (2020-07-02T16:12:20Z)
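A common way to enforce formal constraints on an end-to-end policy, in the spirit of the shielding approach above, is a runtime monitor that previews each proposed action against a model and overrides it when the predicted successor state violates the invariant. A minimal hand-written sketch (the paper derives its monitor from formally verified models instead):

```python
from dataclasses import dataclass

@dataclass
class State:
    position: float
    velocity: float

SPEED_LIMIT = 2.0          # invariant: |velocity| must stay below this

def dynamics(s: State, accel: float, dt: float = 0.1) -> State:
    """One-step model used by the monitor to preview an action's effect."""
    v = s.velocity + accel * dt
    return State(s.position + v * dt, v)

def shield(s: State, proposed_accel: float) -> float:
    """Execute the policy's action only if the predicted next state satisfies
    the invariant; otherwise fall back to a safe action (decelerate)."""
    if abs(dynamics(s, proposed_accel).velocity) < SPEED_LIMIT:
        return proposed_accel
    return -1.0 if s.velocity > 0 else 1.0   # brake toward zero velocity

state = State(position=0.0, velocity=1.95)
policy_action = 3.0                           # an end-to-end policy may propose anything
print("executed action:", shield(state, policy_action))  # unsafe action is overridden
```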
- Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.