Benefits of Monotonicity in Safe Exploration with Gaussian Processes
- URL: http://arxiv.org/abs/2211.01561v2
- Date: Mon, 19 Jun 2023 12:19:33 GMT
- Title: Benefits of Monotonicity in Safe Exploration with Gaussian Processes
- Authors: Arpan Losalka and Jonathan Scarlett
- Abstract summary: We consider the problem of sequentially maximising an unknown function over a set of actions.
We show that M-SafeUCB enjoys theoretical guarantees in terms of safety, a suitably-defined regret notion, and approximately finding the entire safe boundary.
- Score: 50.71125084216603
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of sequentially maximising an unknown
function over a set of actions while ensuring that every sampled point has a
function value below a given safety threshold. We model the function using
kernel-based and Gaussian process methods, while differing from previous works
in our assumption that the function is monotonically increasing with respect
to a safety variable. This assumption is motivated by various practical
applications such as adaptive clinical trial design and robotics. Taking
inspiration from the GP-UCB and SafeOpt algorithms, we propose an algorithm,
monotone safe UCB (M-SafeUCB), for this task. We show that M-SafeUCB enjoys
theoretical guarantees in terms of safety, a suitably-defined regret notion,
and approximately finding the entire safe boundary. In addition, we illustrate
that the monotonicity assumption yields significant benefits in terms of the
guarantees obtained, as well as algorithmic simplicity and efficiency. We
support our theoretical findings by performing empirical evaluations on a
variety of functions, including a simulated clinical trial experiment.
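To make the setup concrete, below is a minimal sketch of a GP-based safe-UCB loop that exploits monotonicity in a one-dimensional safety variable s: it repeatedly samples the largest s whose upper confidence bound stays below the threshold. This is one plausible reading of the idea, not the authors' implementation; the test function, RBF kernel, noise level, confidence parameter beta, and discretised grid are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical ground-truth function: increasing in the safety variable s.
def f(s):
    return 2.0 * s + 0.3 * np.sin(6 * s)

SAFETY_THRESHOLD = 1.5   # every sampled point must satisfy f(s) <= threshold
NOISE_STD = 0.05

def rbf_kernel(a, b, lengthscale=0.2, variance=1.0):
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(s_train, y_train, s_test):
    """Standard GP regression posterior mean and standard deviation."""
    K = rbf_kernel(s_train, s_train) + NOISE_STD ** 2 * np.eye(len(s_train))
    K_star = rbf_kernel(s_train, s_test)
    K_inv = np.linalg.inv(K)  # fine for this short illustrative run
    mean = K_star.T @ K_inv @ y_train
    var = rbf_kernel(s_test, s_test).diagonal() - np.einsum(
        "ij,ik,kj->j", K_star, K_inv, K_star)
    return mean, np.sqrt(np.maximum(var, 1e-12))

grid = np.linspace(0.0, 1.0, 200)      # discretised safety variable
s_train = np.array([0.0])              # s = 0 is assumed known-safe
y_train = f(s_train) + NOISE_STD * np.random.randn(1)
beta = 2.0                             # confidence-width parameter

for t in range(20):
    mean, std = gp_posterior(s_train, y_train, grid)
    ucb = mean + beta * std
    # Monotonicity in s: anything below a certified-safe point is also safe,
    # so it suffices to track the largest s whose UCB stays below the threshold.
    safe = ucb <= SAFETY_THRESHOLD
    if not safe.any():
        break
    s_next = grid[np.max(np.where(safe)[0])]   # largest certified-safe s
    y_next = f(np.array([s_next])) + NOISE_STD * np.random.randn(1)
    s_train = np.append(s_train, s_next)
    y_train = np.append(y_train, y_next)

print("estimated safe boundary s approx.", s_train[-1])
```

Because the function is assumed increasing in s, certifying one point as safe implicitly certifies everything below it, which is why a single frontier index is enough to track the safe region in this sketch.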
Related papers
- Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
Temporal Difference (TD) learning, arguably the most widely used algorithm for policy evaluation, serves as a natural framework for this purpose.
In this paper, we study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation, and obtain three significant improvements over existing results.
arXiv Detail & Related papers (2024-10-21T15:34:44Z)
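For the TD-learning paper above, here is a minimal, self-contained sketch of TD(0) with linear function approximation and Polyak-Ruppert iterate averaging. The three-state Markov reward process, the tabular features, and the step-size schedule are invented for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny synthetic Markov reward process (3 states) used only for illustration.
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.3, 0.5]])      # transitions under the evaluated policy
r = np.array([1.0, 0.0, -1.0])      # expected reward per state
gamma = 0.9
phi = np.eye(3)                      # tabular features (a special case of linear FA)

theta = np.zeros(3)                  # linear value-function parameters
theta_bar = np.zeros(3)              # Polyak-Ruppert average of the iterates
s = 0
T = 50_000
for t in range(1, T + 1):
    s_next = rng.choice(3, p=P[s])
    reward = r[s] + 0.1 * rng.standard_normal()
    # TD(0) update with step size alpha_t = 1 / t^0.6 (a common choice when averaging)
    alpha = 1.0 / t ** 0.6
    td_error = reward + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta += alpha * td_error * phi[s]
    # Running Polyak-Ruppert average: theta_bar_t = theta_bar_{t-1} + (theta_t - theta_bar_{t-1}) / t
    theta_bar += (theta - theta_bar) / t
    s = s_next

# Closed-form value function for comparison: V = (I - gamma P)^{-1} r
V_true = np.linalg.solve(np.eye(3) - gamma * P, r)
print("averaged TD estimate:", theta_bar)
print("true value function :", V_true)
```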
- Uncertainty Quantification with Bayesian Higher Order ReLU KANs [0.0]
We introduce the first method of uncertainty quantification in the domain of Kolmogorov-Arnold Networks, specifically focusing on (Higher Order) ReLUKANs.
We validate our method through a series of closure tests, including simple one-dimensional functions.
We demonstrate the method's ability to correctly identify functional dependencies introduced through the inclusion of a term.
arXiv Detail & Related papers (2024-10-02T15:57:18Z)
- No-Regret Algorithms for Safe Bayesian Optimization with Monotonicity Constraints [41.04951588017592]
We consider the problem of sequentially maximizing an unknown function $f$ over a set of actions of the form $(s, \mathbf{x})$.
We show that a modified version of our algorithm is able to attain sublinear regret for the task of finding a near-optimal $s$ corresponding to every $\mathbf{x}$.
arXiv Detail & Related papers (2024-06-05T13:41:26Z) - On Safety in Safe Bayesian Optimization [5.9045432488022485]
We investigate three safety-related issues of the popular class of SafeOpt-type algorithms.
First, these algorithms critically rely on frequentist uncertainty bounds for Gaussian Process (GP) regression.
Second, we identify the assumption of a known upper bound on the reproducing kernel Hilbert space (RKHS) norm of the target function as a practical difficulty.
Third, SafeOpt and derived algorithms rely on a discrete search space, making them difficult to apply to higher-dimensional problems.
arXiv Detail & Related papers (2024-03-19T17:50:32Z)
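As context for the SafeOpt critique above: the frequentist uncertainty bounds in question are typically of the following form. The confidence width shown is the commonly cited one from Chowdhury and Gopalan (2017); the exact constants analysed in the paper may differ.

```latex
% Frequentist confidence bound for GP regression (one commonly used form):
% with probability at least 1 - \delta, simultaneously for all x and rounds t,
\[
  |f(x) - \mu_{t-1}(x)| \;\le\; \beta_t \, \sigma_{t-1}(x),
  \qquad
  \beta_t = B + R \sqrt{2\bigl(\gamma_{t-1} + 1 + \ln(1/\delta)\bigr)},
\]
% where B is the assumed upper bound on the RKHS norm \|f\|_{\mathcal{H}_k},
% R is the sub-Gaussian noise level, and \gamma_{t-1} is the maximum
% information gain after t - 1 observations.
```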
- Verification-Aided Learning of Neural Network Barrier Functions with Termination Guarantees [6.9060054915724]
Barrier functions are a general framework for establishing a safety guarantee for a system.
There is no general method for finding these functions.
Recent approaches use self-supervised learning techniques to learn these functions.
arXiv Detail & Related papers (2024-03-12T04:29:43Z)
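As context for the barrier-function paper above, one classical continuous-time form of a barrier certificate is sketched below. Conventions and inequality directions vary across papers, and the learned neural barrier functions discussed in the paper may use a different variant.

```latex
% One classical barrier-certificate formulation for dynamics \dot{x} = f(x),
% initial set \mathcal{X}_0 and unsafe set \mathcal{X}_u (illustrative variant):
\[
  B(x) \le 0 \;\;\forall x \in \mathcal{X}_0, \qquad
  B(x) > 0 \;\;\forall x \in \mathcal{X}_u, \qquad
  \nabla B(x)^{\top} f(x) \le 0 \;\;\forall x \text{ such that } B(x) = 0.
\]
% If such a B exists, no trajectory starting in \mathcal{X}_0 can reach
% \mathcal{X}_u, which is the safety guarantee referred to above.
```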
- Information-Theoretic Safe Bayesian Optimization [59.758009422067005]
We consider a sequential decision making task, where the goal is to optimize an unknown function without evaluating parameters that violate an unknown (safety) constraint.
Most current methods rely on a discretization of the domain and cannot be directly extended to the continuous case.
We propose an information-theoretic safe exploration criterion that directly exploits the GP posterior to identify the most informative safe parameters to evaluate.
arXiv Detail & Related papers (2024-02-23T14:31:10Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violation in policy optimization tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
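For the LBSGD paper above, here is a toy sketch of the log-barrier idea with a deliberately conservative step-size rule that keeps iterates strictly feasible. The objective, constraint, barrier weight, and step-size cap are illustrative stand-ins, not the paper's actual update rule or constants.

```python
import numpy as np

# Toy safe problem (illustrative only): minimize f subject to g(x) <= 0.
def f(x):       return (x - 3.0) ** 2          # objective
def g(x):       return x - 2.0                 # safety constraint: x <= 2
def grad_f(x):  return 2.0 * (x - 3.0)
def grad_g(x):  return 1.0

eta = 0.1          # barrier weight (annealed toward zero in the full method)
x = 0.0            # strictly feasible starting point, g(x) < 0

for _ in range(200):
    slack = -g(x)                              # distance to the constraint boundary (> 0)
    # Gradient of the log-barrier objective f(x) - eta * log(-g(x))
    grad = grad_f(x) + eta * grad_g(x) / slack
    # Conservative step size: never move more than a fraction of the slack,
    # so the iterate stays strictly feasible (a stand-in for an adaptive rule).
    step = min(0.1, 0.5 * slack / (abs(grad) + 1e-12))
    x -= step * grad

print("barrier solution x approx.", x, "(constrained optimum is x = 2)")
```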
- Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how, under a more standard notion of low inherent Bellman error, typically employed in least-squares value-iteration-style algorithms, one can obtain strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)
- Provably Efficient Safe Exploration via Primal-Dual Policy Optimization [105.7510838453122]
We study the Safe Reinforcement Learning (SRL) problem using the Constrained Markov Decision Process (CMDP) formulation.
We present a provably efficient online policy optimization algorithm for CMDP with safe exploration in the function approximation setting.
arXiv Detail & Related papers (2020-03-01T17:47:03Z)
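For the primal-dual paper above, here is a toy Lagrangian primal-dual sketch on a one-step constrained decision problem. The rewards, costs, budget, and learning rates are made up, and the paper's actual algorithm operates on CMDPs with function approximation, so this only illustrates the primal-dual principle.

```python
import numpy as np

# Toy constrained decision problem (illustrative, not the paper's CMDP setup):
# two actions with known reward/cost; keep expected cost <= budget while
# maximizing expected reward, using a primal-dual (Lagrangian) scheme.
rewards = np.array([1.0, 0.6])
costs   = np.array([0.9, 0.2])
budget  = 0.5

theta = np.zeros(2)   # policy logits (primal variable)
lam   = 0.0           # Lagrange multiplier for the cost constraint (dual variable)
lr_primal, lr_dual = 0.5, 0.1

for _ in range(2000):
    pi = np.exp(theta - theta.max())
    pi /= pi.sum()
    # Lagrangian: E_pi[reward] - lam * (E_pi[cost] - budget)
    advantage = rewards - lam * costs
    # Primal ascent on the softmax policy (policy-gradient-style update)
    grad_theta = pi * (advantage - pi @ advantage)
    theta += lr_primal * grad_theta
    # Dual ascent: raise lam when the constraint is violated, lower it otherwise
    lam = max(0.0, lam + lr_dual * (pi @ costs - budget))

print("policy:", np.round(pi, 3), "multiplier:", round(lam, 3),
      "expected cost:", round(float(pi @ costs), 3))
```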
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.