Benefits of Monotonicity in Safe Exploration with Gaussian Processes
- URL: http://arxiv.org/abs/2211.01561v2
- Date: Mon, 19 Jun 2023 12:19:33 GMT
- Title: Benefits of Monotonicity in Safe Exploration with Gaussian Processes
- Authors: Arpan Losalka and Jonathan Scarlett
- Abstract summary: We consider the problem of sequentially maximising an unknown function over a set of actions.
We show that M-SafeUCB enjoys theoretical guarantees in terms of safety, a suitably-defined regret notion, and approximately finding the entire safe boundary.
- Score: 50.71125084216603
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of sequentially maximising an unknown
function over a set of actions while ensuring that every sampled point has a
function value below a given safety threshold. We model the function using
kernel-based and Gaussian process methods, while differing from previous works
in our assumption that the function is monotonically increasing with respect
to a safety variable. This assumption is motivated by various practical
applications such as adaptive clinical trial design and robotics. Taking
inspiration from the GP-UCB and SafeOpt algorithms, we propose an algorithm,
monotone safe UCB (M-SafeUCB), for this task. We show that M-SafeUCB enjoys
theoretical guarantees in terms of safety, a suitably-defined regret notion,
and approximately finding the entire safe boundary. In addition, we illustrate
that the monotonicity assumption yields significant benefits in terms of the
guarantees obtained, as well as algorithmic simplicity and efficiency. We
support our theoretical findings by performing empirical evaluations on a
variety of functions, including a simulated clinical trial experiment.
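To make the setup concrete, below is a minimal sketch of a GP-based safe-UCB loop that exploits monotonicity in a one-dimensional safety variable s: it repeatedly samples the largest s whose upper confidence bound stays below the threshold. This is one plausible reading of the idea, not the authors' implementation; the test function, RBF kernel, noise level, confidence parameter beta, and discretised grid are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical ground-truth function: increasing in the safety variable s.
def f(s):
    return 2.0 * s + 0.3 * np.sin(6 * s)

SAFETY_THRESHOLD = 1.5   # every sampled point must satisfy f(s) <= threshold
NOISE_STD = 0.05

def rbf_kernel(a, b, lengthscale=0.2, variance=1.0):
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(s_train, y_train, s_test):
    """Standard GP regression posterior mean and standard deviation."""
    K = rbf_kernel(s_train, s_train) + NOISE_STD ** 2 * np.eye(len(s_train))
    K_star = rbf_kernel(s_train, s_test)
    K_inv = np.linalg.inv(K)  # fine for this short illustrative run
    mean = K_star.T @ K_inv @ y_train
    var = rbf_kernel(s_test, s_test).diagonal() - np.einsum(
        "ij,ik,kj->j", K_star, K_inv, K_star)
    return mean, np.sqrt(np.maximum(var, 1e-12))

grid = np.linspace(0.0, 1.0, 200)      # discretised safety variable
s_train = np.array([0.0])              # s = 0 is assumed known-safe
y_train = f(s_train) + NOISE_STD * np.random.randn(1)
beta = 2.0                             # confidence-width parameter

for t in range(20):
    mean, std = gp_posterior(s_train, y_train, grid)
    ucb = mean + beta * std
    # Monotonicity in s: anything below a certified-safe point is also safe,
    # so it suffices to track the largest s whose UCB stays below the threshold.
    safe = ucb <= SAFETY_THRESHOLD
    if not safe.any():
        break
    s_next = grid[np.max(np.where(safe)[0])]   # largest certified-safe s
    y_next = f(np.array([s_next])) + NOISE_STD * np.random.randn(1)
    s_train = np.append(s_train, s_next)
    y_train = np.append(y_train, y_next)

print("estimated safe boundary s approx.", s_train[-1])
```

Because the function is assumed increasing in s, certifying one point as safe implicitly certifies everything below it, which is why a single frontier index is enough to track the safe region in this sketch.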
Related papers
- Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
Temporal Difference (TD) learning, arguably the most widely used algorithm for policy evaluation, serves as a natural framework for this purpose.
In this paper, we study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation, and obtain three significant improvements over existing results.
arXiv Detail & Related papers (2024-10-21T15:34:44Z)
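For the TD-learning paper above, here is a minimal, self-contained sketch of TD(0) with linear function approximation and Polyak-Ruppert iterate averaging. The three-state Markov reward process, the tabular features, and the step-size schedule are invented for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny synthetic Markov reward process (3 states) used only for illustration.
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.3, 0.5]])      # transitions under the evaluated policy
r = np.array([1.0, 0.0, -1.0])      # expected reward per state
gamma = 0.9
phi = np.eye(3)                      # tabular features (a special case of linear FA)

theta = np.zeros(3)                  # linear value-function parameters
theta_bar = np.zeros(3)              # Polyak-Ruppert average of the iterates
s = 0
T = 50_000
for t in range(1, T + 1):
    s_next = rng.choice(3, p=P[s])
    reward = r[s] + 0.1 * rng.standard_normal()
    # TD(0) update with step size alpha_t = 1 / t^0.6 (a common choice when averaging)
    alpha = 1.0 / t ** 0.6
    td_error = reward + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta += alpha * td_error * phi[s]
    # Running Polyak-Ruppert average: theta_bar_t = theta_bar_{t-1} + (theta_t - theta_bar_{t-1}) / t
    theta_bar += (theta - theta_bar) / t
    s = s_next

# Closed-form value function for comparison: V = (I - gamma P)^{-1} r
V_true = np.linalg.solve(np.eye(3) - gamma * P, r)
print("averaged TD estimate:", theta_bar)
print("true value function :", V_true)
```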
- Uncertainty Quantification with Bayesian Higher Order ReLU KANs [0.0]
We introduce the first method of uncertainty quantification in the domain of Kolmogorov-Arnold Networks, specifically focusing on (Higher Order) ReLUKANs.
We validate our method through a series of closure tests, including simple one-dimensional functions.
We demonstrate the method's ability to correctly identify functional dependencies introduced through the inclusion of a term.
arXiv Detail & Related papers (2024-10-02T15:57:18Z)
- No-Regret Algorithms for Safe Bayesian Optimization with Monotonicity Constraints [41.04951588017592]
We consider the problem of sequentially maximizing an unknown function $f$ over a set of actions of the form $(s, \mathbf{x})$.
We show that a modified version of our algorithm is able to attain sublinear regret for the task of finding a near-optimal $s$ corresponding to every $\mathbf{x}$.
arXiv Detail & Related papers (2024-06-05T13:41:26Z) - On Safety in Safe Bayesian Optimization [5.9045432488022485]
We investigate three safety-related issues of the popular class of SafeOpt-type algorithms.
First, these algorithms critically rely on frequentist uncertainty bounds for Gaussian Process (GP) regression.
Second, we identify the assumption of a known upper bound on the reproducing kernel Hilbert space (RKHS) norm of the target function as a practical difficulty.
Third, SafeOpt and derived algorithms rely on a discrete search space, making them difficult to apply to higher-dimensional problems.
arXiv Detail & Related papers (2024-03-19T17:50:32Z)
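As context for the SafeOpt critique above: the frequentist uncertainty bounds in question are typically of the following form. The confidence width shown is the commonly cited one from Chowdhury and Gopalan (2017); the exact constants analysed in the paper may differ.

```latex
% Frequentist confidence bound for GP regression (one commonly used form):
% with probability at least 1 - \delta, simultaneously for all x and rounds t,
\[
  |f(x) - \mu_{t-1}(x)| \;\le\; \beta_t \, \sigma_{t-1}(x),
  \qquad
  \beta_t = B + R \sqrt{2\bigl(\gamma_{t-1} + 1 + \ln(1/\delta)\bigr)},
\]
% where B is the assumed upper bound on the RKHS norm \|f\|_{\mathcal{H}_k},
% R is the sub-Gaussian noise level, and \gamma_{t-1} is the maximum
% information gain after t - 1 observations.
```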
- Verification-Aided Learning of Neural Network Barrier Functions with Termination Guarantees [6.9060054915724]
Barrier functions are a general framework for establishing a safety guarantee for a system.
There is no general method for finding these functions.
Recent approaches use self-supervised learning techniques to learn these functions.
arXiv Detail & Related papers (2024-03-12T04:29:43Z)
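As context for the barrier-function paper above, one classical continuous-time form of a barrier certificate is sketched below. Conventions and inequality directions vary across papers, and the learned neural barrier functions discussed in the paper may use a different variant.

```latex
% One classical barrier-certificate formulation for dynamics \dot{x} = f(x),
% initial set \mathcal{X}_0 and unsafe set \mathcal{X}_u (illustrative variant):
\[
  B(x) \le 0 \;\;\forall x \in \mathcal{X}_0, \qquad
  B(x) > 0 \;\;\forall x \in \mathcal{X}_u, \qquad
  \nabla B(x)^{\top} f(x) \le 0 \;\;\forall x \text{ such that } B(x) = 0.
\]
% If such a B exists, no trajectory starting in \mathcal{X}_0 can reach
% \mathcal{X}_u, which is the safety guarantee referred to above.
```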
- Information-Theoretic Safe Bayesian Optimization [59.758009422067005]
We consider a sequential decision making task, where the goal is to optimize an unknown function without evaluating parameters that violate an unknown (safety) constraint.
Most current methods rely on a discretization of the domain and cannot be directly extended to the continuous case.
We propose an information-theoretic safe exploration criterion that directly exploits the GP posterior to identify the most informative safe parameters to evaluate.
arXiv Detail & Related papers (2024-02-23T14:31:10Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violation in policy optimization tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
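For the LBSGD paper above, here is a toy sketch of the log-barrier idea with a deliberately conservative step-size rule that keeps iterates strictly feasible. The objective, constraint, barrier weight, and step-size cap are illustrative stand-ins, not the paper's actual update rule or constants.

```python
import numpy as np

# Toy safe problem (illustrative only): minimize f subject to g(x) <= 0.
def f(x):       return (x - 3.0) ** 2          # objective
def g(x):       return x - 2.0                 # safety constraint: x <= 2
def grad_f(x):  return 2.0 * (x - 3.0)
def grad_g(x):  return 1.0

eta = 0.1          # barrier weight (annealed toward zero in the full method)
x = 0.0            # strictly feasible starting point, g(x) < 0

for _ in range(200):
    slack = -g(x)                              # distance to the constraint boundary (> 0)
    # Gradient of the log-barrier objective f(x) - eta * log(-g(x))
    grad = grad_f(x) + eta * grad_g(x) / slack
    # Conservative step size: never move more than a fraction of the slack,
    # so the iterate stays strictly feasible (a stand-in for an adaptive rule).
    step = min(0.1, 0.5 * slack / (abs(grad) + 1e-12))
    x -= step * grad

print("barrier solution x approx.", x, "(constrained optimum is x = 2)")
```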
- Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how, under a more standard notion of low inherent Bellman error, typically employed in least-squares value-iteration-style algorithms, one can obtain strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)
- Provably Efficient Safe Exploration via Primal-Dual Policy Optimization [105.7510838453122]
We study the Safe Reinforcement Learning (SRL) problem using the Constrained Markov Decision Process (CMDP) formulation.
We present a provably efficient online policy optimization algorithm for CMDP with safe exploration in the function approximation setting.
arXiv Detail & Related papers (2020-03-01T17:47:03Z)
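For the primal-dual paper above, here is a toy Lagrangian primal-dual sketch on a one-step constrained decision problem. The rewards, costs, budget, and learning rates are made up, and the paper's actual algorithm operates on CMDPs with function approximation, so this only illustrates the primal-dual principle.

```python
import numpy as np

# Toy constrained decision problem (illustrative, not the paper's CMDP setup):
# two actions with known reward/cost; keep expected cost <= budget while
# maximizing expected reward, using a primal-dual (Lagrangian) scheme.
rewards = np.array([1.0, 0.6])
costs   = np.array([0.9, 0.2])
budget  = 0.5

theta = np.zeros(2)   # policy logits (primal variable)
lam   = 0.0           # Lagrange multiplier for the cost constraint (dual variable)
lr_primal, lr_dual = 0.5, 0.1

for _ in range(2000):
    pi = np.exp(theta - theta.max())
    pi /= pi.sum()
    # Lagrangian: E_pi[reward] - lam * (E_pi[cost] - budget)
    advantage = rewards - lam * costs
    # Primal ascent on the softmax policy (policy-gradient-style update)
    grad_theta = pi * (advantage - pi @ advantage)
    theta += lr_primal * grad_theta
    # Dual ascent: raise lam when the constraint is violated, lower it otherwise
    lam = max(0.0, lam + lr_dual * (pi @ costs - budget))

print("policy:", np.round(pi, 3), "multiplier:", round(lam, 3),
      "expected cost:", round(float(pi @ costs), 3))
```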
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.