Exploiting Concavity Information in Gaussian Process Contextual Bandit Optimization
- URL: http://arxiv.org/abs/2503.10836v1
- Date: Thu, 13 Mar 2025 19:35:54 GMT
- Title: Exploiting Concavity Information in Gaussian Process Contextual Bandit Optimization
- Authors: Kevin Li, Eric Laber
- Abstract summary: The contextual bandit framework is widely used to solve sequential optimization problems. We consider settings in which the mean reward is known to be a concave function of the action for each fixed context. We propose a contextual bandit algorithm that accelerates optimization by conditioning the posterior of a Bayesian Gaussian Process model on this concavity information.
- Score: 2.1046873879077794
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The contextual bandit framework is widely used to solve sequential optimization problems where the reward of each decision depends on auxiliary context variables. In settings such as medicine, business, and engineering, the decision maker often possesses additional structural information on the generative model that can potentially be used to improve the efficiency of bandit algorithms. We consider settings in which the mean reward is known to be a concave function of the action for each fixed context. Examples include patient-specific dose-response curves in medicine and expected profit in online advertising auctions. We propose a contextual bandit algorithm that accelerates optimization by conditioning the posterior of a Bayesian Gaussian Process model on this concavity information. We design a novel shape-constrained reward function estimator using a specially chosen regression spline basis and constrained Gaussian Process posterior. Using this model, we propose a UCB algorithm and derive corresponding regret bounds. We evaluate our algorithm on numerical examples and test functions used to study optimal dosing of anti-clotting medication.
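For intuition, the sketch below approximates the key step on a discretized action grid: condition a GP posterior on concavity (here crudely, by rejection sampling of posterior draws, rather than the paper's spline-based constrained posterior) and pick the next action by UCB. The kernel, grid, and toy dose-response function are assumptions for this example, not the paper's choices.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 15)   # discretized action space

def rbf_kernel(a, b, ls=0.3):
    # squared-exponential kernel on scalar actions
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def concave_ucb_action(x_obs, y_obs, noise=0.05, beta=2.0, n_draws=4000):
    """Pick the UCB action under an approximately concavity-constrained posterior."""
    K = rbf_kernel(x_obs, x_obs) + noise ** 2 * np.eye(len(x_obs))
    Ks = rbf_kernel(grid, x_obs)
    mu = Ks @ np.linalg.solve(K, y_obs)
    cov = rbf_kernel(grid, grid) - Ks @ np.linalg.solve(K, Ks.T)
    draws = rng.multivariate_normal(mu, cov + 1e-8 * np.eye(len(grid)), n_draws)
    # "condition" on concavity by keeping draws whose second differences are
    # non-positive; acceptance can be low on fine grids, hence the fallback
    mask = np.all(np.diff(draws, n=2, axis=1) <= 1e-6, axis=1)
    kept = draws[mask] if mask.any() else draws
    m, s = kept.mean(axis=0), kept.std(axis=0)
    return grid[np.argmax(m + beta * s)]   # upper confidence bound rule

# toy usage: a concave dose-response curve observed with noise
x = rng.uniform(0.0, 1.0, 8)
y = -(x - 0.6) ** 2 + 0.05 * rng.standard_normal(8)
print("next action:", concave_ucb_action(x, y))
```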
Related papers
- Indirect Query Bayesian Optimization with Integrated Feedback [17.66813850517961]
We develop a new class of Bayesian optimization problems where integrated feedback is given via a conditional expectation of the unknown function $f$ to be optimized.
The goal is to find the global optimum of $f$ by adaptively querying and observing in the space transformed by the conditional distribution.
This is motivated by real-world applications where one cannot access direct feedback due to privacy, hardware or computational constraints.
arXiv Detail & Related papers (2024-12-18T07:20:33Z)
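As a toy illustration of the integrated-feedback setting above: each query $z$ returns a Monte Carlo estimate of a conditional expectation of $f$ rather than a direct evaluation. The Gaussian conditional distribution and the test function are assumptions for this sketch, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(1)

def integrated_feedback(f, z, tau=0.1, n_samples=200):
    # observe E[f(X) | z] with X drawn from an assumed conditional N(z, tau^2)
    x = rng.normal(z, tau, n_samples)
    return f(x).mean()

# querying z near the optimum of f(x) = -(x - 0.5)^2 still scores best,
# but the feedback is smoothed by the conditional distribution
for z in (0.2, 0.5, 0.8):
    print(z, integrated_feedback(lambda x: -(x - 0.5) ** 2, z))
```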
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z)
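For concreteness, here is a minimal two-point (zeroth-order) gradient estimator of the kind gradient-free methods build on; the paper's smoothing scheme, step sizes, and noise model may differ from this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

def zo_sgd(f, x0, steps=500, lr=0.1, mu=1e-3):
    """Minimize f using only function evaluations (no gradients)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        u = rng.standard_normal(x.shape)    # random search direction
        u /= np.linalg.norm(u)
        # two-point estimate of the directional derivative along u
        g = (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
        x -= lr * g
    return x

# toy usage: smooth convex quadratic with minimum at (2, 2, 2)
print(zo_sgd(lambda z: np.sum((z - 2.0) ** 2), np.zeros(3)))
```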
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
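A one-shot toy version of the idea above (the paper constructs anytime-valid confidence sequences): keep every parameter whose likelihood ratio against the MLE clears a cutoff. The Gaussian mean model with known variance is an assumed example, not the paper's general setting.

```python
import numpy as np

def lr_confidence_set(y, grid, alpha=0.05, sigma=1.0):
    """Grid points whose log likelihood ratio vs. the MLE exceeds log(alpha)."""
    n = len(y)
    mle = y.mean()
    # for a Gaussian mean: log L(theta) - log L(mle) = -n (theta - mle)^2 / (2 sigma^2)
    log_lr = -n * (grid - mle) ** 2 / (2 * sigma ** 2)
    return grid[log_lr >= np.log(alpha)]

y = np.random.default_rng(3).normal(1.0, 1.0, 50)
print(lr_confidence_set(y, np.linspace(0.0, 2.0, 201)))
```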
- Random Exploration in Bayesian Optimization: Order-Optimal Regret and Computational Efficiency [18.17090625880964]
We study the methodology of exploring the domain using random samples drawn from a distribution.
We show that this random exploration approach achieves the optimal error rates.
arXiv Detail & Related papers (2023-10-23T20:30:44Z)
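A minimal sketch of the approach described above: query uniformly random points, fit a GP, and report the maximizer of the posterior mean. The kernel, noise level, and test function are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def rbf(a, b, ls=0.15):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def random_explore_bo(f, n=30, noise=0.05):
    x = rng.uniform(0.0, 1.0, n)                 # non-adaptive random queries
    y = f(x) + noise * rng.standard_normal(n)
    grid = np.linspace(0.0, 1.0, 200)
    K = rbf(x, x) + noise ** 2 * np.eye(n)
    mean = rbf(grid, x) @ np.linalg.solve(K, y)  # GP posterior mean on the grid
    return grid[np.argmax(mean)]

print(random_explore_bo(lambda a: np.sin(6 * a)))  # true max near a ~ 0.26
```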
- Transfer Learning with Partially Observable Offline Data via Causal Bounds [8.981637739384674]
In this paper, we investigate transfer learning in partially observable contextual bandits. Agents operate with incomplete information and limited access to hidden confounders. We propose an efficient method that discretizes the functional constraints of unknown distributions into linear constraints. This method takes into account estimation errors and exhibits strong convergence properties, ensuring robust and reliable causal bounds.
arXiv Detail & Related papers (2023-08-07T13:24:50Z)
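To make the "functional constraints into linear constraints" step above concrete, here is a toy linear program that bounds a mean reward from coarse distributional information; the support, constraints, and numbers are made-up stand-ins, not the paper's construction.

```python
import numpy as np
from scipy.optimize import linprog

rewards = np.array([0.0, 0.5, 1.0])   # discretized reward support
# decision variable p = distribution over rewards; suppose the offline data
# only tells us that P(reward >= 0.5) lies between 0.3 and 0.6
A_ub = np.array([[0, 1, 1], [0, -1, -1]])
b_ub = np.array([0.6, -0.3])
A_eq = np.ones((1, 3)); b_eq = np.array([1.0])

lo = linprog(rewards, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 3)
hi = linprog(-rewards, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 3)
print("causal bounds on the mean reward:", lo.fun, -hi.fun)
```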
- Rollout Algorithms and Approximate Dynamic Programming for Bayesian Optimization and Sequential Estimation [0.0]
We provide a unifying approximate dynamic programming framework that applies to a broad variety of problems involving sequential estimation.
We consider first the construction of surrogate cost functions for the purposes of optimization, and we focus on the special case of Bayesian optimization.
We then discuss the more general case of sequential estimation of a random vector using optimal measurement selection, and its application to problems of adaptive control.
arXiv Detail & Related papers (2022-12-15T17:50:23Z)
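A generic one-step rollout, the basic primitive behind the framework described above; the paper's surrogate cost functions and measurement-selection machinery are more elaborate than this sketch.

```python
import numpy as np

def rollout_action(state, actions, step, base_policy, horizon=5):
    """Score each action by one step plus a simulated base-policy rollout."""
    def tail_value(s):
        total = 0.0
        for _ in range(horizon):
            s, r = step(s, base_policy(s))
            total += r
        return total
    scores = []
    for a in actions:
        s1, r1 = step(state, a)          # try the candidate action first
        scores.append(r1 + tail_value(s1))  # then follow the base policy
    return actions[int(np.argmax(scores))]

# toy usage: move on a line, reward = -|position|; rollout picks the move home
step = lambda s, a: (s + a, -abs(s + a))
base = lambda s: -0.1 if s > 0 else 0.1      # crude base policy
print(rollout_action(2.0, [-1.0, 0.0, 1.0], step, base))
```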
- Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z)
- Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory.
We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures.
We then show how alternative choices for the loss yield a flexible family of acquisition functions.
arXiv Detail & Related papers (2022-10-04T04:43:58Z)
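For reference, one popular acquisition function of the kind the summary above alludes to is expected improvement; the sketch below computes it from a GP posterior mean and standard deviation. Whether EI is among the paper's exact special cases is not stated in this summary.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best):
    """EI for maximization, given GP posterior mean/std at candidate points."""
    sigma = np.maximum(sigma, 1e-12)       # guard against zero variance
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

# candidates with higher mean or higher uncertainty score higher
print(expected_improvement(np.array([0.9, 1.1, 1.0]),
                           np.array([0.1, 0.1, 0.5]), best=1.0))
```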
- Adapting to Misspecification in Contextual Bandits [82.55565343668246]
We introduce a new family of oracle-efficient algorithms for $\varepsilon$-misspecified contextual bandits.
We obtain the first algorithm that achieves the optimal $O(d\sqrt{T} + \varepsilon\sqrt{d}T)$ regret bound for unknown misspecification level.
arXiv Detail & Related papers (2021-07-12T21:30:41Z)
- Bayesian Optimisation for Constrained Problems [0.0]
We propose a novel variant of the well-known Knowledge Gradient acquisition function that allows it to handle constraints.
We empirically compare the new algorithm with four other state-of-the-art constrained Bayesian optimisation algorithms and demonstrate its superior performance.
arXiv Detail & Related papers (2021-05-27T15:43:09Z)
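For contrast with the paper's constrained Knowledge Gradient, a simpler and widely used scheme weights an unconstrained acquisition by the posterior probability of feasibility; the sketch below shows that baseline idea, not the paper's method.

```python
import numpy as np
from scipy.stats import norm

def constrained_acquisition(acq, mu_c, sigma_c):
    """Down-weight acquisition values by the probability that c(x) <= 0."""
    prob_feasible = norm.cdf(-mu_c / np.maximum(sigma_c, 1e-12))
    return acq * prob_feasible

# a high-value point that is probably infeasible loses to a safer one
print(constrained_acquisition(np.array([1.0, 0.7]),
                              np.array([0.5, -0.5]),   # constraint posterior mean
                              np.array([0.2, 0.2])))   # constraint posterior std
```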
- An Efficient Algorithm for Deep Stochastic Contextual Bandits [10.298368632706817]
In contextual bandit problems, an agent selects an action based on the observed context to maximize the cumulative reward over iterations.
Recently, several studies have used a deep neural network (DNN), trained by gradient-based methods, to predict the expected reward of an action.
arXiv Detail & Related papers (2021-04-12T16:34:43Z)
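A minimal epsilon-greedy contextual bandit in this spirit, with a linear reward model standing in for the DNN; the architecture, exploration rule, and reward function here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

def epsilon_greedy_bandit(contexts, reward_fn, n_actions, eps=0.1, lr=0.05):
    """Per-action linear reward models (stand-ins for a DNN) trained online."""
    W = np.zeros((n_actions, contexts.shape[1]))
    for x in contexts:
        if rng.random() < eps:
            a = int(rng.integers(n_actions))   # explore
        else:
            a = int(np.argmax(W @ x))          # exploit the predicted reward
        r = reward_fn(x, a)
        W[a] += lr * (r - W[a] @ x) * x        # SGD step on squared error
    return W

# toy usage: action 1 pays off when the first context feature is large
ctx = rng.uniform(-1, 1, (500, 2))
print(epsilon_greedy_bandit(ctx, lambda x, a: x[0] if a == 1 else 0.0, 2))
```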
- Global Optimization of Gaussian processes [52.77024349608834]
We propose a reduced-space formulation with Gaussian processes trained on few data points.
The approach also leads to a significantly smaller and computationally cheaper subsolver for lower bounding.
In total, the proposed method reduces convergence time by orders of magnitude.
arXiv Detail & Related papers (2020-05-21T20:59:11Z)