Bayesian Regret Minimization in Offline Bandits
- URL: http://arxiv.org/abs/2306.01237v3
- Date: Tue, 2 Jul 2024 21:10:03 GMT
- Title: Bayesian Regret Minimization in Offline Bandits
- Authors: Marek Petrik, Guy Tennenholtz, Mohammad Ghavamzadeh
- Abstract summary: We study how to make decisions that minimize Bayesian regret in offline linear bandits.
We argue that the reliance on LCB is inherently flawed in this setting.
Our bounds build heavily on new connections to monetary risk measures.
- Score: 35.7981453841683
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We study how to make decisions that minimize Bayesian regret in offline linear bandits. Prior work suggests that one must take actions with maximum lower confidence bound (LCB) on their reward. We argue that the reliance on LCB is inherently flawed in this setting and propose a new algorithm that directly minimizes upper bounds on the Bayesian regret using efficient conic optimization solvers. Our bounds build heavily on new connections to monetary risk measures. Proving a matching lower bound, we show that our upper bounds are tight, and by minimizing them we are guaranteed to outperform the LCB approach. Our numerical results on synthetic domains confirm that our approach is superior to LCB.
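For concreteness, here is a minimal sketch of the LCB baseline that the abstract argues against: fit a ridge-regression estimate from logged data and act greedily with respect to a per-action lower confidence bound. The data model, the penalty lam, and the width beta are illustrative assumptions; the paper's own conic-optimization algorithm for minimizing the Bayesian-regret bound is not reproduced here.
```python
# Sketch of the LCB baseline in an offline linear bandit (assumed setup).
import numpy as np

rng = np.random.default_rng(0)
d, n_data, n_actions = 5, 200, 10

theta_true = rng.normal(size=d)                          # unknown parameter
X = rng.normal(size=(n_data, d))                         # logged feature vectors
y = X @ theta_true + rng.normal(scale=0.5, size=n_data)  # noisy logged rewards

lam = 1.0                                 # ridge penalty (assumed)
Sigma = X.T @ X + lam * np.eye(d)         # regularized Gram matrix
theta_hat = np.linalg.solve(Sigma, X.T @ y)

actions = rng.normal(size=(n_actions, d))
beta = 1.0                                # confidence width (assumed, not tuned)
Sigma_inv = np.linalg.inv(Sigma)
widths = np.sqrt(np.einsum("ad,dk,ak->a", actions, Sigma_inv, actions))
lcb = actions @ theta_hat - beta * widths  # pessimistic reward estimates

print("LCB-chosen action:", int(np.argmax(lcb)))
```
Per the abstract, acting on such per-action lower bounds is inherently flawed in the offline Bayesian setting, which is what motivates directly minimizing an upper bound on the Bayesian regret instead.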
Related papers
- Bayesian Bandit Algorithms with Approximate Inference in Stochastic Linear Bandits [21.09844002135398]
We show that linear Thompson sampling (LinTS) and the Bayesian Upper Confidence Bound extension (LinBUCB) can preserve their original regret upper-bound rates.
We also show that LinBUCB expedites the regret rate of LinTS from $\tilde{O}(d^{3/2}\sqrt{T})$ to $\tilde{O}(d\sqrt{T})$, matching the minimax optimal rate.
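As a point of reference for the summary above, here is a minimal sketch of linear Thompson sampling with an exact Gaussian posterior. The prior scale v, the noise level, and the arm set are illustrative assumptions; the paper's approximate-inference analysis and its LinBUCB variant are not reproduced.
```python
# Sketch of linear Thompson sampling (LinTS) with a conjugate Gaussian model.
import numpy as np

rng = np.random.default_rng(1)
d, T = 4, 500
theta_true = rng.normal(size=d)          # unknown parameter (simulation only)
arms = rng.normal(size=(20, d))          # fixed finite arm set

lam, noise, v = 1.0, 0.5, 0.5            # prior precision, noise, posterior scale
Sigma = lam * np.eye(d)                  # posterior precision matrix
b = np.zeros(d)                          # accumulated reward-weighted features

for t in range(T):
    theta_hat = np.linalg.solve(Sigma, b)
    # Sample from the Gaussian posterior N(theta_hat, v^2 Sigma^{-1}).
    theta_tilde = rng.multivariate_normal(theta_hat, v**2 * np.linalg.inv(Sigma))
    a = arms[np.argmax(arms @ theta_tilde)]   # act greedily on the sample
    r = a @ theta_true + rng.normal(scale=noise)
    Sigma += np.outer(a, a)              # Bayesian linear-regression update
    b += r * a

print("parameter error:", np.linalg.norm(np.linalg.solve(Sigma, b) - theta_true))
```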
arXiv Detail & Related papers (2024-06-20T07:45:38Z)
- Safe Online Bid Optimization with Return-On-Investment and Budget Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems.
We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraints violations.
More interestingly, we provide an algorithm, namely GCB_safe($\psi$, $\phi$), guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances $\psi$ and $\phi$.
arXiv Detail & Related papers (2022-01-18T17:24:20Z)
- Rate-optimal Bayesian Simple Regret in Best Arm Identification [11.389780431092914]
We consider best arm identification in the multi-armed bandit problem.
We propose a simple and easy-to-compute algorithm whose leading term matches the lower bound up to a constant factor.
arXiv Detail & Related papers (2021-11-18T18:59:35Z)
- A Unified Framework for Conservative Exploration [115.7063101600773]
We study bandits and reinforcement learning (RL) subject to a conservative constraint where the agent is asked to perform at least as well as a given baseline policy.
We present a unified framework for conservative bandits and RL, in which our core technique is to calculate the necessary and sufficient budget obtained from running the baseline policy.
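A simplified sketch of that budget check follows, assuming a known baseline mean reward and standard two-sided confidence intervals; the paper's framework derives the necessary and sufficient budget in greater generality.
```python
# Sketch of conservative exploration: explore only while the pessimistic
# cumulative reward stays within (1 - alpha) of the baseline's.
import numpy as np

rng = np.random.default_rng(2)
means = np.array([0.3, 0.5, 0.7])        # unknown arm means (simulation only)
baseline_arm, alpha, T = 1, 0.1, 2000    # baseline arm, allowed loss fraction
counts = np.ones(len(means))
sums = rng.binomial(1, means).astype(float)  # one initial pull per arm
budget = 0.0                             # realized slack over the baseline

for t in range(len(means), T):
    radius = np.sqrt(2 * np.log(t + 1) / counts)
    ucb = sums / counts + radius
    lcb = sums / counts - radius
    candidate = int(np.argmax(ucb))
    # Explore only if, even pessimistically, cumulative reward stays within
    # a (1 - alpha) factor of what the baseline policy would have earned.
    if budget + lcb[candidate] - (1 - alpha) * means[baseline_arm] >= 0:
        arm = candidate
    else:
        arm = baseline_arm
    r = float(rng.binomial(1, means[arm]))
    budget += r - (1 - alpha) * means[baseline_arm]
    counts[arm] += 1
    sums[arm] += r

print("pulls per arm:", counts.astype(int))
```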
arXiv Detail & Related papers (2021-06-22T11:52:04Z)
- Towards Minimax Optimal Best Arm Identification in Linear Bandits [95.22854522340938]
We study the problem of best arm identification in linear bandits in the fixed-budget setting.
By leveraging properties of the G-optimal design and incorporating it into the arm allocation rule, we design a parameter-free algorithm, OD-LinBAI.
We provide a theoretical analysis of its failure probability.
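As background for the summary above, here is a minimal Frank-Wolfe sketch that computes a G-optimal design over a finite arm set; by the Kiefer-Wolfowitz equivalence theorem, the maximal leverage converges to d. The step size is the classical Wynn update, and nothing here reproduces OD-LinBAI's exact allocation rule.
```python
# Sketch of G-optimal design computation via Frank-Wolfe (Wynn) iterations.
import numpy as np

rng = np.random.default_rng(3)
K, d = 30, 5
arms = rng.normal(size=(K, d))      # finite arm set
pi = np.full(K, 1.0 / K)            # start from the uniform design

for _ in range(500):
    A = arms.T @ (pi[:, None] * arms)                 # sum_a pi_a a a^T
    lev = np.einsum("kd,de,ke->k", arms, np.linalg.inv(A), arms)
    a_star, g = int(np.argmax(lev)), lev.max()        # worst-predicted arm
    gamma = (g - d) / (d * (g - 1.0))                 # classical Wynn step size
    pi = (1.0 - gamma) * pi                           # shift mass toward a_star
    pi[a_star] += gamma

A = arms.T @ (pi[:, None] * arms)
lev = np.einsum("kd,de,ke->k", arms, np.linalg.inv(A), arms)
print(f"max leverage {lev.max():.3f} vs. target d = {d}")
```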
arXiv Detail & Related papers (2021-05-27T09:19:10Z)
- Optimal Stochastic Nonconvex Optimization with Bandit Feedback [45.675080529219365]
We analyze continuous-armed bandit problems for nonconvex cost functions under certain smoothness and sublevel set assumptions.
We then propose an adaptive splitting method, which can significantly improve the performance.
arXiv Detail & Related papers (2021-03-30T05:21:12Z)
- Stage-wise Conservative Linear Bandits [37.717532659194426]
We study bandit optimization, which accounts for (unknown) safety constraints that appear in applications such as online advertising and medical trials.
We present two novel algorithms that respect the baseline constraints and enjoy probabilistic regret bounds of order $O(\sqrt{T} \log T)$.
Notably, the proposed algorithms can be adjusted with only minor modifications to tackle different problem variations.
arXiv Detail & Related papers (2020-09-30T19:51:37Z)
- On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization [55.937424268654645]
We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm.
We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
arXiv Detail & Related papers (2020-08-20T03:48:14Z)
- Constrained regret minimization for multi-criterion multi-armed bandits [5.349852254138086]
We study the problem of regret minimization over a given time horizon, subject to a risk constraint.
We propose a Risk-Constrained Lower Confidence Bound algorithm that guarantees logarithmic regret.
We prove lower bounds on the performance of any risk-constrained regret minimization algorithm.
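A minimal sketch of the risk-constrained selection idea appears below, using the empirical standard deviation as a stand-in risk measure with a hand-picked threshold tau; the paper's actual algorithm and choice of risk measure are not reproduced.
```python
# Sketch of risk-constrained bandit selection: restrict play to arms whose
# optimistic risk estimate satisfies the constraint, then maximize reward UCB.
import numpy as np

rng = np.random.default_rng(4)
means, sds = np.array([0.4, 0.6, 0.8]), np.array([0.2, 0.3, 0.9])
tau, T = 0.5, 3000                       # risk threshold on the std deviation
samples = [list(rng.normal(m, s, size=2)) for m, s in zip(means, sds)]

for t in range(T):
    mu = np.array([np.mean(s) for s in samples])
    sd = np.array([np.std(s) for s in samples])
    n = np.array([len(s) for s in samples])
    conf = np.sqrt(2 * np.log(t + 3) / n)
    feasible = sd - conf <= tau          # optimistic feasibility check
    ucb = np.where(feasible, mu + conf, -np.inf)
    arm = int(np.argmax(ucb))            # best reward UCB among feasible arms
    samples[arm].append(rng.normal(means[arm], sds[arm]))

print("pull counts:", [len(s) for s in samples])
```
The high-risk arm (largest true mean but sd above tau) is ruled out once its risk estimate concentrates, so play shifts to the best feasible arm.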
arXiv Detail & Related papers (2020-06-17T04:23:18Z)
- Locally Differentially Private (Contextual) Bandits Learning [55.63825598391525]
We study locally differentially private (LDP) bandit learning in this paper.
We propose simple black-box reduction frameworks that can solve a large family of context-free bandit learning problems with an LDP guarantee.
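A minimal sketch of the black-box reduction idea: each reward is privatized on the user side with the Laplace mechanism before any learner sees it, so an off-the-shelf bandit algorithm runs unchanged on the noisy feedback. The privacy level epsilon and the UCB learner are illustrative assumptions, not the paper's exact construction.
```python
# Sketch of an LDP bandit: a standard UCB learner fed Laplace-privatized rewards.
import numpy as np

rng = np.random.default_rng(5)
means, epsilon, T = np.array([0.3, 0.5, 0.7]), 1.0, 5000
counts, sums = np.ones(3), np.zeros(3)

def privatize(r):
    # Laplace mechanism for rewards in [0, 1]: sensitivity 1, scale 1/epsilon.
    return r + rng.laplace(scale=1.0 / epsilon)

for t in range(T):
    ucb = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
    arm = int(np.argmax(ucb))
    r = rng.binomial(1, means[arm])      # true reward stays on the user side
    counts[arm] += 1
    sums[arm] += privatize(r)            # learner only sees the noisy reward

print("empirical means:", np.round(sums / counts, 3))
```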
arXiv Detail & Related papers (2020-06-01T04:02:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.