Directional Optimism for Safe Linear Bandits
- URL: http://arxiv.org/abs/2308.15006v2
- Date: Mon, 11 Mar 2024 23:32:33 GMT
- Title: Directional Optimism for Safe Linear Bandits
- Authors: Spencer Hutchinson, Berkay Turan, Mahnoosh Alizadeh
- Abstract summary: The safe linear bandit problem is a version of the classical linear bandit problem where the learner's actions must satisfy an uncertain constraint at all rounds.
We find that it is possible to achieve improved regret guarantees for both well-separated problem instances and action sets that are finite star convex sets.
Lastly, we introduce a generalization of the safe linear bandit setting where the constraints are convex and adapt our algorithms and analyses to this setting by leveraging a novel convex-analysis based approach.
- Score: 4.84955052297084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The safe linear bandit problem is a version of the classical stochastic
linear bandit problem where the learner's actions must satisfy an uncertain
constraint at all rounds. Due to its applicability to many real-world settings,
this problem has received considerable attention in recent years. By leveraging
a novel approach that we call directional optimism, we find that it is possible
to achieve improved regret guarantees for both well-separated problem instances
and action sets that are finite star convex sets. Furthermore, we propose a
novel algorithm for this setting that improves on existing algorithms in terms
of empirical performance, while enjoying matching regret guarantees. Lastly, we
introduce a generalization of the safe linear bandit setting where the
constraints are convex and adapt our algorithms and analyses to this setting by
leveraging a novel convex-analysis based approach.
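To make the setting concrete, the following is a minimal sketch of the optimistic-reward, pessimistic-constraint template that safe linear bandit algorithms commonly build on. It is not the paper's directional-optimism method, and all problem data (the confidence width beta, the threshold tau, and the random finite action set) are illustrative assumptions.

```python
import numpy as np

# One round of a generic safe linear bandit over a finite action set:
# play the optimistically-best action among those that a pessimistic
# constraint estimate certifies as safe. Illustrative sketch only; the
# paper's directional-optimism algorithm refines this template.
rng = np.random.default_rng(0)
d, n_actions, tau, beta = 3, 50, 1.0, 0.5   # assumed problem data

actions = rng.normal(size=(n_actions, d))   # finite action set
theta_hat = rng.normal(size=d)              # reward-parameter estimate
mu_hat = rng.normal(size=d)                 # constraint-parameter estimate
V_inv = np.linalg.inv(np.eye(d))            # inverse regularized Gram matrix

def width(a: np.ndarray) -> float:
    """Confidence width beta * ||a||_{V^{-1}} shared by both estimates."""
    return beta * np.sqrt(a @ V_inv @ a)

# Pessimistic safety: keep actions whose worst-case constraint value <= tau.
safe = [a for a in actions if mu_hat @ a + width(a) <= tau]

if safe:
    # Optimism for the reward: maximize the best-case value over safe actions.
    a_t = max(safe, key=lambda a: theta_hat @ a + width(a))
else:
    a_t = np.zeros(d)   # fall back to a known-safe action (here: the origin)
print("chosen action:", a_t)
```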
Related papers
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
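As background, here is a hedged sketch of the standard running likelihood-ratio construction that such confidence sequences rely on; the notation below is ours rather than the paper's.

```latex
% Running likelihood-ratio confidence sequence (standard construction;
% notation is illustrative, not taken from the paper). With predictable
% plug-in estimates \hat\theta_{s-1}, the ratio R_t(\theta^*) is a
% nonnegative martingale with mean 1, so Ville's inequality bounds the
% probability that the true parameter ever leaves the set.
\[
  R_t(\theta) = \prod_{s=1}^{t} \frac{p_{\hat\theta_{s-1}}(X_s)}{p_{\theta}(X_s)},
  \qquad
  \mathcal{C}_t = \{\theta : R_t(\theta) \le 1/\alpha\},
\]
\[
  \Pr\bigl(\exists t \ge 1 : \theta^\ast \notin \mathcal{C}_t\bigr)
  = \Pr\Bigl(\sup_{t \ge 1} R_t(\theta^\ast) > 1/\alpha\Bigr) \le \alpha .
\]
```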
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
- Convex Methods for Constrained Linear Bandits [2.5782420501870296]
This work presents a comprehensive study on the computational aspects of safe bandit algorithms, specifically safe linear bandits.
We first characterize the properties of the optimal policy for the safe linear bandit problem and then propose an end-to-end pipeline of safe linear bandit algorithms.
arXiv Detail & Related papers (2023-11-07T20:45:46Z)
- The Impact of the Geometric Properties of the Constraint Set in Safe Optimization with Bandit Feedback [5.758073912084366]
We consider a safe optimization problem with bandit feedback in which an agent sequentially chooses actions and observes responses from the environment.
We propose an algorithm for this problem, and study how the geometric properties of the constraint set impact the regret of the algorithm.
arXiv Detail & Related papers (2023-05-01T15:48:34Z)
- Safe Online Bid Optimization with Return-On-Investment and Budget Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems.
We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraints violations.
More interestingly, we provide an algorithm, namely GCB_safe(ψ, φ), guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances ψ and φ.
arXiv Detail & Related papers (2022-01-18T17:24:20Z)
- Optimal Rates for Random Order Online Optimization [60.011653053877126]
We study the random order online optimization model of Garber et al. (2020), where the loss functions may be chosen by an adversary but are then presented online in a uniformly random order.
We show that the algorithms of Garber et al. (2020) achieve the optimal bounds and significantly improve their stability.
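For reference, the random-order regret that this model targets can be written as follows (standard definition; the notation is ours):

```latex
% Random-order regret: an adversary fixes losses f_1, ..., f_T, a uniformly
% random permutation \sigma orders them, and regret is measured in
% expectation over \sigma.
\[
  \mathrm{Regret}_T
  = \mathbb{E}_{\sigma}\!\left[
      \sum_{t=1}^{T} f_{\sigma(t)}(x_t)
      - \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x)
    \right].
\]
```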
arXiv Detail & Related papers (2021-06-29T09:48:46Z)
- On the Optimality of Batch Policy Optimization Algorithms [106.89498352537682]
Batch policy optimization considers leveraging existing data for policy construction before interacting with an environment.
We show that any confidence-adjusted index algorithm is minimax optimal, whether it be optimistic, pessimistic or neutral.
We introduce a new weighted-minimax criterion that considers the inherent difficulty of optimal value prediction.
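A confidence-adjusted index can be sketched in the following illustrative form (our notation, not the paper's):

```latex
% Confidence-adjusted index: value estimate \hat q(a) plus a signed
% multiple of the confidence width w(a). The sign of \beta selects
% optimism, neutrality, or pessimism; the claim is that any such choice
% is minimax optimal.
\[
  a_{\mathrm{out}} \in \arg\max_{a}\; \hat q(a) + \beta\, w(a),
  \qquad
  \beta > 0 \ \text{(optimistic)},\quad
  \beta = 0 \ \text{(neutral)},\quad
  \beta < 0 \ \text{(pessimistic)}.
\]
```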
arXiv Detail & Related papers (2021-04-06T05:23:20Z)
- Experimental Design for Regret Minimization in Linear Bandits [19.8309784360219]
We propose a novel experimental design-based algorithm to minimize regret in online linear and combinatorial bandits.
We provide state-of-the-art finite time regret guarantees and show that our algorithm can be applied in both the bandit and semi-bandit feedback regime.
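As background, experimental-design approaches of this kind typically build on the classical G-optimal design objective shown below; whether the paper uses exactly this objective, or a regret-tailored variant, is not stated in the summary.

```latex
% Classical G-optimal design over a finite action set \mathcal{A}: choose a
% sampling distribution \lambda minimizing the worst-case predictive variance.
\[
  \min_{\lambda \in \Delta(\mathcal{A})} \;
  \max_{a \in \mathcal{A}} \;
  \|a\|_{V(\lambda)^{-1}}^{2},
  \qquad
  V(\lambda) = \sum_{x \in \mathcal{A}} \lambda(x)\, x x^{\top}.
\]
```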
arXiv Detail & Related papers (2020-11-01T17:59:19Z)
- An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
- On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization [55.937424268654645]
We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm.
We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
arXiv Detail & Related papers (2020-08-20T03:48:14Z)
- Bandit algorithms to emulate human decision making using probabilistic distortions [20.422725678982726]
We formulate two sample multi-armed bandit problems with distorted probabilities on the reward distributions.
We consider the aforementioned problems in the regret minimization as well as best arm identification framework for multi-armed bandits.
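One common way to formalize a probabilistic distortion, in the style of cumulative prospect theory, is via a weighting function applied to tail probabilities; the paper's exact distortion model may differ, so the formula below is only an illustrative assumption.

```latex
% CPT-style distorted value of a nonnegative reward X under a probability
% weighting function w; it reduces to E[X] when w is the identity.
\[
  V(X) = \int_{0}^{\infty} w\bigl(\Pr(X > x)\bigr)\, \mathrm{d}x .
\]
```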
arXiv Detail & Related papers (2016-11-30T17:37:51Z)