Related papers: Experimental Design for Regret Minimization in Linear Bandits

Experimental Design for Regret Minimization in Linear Bandits

URL: http://arxiv.org/abs/2011.00576v2
Date: Sat, 27 Feb 2021 01:07:39 GMT
Title: Experimental Design for Regret Minimization in Linear Bandits
Authors: Andrew Wagenmaker, Julian Katz-Samuels, Kevin Jamieson
Abstract summary: We propose a novel design-based algorithm to minimize regret in online linear and bandits. We provide state-of-the-art finite time regret guarantees and show that our algorithm can be applied in both the bandit and semi-bandit feedback regime.
Score: 19.8309784360219
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper we propose a novel experimental design-based algorithm to minimize regret in online stochastic linear and combinatorial bandits. While existing literature tends to focus on optimism-based algorithms--which have been shown to be suboptimal in many cases--our approach carefully plans which action to take by balancing the tradeoff between information gain and reward, overcoming the failures of optimism. In addition, we leverage tools from the theory of suprema of empirical processes to obtain regret guarantees that scale with the Gaussian width of the action set, avoiding wasteful union bounds. We provide state-of-the-art finite time regret guarantees and show that our algorithm can be applied in both the bandit and semi-bandit feedback regime. In the combinatorial semi-bandit setting, we show that our algorithm is computationally efficient and relies only on calls to a linear maximization oracle. In addition, we show that with slight modification our algorithm can be used for pure exploration, obtaining state-of-the-art pure exploration guarantees in the semi-bandit setting. Finally, we provide, to the best of our knowledge, the first example where optimism fails in the semi-bandit regime, and show that in this setting our algorithm succeeds.

Related papers

Asymptotically Optimal Linear Best Feasible Arm Identification with Fixed Budget [55.938644481736446]
We introduce a novel algorithm for best feasible arm identification that guarantees an exponential decay in the error probability.<n>We validate our algorithm through comprehensive empirical evaluations across various problem instances with different levels of complexity.
arXiv Detail & Related papers (2025-06-03T02:56:26Z)
Near-Optimal Algorithm for Non-Stationary Kernelized Bandits [6.379833644595456]
We study a non-stationary kernelized bandit (KB) problem, also called time-varying Bayesian optimization. We show the first algorithm-independent regret lower bound for non-stationary KB with squared exponential and Mat'ern kernels. We propose a novel near-optimal algorithm called restarting phased elimination with random permutation.
arXiv Detail & Related papers (2024-10-21T14:28:26Z)
Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm is only accessible to noisy evaluations of the objective function it queries. We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z)
Provably Efficient Learning in Partially Observable Contextual Bandit [4.910658441596583]
We show how causal bounds can be applied to improving classical bandit algorithms. This research has the potential to enhance the performance of contextual bandit agents in real-world applications.
arXiv Detail & Related papers (2023-08-07T13:24:50Z)
Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces [14.366265951396587]
We design efficient general-purpose contextual bandit algorithms for large -- or even continuous -- action spaces. We propose a smooth regret notion for contextual bandits, which dominates previously proposed alternatives. Our algorithms can be used to recover the previous minimax/Pareto optimal guarantees under the standard regret.
arXiv Detail & Related papers (2022-07-12T21:27:09Z)
An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions. We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model. The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
Sub-linear Regret Bounds for Bayesian Optimisation in Unknown Search Spaces [63.22864716473051]
We propose a novel BO algorithm which expands (and shifts) the search space over iterations. We show theoretically that for both our algorithms, the cumulative regret grows at sub-linear rates.
arXiv Detail & Related papers (2020-09-05T14:24:40Z)
An Empirical Process Approach to the Union Bound: Practical Algorithms for Combinatorial and Linear Bandits [34.06611065493047]
This paper proposes near-optimal algorithms for the pure-exploration linear bandit problem in the fixed confidence and fixed budget settings. We provide an algorithm whose sample complexity scales with the geometry of the instance and avoids an explicit union bound over the number of arms. We also propose the first algorithm for linear bandits in the the fixed budget setting.
arXiv Detail & Related papers (2020-06-21T00:56:33Z)
Effective Dimension Adaptive Sketching Methods for Faster Regularized Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching. We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT)
arXiv Detail & Related papers (2020-06-10T15:00:09Z)
Excursion Search for Constrained Bayesian Optimization under a Limited Budget of Failures [62.41541049302712]
We propose a novel decision maker grounded in control theory that controls the amount of risk we allow in the search as a function of a given budget of failures. Our algorithm uses the failures budget more efficiently in a variety of optimization experiments, and generally achieves lower regret, than state-of-the-art methods.
arXiv Detail & Related papers (2020-05-15T09:54:09Z)
Bandit algorithms to emulate human decision making using probabilistic distortions [20.422725678982726]
We formulate two sample multi-armed bandit problems with distorted probabilities on the reward distributions. We consider the aforementioned problems in the regret minimization as well as best arm identification framework for multi-armed bandits.
arXiv Detail & Related papers (2016-11-30T17:37:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.