An Index-based Deterministic Asymptotically Optimal Algorithm for
Constrained Multi-armed Bandit Problems
- URL: http://arxiv.org/abs/2007.14550v1
- Date: Wed, 29 Jul 2020 01:54:22 GMT
- Title: An Index-based Deterministic Asymptotically Optimal Algorithm for
Constrained Multi-armed Bandit Problems
- Authors: Hyeong Soo Chang
- Abstract summary: For the model of constrained multi-armed bandit, we show that there exists an index-based deterministically optimal algorithm.
We provide a finite-time bound to the probability of the optimality given as 1-O(|A|Te-T) where T is the horizon size and A is the set of the arms in the bandit.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For the model of constrained multi-armed bandit, we show that by construction
there exists an index-based deterministic asymptotically optimal algorithm. The
optimality is achieved by the convergence of the probability of choosing an
optimal feasible arm to one over infinite horizon. The algorithm is built upon
Locatelli et al.'s "anytime parameter-free thresholding" algorithm under the
assumption that the optimal value is known. We provide a finite-time bound to
the probability of the asymptotic optimality given as 1-O(|A|Te^{-T}) where T
is the horizon size and A is the set of the arms in the bandit. We then study a
relaxed-version of the algorithm in a general form that estimates the optimal
value and discuss the asymptotic optimality of the algorithm after a
sufficiently large T with examples.
Related papers
- Approximation Algorithms for Combinatorial Optimization with Predictions [3.7235228254732524]
We use predictions to improve over approximation guarantees of classic algorithms.
Our algorithms produce optimal solutions when provided with perfect predictions.
Although we show our approach to be optimal for this class of problems as a whole, there is a potential for exploiting specific structural properties of individual problems to obtain improved bounds.
arXiv Detail & Related papers (2024-11-25T17:31:34Z) - Sample Complexity for Quadratic Bandits: Hessian Dependent Bounds and
Optimal Algorithms [64.10576998630981]
We show the first tight characterization of the optimal Hessian-dependent sample complexity.
A Hessian-independent algorithm universally achieves the optimal sample complexities for all Hessian instances.
The optimal sample complexities achieved by our algorithm remain valid for heavy-tailed noise distributions.
arXiv Detail & Related papers (2023-06-21T17:03:22Z) - Convergence Rate Analysis for Optimal Computing Budget Allocation
Algorithms [1.713291434132985]
Ordinal optimization (OO) is a widely-studied technique for optimizing discrete-event dynamic systems.
A well-known method in OO is the optimal computing budget allocation (OCBA)
In this paper, we investigate two popular OCBA algorithms.
arXiv Detail & Related papers (2022-11-27T04:55:40Z) - Selection of the Most Probable Best [2.1095005405219815]
We consider an expected-value ranking and selection (R&S) problem where all k solutions' simulation outputs depend on a common parameter whose uncertainty can be modeled by a distribution.
We define the most probable best (MPB) to be the solution that has the largest probability of being optimal with respect to the distribution.
We devise a series of algorithms that replace the unknown means in the optimality conditions with their estimates and prove the algorithms' sampling ratios achieve the conditions as the simulation budget increases.
arXiv Detail & Related papers (2022-07-15T15:27:27Z) - Non-Convex Optimization with Certificates and Fast Rates Through Kernel
Sums of Squares [68.8204255655161]
We consider potentially non- optimized approximation problems.
In this paper, we propose an algorithm that achieves close to optimal a priori computational guarantees.
arXiv Detail & Related papers (2022-04-11T09:37:04Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel, referred to as Recursive One-Over-T SGD, based on an easily implementable, averaging of past gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an sense.
arXiv Detail & Related papers (2020-08-28T14:46:56Z) - Gamification of Pure Exploration for Linear Bandits [34.16123941778227]
We investigate an active pure-exploration setting, that includes best-arm identification, in the context of linear bandits.
Whileally optimal algorithms exist for standard multi-arm bandits, the existence of such algorithms for the best-arm identification in linear bandits has been elusive.
We design the first insightally optimal algorithm for fixed-confidence pure exploration in linear bandits.
arXiv Detail & Related papers (2020-07-02T08:20:35Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $mathcaltilde O(t-1/4)$ rate of convergence for the norm of the gradient of Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.