An Index-based Deterministic Asymptotically Optimal Algorithm for
Constrained Multi-armed Bandit Problems
- URL: http://arxiv.org/abs/2007.14550v1
- Date: Wed, 29 Jul 2020 01:54:22 GMT
- Title: An Index-based Deterministic Asymptotically Optimal Algorithm for
Constrained Multi-armed Bandit Problems
- Authors: Hyeong Soo Chang
- Abstract summary: For the model of constrained multi-armed bandit, we show that there exists an index-based deterministic asymptotically optimal algorithm.
We provide a finite-time bound on the probability of asymptotic optimality, given as 1-O(|A|Te^{-T}), where T is the horizon size and A is the set of arms in the bandit.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For the model of constrained multi-armed bandit, we show that by construction
there exists an index-based deterministic asymptotically optimal algorithm. The
optimality is achieved by the convergence of the probability of choosing an
optimal feasible arm to one over infinite horizon. The algorithm is built upon
Locatelli et al.'s "anytime parameter-free thresholding" algorithm under the
assumption that the optimal value is known. We provide a finite-time bound to
the probability of the asymptotic optimality given as 1-O(|A|Te^{-T}) where T
is the horizon size and A is the set of the arms in the bandit. We then study a
relaxed-version of the algorithm in a general form that estimates the optimal
value and discuss the asymptotic optimality of the algorithm after a
sufficiently large T with examples.
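The abstract names Locatelli et al.'s "anytime parameter-free thresholding" (APT) algorithm as the building block. As a rough illustration of what an index-based deterministic rule looks like, the sketch below implements the APT index as it is commonly stated for the thresholding bandit problem: pull the arm that minimizes sqrt(n_i) * (|mean_i - tau| + eps). It is not the constrained-bandit algorithm of this paper, whose details are not given on this page. The function names, the threshold `tau`, the precision `eps`, and the unit-variance Gaussian reward model are all assumptions made for the example.

```python
import math
import random


def apt_choose(counts, means, tau, eps):
    """Pick the next arm by the APT index (deterministic given the statistics)."""
    # Pull every arm once before the index is defined.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    # APT index: sqrt(n_i) * (|empirical mean_i - tau| + eps); pull the smallest.
    indices = [math.sqrt(n) * (abs(m - tau) + eps) for n, m in zip(counts, means)]
    return min(range(len(counts)), key=indices.__getitem__)


def run_apt(arm_means, tau, eps, horizon, seed=0):
    """Simulate APT on hypothetical Gaussian arms and classify arms against tau."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    means = [0.0] * k
    for _ in range(horizon):
        i = apt_choose(counts, means, tau, eps)
        reward = rng.gauss(arm_means[i], 1.0)  # assumed reward model, for illustration
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]  # incremental mean update
    # Arms whose empirical mean is at or above the threshold.
    above = [i for i in range(k) if means[i] >= tau]
    return counts, means, above
```

Calling `run_apt([0.2, 0.45, 0.8], tau=0.5, eps=0.05, horizon=300)` returns the pull counts, the empirical means, and the arms estimated to lie above the threshold. The index concentrates pulls on arms whose empirical mean is close to `tau`, since those are the hardest to classify.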
Related papers
- Sample Complexity for Quadratic Bandits: Hessian Dependent Bounds and
Optimal Algorithms [64.10576998630981]
We show the first tight characterization of the optimal Hessian-dependent sample complexity.
A Hessian-independent algorithm universally achieves the optimal sample complexities for all Hessian instances.
The optimal sample complexities achieved by our algorithm remain valid for heavy-tailed noise distributions.
arXiv Detail & Related papers (2023-06-21T17:03:22Z)
- n-Step Temporal Difference Learning with Optimal n [5.945710235932345]
We consider the problem of finding the optimal value of n in the n-step temporal difference (TD) learning.
Our objective function for the optimization problem is the average root mean squared error (RMSE).
arXiv Detail & Related papers (2023-03-13T12:44:32Z)
- Convergence Rate Analysis for Optimal Computing Budget Allocation
Algorithms [1.713291434132985]
Ordinal optimization (OO) is a widely-studied technique for optimizing discrete-event dynamic systems.
A well-known method in OO is the optimal computing budget allocation (OCBA).
In this paper, we investigate two popular OCBA algorithms.
arXiv Detail & Related papers (2022-11-27T04:55:40Z)
- Selection of the Most Probable Best [2.1095005405219815]
We consider an expected-value ranking and selection (R&S) problem where all k solutions' simulation outputs depend on a common parameter whose uncertainty can be modeled by a distribution.
We define the most probable best (MPB) to be the solution that has the largest probability of being optimal with respect to the distribution.
We devise a series of algorithms that replace the unknown means in the optimality conditions with their estimates and prove the algorithms' sampling ratios achieve the conditions as the simulation budget increases.
arXiv Detail & Related papers (2022-07-15T15:27:27Z)
- Non-Convex Optimization with Certificates and Fast Rates Through Kernel
Sums of Squares [68.8204255655161]
We consider potentially non-convex optimization problems.
In this paper, we propose an algorithm that achieves close to optimal a priori computational guarantees.
arXiv Detail & Related papers (2022-04-11T09:37:04Z)
- An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Sequential Subspace Search for Functional Bayesian Optimization
Incorporating Experimenter Intuition [63.011641517977644]
Our algorithm generates a sequence of finite-dimensional random subspaces of functional space spanned by a set of draws from the experimenter's Gaussian Process.
Standard Bayesian optimisation is applied on each subspace, and the best solution found used as a starting point (origin) for the next subspace.
We test our algorithm in simulated and real-world experiments, namely blind function matching, finding the optimal precipitation-strengthening function for an aluminium alloy, and learning rate schedule optimisation for deep networks.
arXiv Detail & Related papers (2020-09-08T06:54:11Z)
- Gamification of Pure Exploration for Linear Bandits [34.16123941778227]
We investigate an active pure-exploration setting, that includes best-arm identification, in the context of linear bandits.
While asymptotically optimal algorithms exist for standard multi-arm bandits, the existence of such algorithms for best-arm identification in linear bandits has been elusive.
We design the first asymptotically optimal algorithm for fixed-confidence pure exploration in linear bandits.
arXiv Detail & Related papers (2020-07-02T08:20:35Z)
- Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.