An Optimal Algorithm for the Real-Valued Combinatorial Pure Exploration
of Multi-Armed Bandit
- URL: http://arxiv.org/abs/2306.09202v2
- Date: Thu, 14 Dec 2023 11:01:33 GMT
- Title: An Optimal Algorithm for the Real-Valued Combinatorial Pure Exploration
of Multi-Armed Bandit
- Authors: Shintaro Nakamura and Masashi Sugiyama
- Abstract summary: We study the real-valued combinatorial pure exploration problem in the stochastic multi-armed bandit (R-CPE-MAB).
The R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits.
We propose an algorithm named the combinatorial gap-based exploration (CombGapE) algorithm, whose sample complexity matches the lower bound.
- Score: 65.268245109828
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the real-valued combinatorial pure exploration problem in the
stochastic multi-armed bandit (R-CPE-MAB). We study the case where the size of
the action set is polynomial with respect to the number of arms. In such a
case, the R-CPE-MAB can be seen as a special case of the so-called transductive
linear bandits. Existing methods in the R-CPE-MAB and transductive linear
bandits have a gap of problem-dependent constant terms and logarithmic terms
between the upper and lower bounds of the sample complexity, respectively. We
close these gaps by proposing an algorithm named the combinatorial gap-based
exploration (CombGapE) algorithm, whose sample complexity upper bound matches
the lower bound. Finally, we numerically show that the CombGapE algorithm
outperforms existing methods significantly.
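The abstract names the gap-based exploration principle but does not spell out the algorithm. As a rough illustration of the general idea such methods build on (in the spirit of GapE-style index rules, not the authors' actual CombGapE), the sketch below pulls, at each round, the arm whose empirical gap to the best competitor is small relative to its confidence width. The index rule, constant, and function names are illustrative assumptions.

```python
import math

def gap_based_exploration(pull, n_arms, horizon, exploration_const=1.0):
    """Illustrative GapE-style pure-exploration sketch (not CombGapE).

    pull(a) returns a reward sample for arm a; after `horizon` pulls the
    empirically best arm is returned.
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms

    # Initialization: pull every arm once.
    for a in range(n_arms):
        sums[a] += pull(a)
        counts[a] += 1

    for _ in range(n_arms, horizon):
        means = [sums[a] / counts[a] for a in range(n_arms)]
        indices = []
        for a in range(n_arms):
            # Empirical gap of arm a to the best of the other arms
            # (positive only for the current leader).
            best_other = max(means[i] for i in range(n_arms) if i != a)
            gap = means[a] - best_other
            # Confidence width shrinks as the arm is sampled more often.
            width = math.sqrt(exploration_const * math.log(horizon) / counts[a])
            # GapE-style index: prefer arms with a small gap and high uncertainty.
            indices.append(-gap + width)
        a = max(range(n_arms), key=lambda i: indices[i])
        sums[a] += pull(a)
        counts[a] += 1

    means = [sums[a] / counts[a] for a in range(n_arms)]
    return max(range(n_arms), key=lambda i: means[i])
```

The combinatorial setting replaces single arms with actions built from several arms, but the same gap-plus-width trade-off drives which arm to sample next.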
Related papers
- Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits
Indexed Minimum Empirical Divergence (IMED) is a highly effective approach to the multi-armed bandit problem.
It has been observed to empirically outperform UCB-based algorithms and Thompson Sampling.
We present novel linear versions of the IMED algorithm, which we call the family of LinIMED algorithms.
arXiv Detail & Related papers (2024-05-24T04:11:58Z) - Query-Efficient Correlation Clustering with Noisy Oracle [17.11782578276788]
We introduce two novel formulations of online learning problems rooted in the paradigm of Pure Exploration in Combinatorial Multi-Armed Bandits (PE-CMAB)
We design algorithms that combine a sampling strategy with a classic approximation algorithm for correlation clustering and study their theoretical guarantees.
Our results are the first examples of polynomial-time algorithms that work for the case of PE-CMAB in which the underlying offline optimization problem is NP-hard.
arXiv Detail & Related papers (2024-02-02T13:31:24Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
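The entry above gives only the high-level idea. A minimal sketch of an adaptive, gap-based rejection loop (not the paper's actual sred; the threshold rule, constant, and names are assumptions) could look like this: arms in the active set are pulled round-robin, and any arm whose empirical gap to the leader exceeds a confidence threshold is rejected in any round, not only at fixed phase boundaries.

```python
import math

def adaptive_reject(pull, n_arms, budget, c=2.0):
    """Illustrative successive-rejects-style sketch with adaptive rejection.

    Assumes budget >= n_arms so every arm is pulled at least once.
    pull(a) returns a reward sample for arm a.
    """
    active = set(range(n_arms))
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    spent = 0

    while spent < budget and len(active) > 1:
        # One round-robin pass over the surviving arms.
        for a in list(active):
            if spent >= budget:
                break
            sums[a] += pull(a)
            counts[a] += 1
            spent += 1
        means = {a: sums[a] / counts[a] for a in active}
        leader = max(active, key=lambda a: means[a])
        # Confidence width based on the least-sampled surviving arm.
        width = math.sqrt(c * math.log(max(budget, 2))
                          / max(min(counts[a] for a in active), 1))
        # Reject, in this very round, every arm whose empirical gap to the
        # leader already exceeds the threshold.
        for a in list(active):
            if a != leader and means[leader] - means[a] > 2 * width:
                active.discard(a)

    means = {a: sums[a] / counts[a] for a in active}
    return max(active, key=lambda a: means[a])
```

The contrast with classic successive rejects is that elimination here is not tied to a precomputed phase schedule; an arm can be dropped as soon as the data rules it out.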
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Fixed-Budget Real-Valued Combinatorial Pure Exploration of Multi-Armed
Bandit
We first introduce the Combinatorial Successive Assign (CSA) algorithm, which is the first algorithm that can identify the best action even when the size of the action class is exponentially large.
We show that the upper bound of the probability of error of the CSA algorithm matches a lower bound up to a logarithmic factor in the exponent.
We experimentally compare the algorithms with previous methods and show that our algorithm performs better.
arXiv Detail & Related papers (2023-10-24T09:47:32Z) - Optimal Gradient-based Algorithms for Non-concave Bandit Optimization [76.57464214864756]
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave.
Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality.
We show that the standard optimistic algorithms are sub-optimal by dimension factors.
arXiv Detail & Related papers (2021-07-09T16:04:24Z) - Parallelizing Contextual Linear Bandits [82.65675585004448]
We present a family of (parallel) contextual linear bandit algorithms, whose regret is nearly identical to their perfectly sequential counterparts.
We also present an empirical evaluation of these parallel algorithms in several domains, including materials discovery and biological sequence design problems.
arXiv Detail & Related papers (2021-05-21T22:22:02Z) - Combinatorial Pure Exploration with Bottleneck Reward Function and its
Extension to General Reward Functions [13.982295536546728]
We study the Combinatorial Pure Exploration problem with the bottleneck reward function (CPE-B) under the fixed-confidence and fixed-budget settings.
We present both fixed-confidence and fixed-budget algorithms, and provide the sample complexity lower bound for the fixed-confidence setting.
In addition, we extend CPE-B to general reward functions (CPE-G) and propose the first fixed-confidence algorithm for general non-linear reward functions with non-trivial sample complexity.
arXiv Detail & Related papers (2021-02-24T06:47:51Z) - An Empirical Process Approach to the Union Bound: Practical Algorithms
for Combinatorial and Linear Bandits [34.06611065493047]
This paper proposes near-optimal algorithms for the pure-exploration linear bandit problem in the fixed confidence and fixed budget settings.
We provide an algorithm whose sample complexity scales with the geometry of the instance and avoids an explicit union bound over the number of arms.
We also propose the first algorithm for linear bandits in the fixed budget setting.
arXiv Detail & Related papers (2020-06-21T00:56:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.