On Regret with Multiple Best Arms
- URL: http://arxiv.org/abs/2006.14785v2
- Date: Thu, 22 Oct 2020 14:55:32 GMT
- Title: On Regret with Multiple Best Arms
- Authors: Yinglun Zhu and Robert Nowak
- Abstract summary: We study a regret problem with the existence of multiple best/near-optimal arms in a bandit setting.
Our goal is to design algorithms that can automatically adapt to the unknown hardness of the problem.
- Score: 12.315392649501101
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a regret minimization problem with the existence of multiple
best/near-optimal arms in the multi-armed bandit setting. We consider the case
when the number of arms/actions is comparable or much larger than the time
horizon, and make no assumptions about the structure of the bandit instance.
Our goal is to design algorithms that can automatically adapt to the unknown
hardness of the problem, i.e., the number of best arms. Our setting captures
many modern applications of bandit algorithms where the action space is
enormous and the information about the underlying instance/structure is
unavailable. We first propose an adaptive algorithm that is agnostic to the
hardness level and theoretically derive its regret bound. We then prove a lower
bound for our problem setting, which indicates: (1) no algorithm can be minimax
optimal simultaneously over all hardness levels; and (2) our algorithm achieves
a rate function that is Pareto optimal. With additional knowledge of the
expected reward of the best arm, we propose another adaptive algorithm that is
minimax optimal, up to polylog factors, over all hardness levels. Experimental
results confirm our theoretical guarantees and show advantages of our
algorithms over the previous state-of-the-art.
Related papers
- Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible.
We study multi-fidelity best-arm identification, in which the can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost.
Several methods have been proposed for tackling this problem, but their optimality remain elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
arXiv Detail & Related papers (2024-06-05T08:02:40Z) - Fixed-Budget Real-Valued Combinatorial Pure Exploration of Multi-Armed
Bandit [65.268245109828]
We first introduce the Combinatorial Successive Asign algorithm, which is the first algorithm that can identify the best action even when the size of the action class is exponentially large.
We show that the upper bound of the probability of error of the CSA algorithm matches a lower bound up to a logarithmic factor in the exponent.
We experimentally compare the algorithms with previous methods and show that our algorithm performs better.
arXiv Detail & Related papers (2023-10-24T09:47:32Z) - Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit
Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set.
Based on a projection-free online learning algorithm for finite polytopes, it is the first computationally efficient algorithm which is convexally optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z) - Corralling Stochastic Bandit Algorithms [54.10645564702416]
We show that the regret of the corralling algorithms is no worse than that of the best algorithm containing the arm with the highest reward.
We show that the gap between the highest reward and other rewards depends on the gap between the highest reward and other rewards.
arXiv Detail & Related papers (2020-06-16T15:33:12Z) - Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a
Differentially Private Scheme [16.1694012177079]
We study the best-arm identification problem in multi-armed bandits with, potentially private rewards.
The goal is to identify the arm with the highest quantile at a fixed, prescribed level.
We show that our algorithm is $delta$-PAC and we characterize its sample complexity.
arXiv Detail & Related papers (2020-06-11T20:23:43Z) - Bandit algorithms to emulate human decision making using probabilistic
distortions [20.422725678982726]
We formulate two sample multi-armed bandit problems with distorted probabilities on the reward distributions.
We consider the aforementioned problems in the regret minimization as well as best arm identification framework for multi-armed bandits.
arXiv Detail & Related papers (2016-11-30T17:37:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.