Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a
Differentially Private Scheme
- URL: http://arxiv.org/abs/2006.06792v3
- Date: Sun, 23 May 2021 23:08:14 GMT
- Title: Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a
Differentially Private Scheme
- Authors: Konstantinos E. Nikolakakis, Dionysios S. Kalogerias, Or Sheffet and
Anand D. Sarwate
- Abstract summary: We study the best-arm identification problem in multi-armed bandits with stochastic, potentially private rewards.
The goal is to identify the arm with the highest quantile at a fixed, prescribed level.
We show that our algorithm is $\delta$-PAC and we characterize its sample complexity.
- Score: 16.1694012177079
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the best-arm identification problem in multi-armed bandits with
stochastic, potentially private rewards, when the goal is to identify the arm
with the highest quantile at a fixed, prescribed level. First, we propose a
(non-private) successive elimination algorithm for strictly optimal best-arm
identification, we show that our algorithm is $\delta$-PAC and we characterize
its sample complexity. Further, we provide a lower bound on the expected number
of pulls, showing that the proposed algorithm is essentially optimal up to
logarithmic factors. Both upper and lower complexity bounds depend on a special
definition of the associated suboptimality gap, designed in particular for the
quantile bandit problem: as we show, when the gap approaches zero, best-arm
identification is impossible. Second, motivated by applications where the
rewards are private, we provide a differentially private successive elimination
algorithm whose sample complexity is finite even for distributions with
infinite support-size, and we characterize its sample complexity. Our
algorithms do not require prior knowledge of either the suboptimality gap or
other statistical information related to the bandit problem at hand.
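To make the (non-private) successive elimination idea concrete, here is a minimal Python sketch of a quantile-based elimination loop. It is an illustrative reconstruction under assumptions, not the paper's exact algorithm: the DKW-style confidence radius, the union-bound constants, and the helper names (`empirical_quantile`, `quantile_successive_elimination`) are made up for this example, and the differentially private variant is only noted in a comment.

```python
# Illustrative sketch of quantile-based successive elimination (NOT the
# paper's exact algorithm). The DKW-style confidence radius and the
# elimination test below are assumptions made for this example.

import math


def empirical_quantile(samples, tau):
    """Empirical tau-quantile of a non-empty list of samples."""
    ordered = sorted(samples)
    idx = max(0, min(len(ordered) - 1, math.ceil(tau * len(ordered)) - 1))
    return ordered[idx]


def quantile_successive_elimination(arms, tau=0.5, delta=0.05, max_rounds=100_000):
    """Return (index of the arm with the largest tau-quantile, total pulls).

    arms  : list of callables, each returning one stochastic reward when called.
    tau   : quantile level in (0, 1).
    delta : target error probability (delta-PAC style guarantee).
    """
    active = list(range(len(arms)))
    history = {i: [] for i in active}
    pulls = 0

    for t in range(1, max_rounds + 1):
        # Pull every surviving arm once per round.
        for i in active:
            history[i].append(arms[i]())
            pulls += 1

        # DKW-style uniform deviation bound on the empirical CDF, union-bounded
        # over arms and rounds (an assumed form, not the paper's constant).
        radius = math.sqrt(math.log(4 * len(arms) * t * t / delta) / (2 * t))

        # If the empirical CDF is within +/- radius of the true CDF, the true
        # tau-quantile lies between the empirical (tau - radius)- and
        # (tau + radius)-quantiles.
        lower = {i: empirical_quantile(history[i], max(tau - radius, 1e-9)) for i in active}
        upper = {i: empirical_quantile(history[i], min(tau + radius, 1.0)) for i in active}

        # Eliminate any arm whose upper bound falls below the best lower bound.
        # (A private variant would instead work with noisy released statistics.)
        best_lower = max(lower.values())
        active = [i for i in active if upper[i] >= best_lower]
        if len(active) == 1:
            return active[0], pulls

    # Budget exhausted: fall back to the empirically best surviving arm.
    best = max(active, key=lambda i: empirical_quantile(history[i], tau))
    return best, pulls
```

In this sketch every surviving arm is pulled once per round, and an arm is dropped as soon as its upper quantile confidence bound falls below the best lower bound. The paper's private scheme instead perturbs the statistics it releases so as to satisfy differential privacy while retaining a finite sample complexity even for distributions with infinite support, as stated in the abstract above.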
Related papers
- Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible.
We study multi-fidelity best-arm identification, in which the algorithm can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost.
Several methods have been proposed for tackling this problem, but their optimality remains elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
arXiv Detail & Related papers (2024-06-05T08:02:40Z) - Replicability is Asymptotically Free in Multi-armed Bandits [45.729105054410745]
This work is motivated by the growing demand for reproducible machine learning.
In particular, we consider a replicable algorithm that ensures, with high probability, that the algorithm's sequence of actions is not affected by the randomness inherent in the dataset.
arXiv Detail & Related papers (2024-02-12T03:31:34Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of the various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - On the Existence of a Complexity in Fixed Budget Bandit Identification [0.0]
In fixed budget bandit identification, an algorithm sequentially observes samples from several distributions up to a given final time.
We show that there is no such complexity for several fixed budget identification tasks including Bernoulli best arm identification with two arms.
arXiv Detail & Related papers (2023-03-16T16:39:00Z) - Choosing Answers in $\varepsilon$-Best-Answer Identification for Linear
Bandits [0.8122270502556374]
In a pure-exploration problem, information is gathered sequentially to answer a question on the environment.
We show that picking the answer with highest mean does not allow an algorithm to reach optimality in terms of expected sample complexity.
We develop a simple procedure to adapt best-arm identification algorithms to tackle $\varepsilon$-best-answer identification in transductive linear bandits.
arXiv Detail & Related papers (2022-06-09T12:27:51Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward
Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for the successive elimination algorithm, it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit
Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set.
Based on a projection-free online learning algorithm for finite polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sample complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.