From Optimality to Robustness: Dirichlet Sampling Strategies in
Stochastic Bandits
- URL: http://arxiv.org/abs/2111.09724v1
- Date: Thu, 18 Nov 2021 14:34:21 GMT
- Title: From Optimality to Robustness: Dirichlet Sampling Strategies in
Stochastic Bandits
- Authors: Dorian Baudry (CRIStAL, Scool, CNRS), Patrick Saux (CRIStAL, Scool),
Odalric-Ambrym Maillard (CRIStAL, Scool)
- Abstract summary: We study a generic Dirichlet Sampling (DS) algorithm, based on pairwise comparisons of empirical indices computed with re-sampling of the arms' observations.
We show that different variants of this strategy achieve provably optimal regret guarantees when the distributions are bounded, and logarithmic regret for semi-bounded distributions with a mild quantile condition.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The stochastic multi-armed bandit problem has been extensively studied under standard assumptions on the arms' distributions (e.g., bounded with known support, exponential family, etc.). These assumptions are suitable for many real-world problems, but they sometimes require knowledge (of tails, for instance) that may not be precisely accessible to the practitioner, raising the question of the robustness of bandit algorithms to model misspecification. In this paper we study a generic Dirichlet Sampling (DS) algorithm, based on pairwise comparisons of empirical indices computed with re-sampling of the arms' observations and a data-dependent exploration bonus. We show that different variants of this strategy achieve provably optimal regret guarantees when the distributions are bounded, and logarithmic regret for semi-bounded distributions with a mild quantile condition. We also show that a simple tuning achieves robustness with respect to a large class of unbounded distributions, at the cost of a slightly worse-than-logarithmic asymptotic regret. Finally, we provide numerical experiments showing the merits of DS in a decision-making problem on synthetic agriculture data.
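To make the duel mechanism concrete, here is a minimal sketch of a DS-style round in Python. It assumes the bounded variant where rewards lie in [0, B] and B itself serves as the exploration bonus appended to a challenger's history; the most-pulled-arm leader rule and the helper names `dirichlet_index` and `ds_round` are our illustrative choices, not the paper's exact specification.

```python
import numpy as np

def dirichlet_index(history, bonus, rng):
    """Re-sampled index of an arm: a Dirichlet-weighted average of its
    observed rewards, with an exploration bonus appended to the support."""
    support = np.append(history, bonus)              # data-dependent bonus
    weights = rng.dirichlet(np.ones(len(support)))   # uniform Dirichlet draw
    return float(weights @ support)

def ds_round(histories, B, rng):
    """One DS-style round: every challenger whose re-sampled index beats the
    leader's empirical mean is pulled; if none wins, the leader is pulled.
    Assumes each arm has been pulled at least once."""
    leader = max(range(len(histories)), key=lambda k: len(histories[k]))
    mu_leader = float(np.mean(histories[leader]))
    winners = [k for k in range(len(histories))
               if k != leader
               and dirichlet_index(histories[k], B, rng) >= mu_leader]
    return winners or [leader]

# Toy usage: 3 arms with a few observations each, known support bound B = 1.
rng = np.random.default_rng(0)
histories = [[1.0, 0.0, 1.0], [0.0, 0.0], [1.0, 1.0, 1.0, 0.0]]
print(ds_round(histories, B=1.0, rng=rng))
```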
Related papers
- Distribution Estimation under the Infinity Norm [19.997465098927858]
We present novel bounds for estimating discrete probability distributions under the $\ell_\infty$ norm.
Our data-dependent convergence guarantees for the maximum likelihood estimator significantly improve upon the currently known results.
arXiv Detail & Related papers (2024-02-13T12:49:50Z)
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of the various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Distributionally Robust Skeleton Learning of Discrete Bayesian Networks [9.46389554092506]
We consider the problem of learning the exact skeleton of general discrete Bayesian networks from potentially corrupted data.
We propose to optimize the most adverse risk over a family of distributions within bounded Wasserstein distance or KL divergence to the empirical distribution.
We present efficient algorithms and show the proposed methods are closely related to the standard regularized regression approach.
arXiv Detail & Related papers (2023-11-10T15:33:19Z)
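For concreteness, the adversarial-risk objective described in the skeleton-learning entry above is typically written as a min-max program of the following generic form (our notation, not the paper's: $\widehat{P}_n$ is the empirical distribution, $\rho$ the ambiguity radius, and $\ell$ a suitable loss):

```latex
\min_{\theta}\ \sup_{Q\,:\,D(Q,\widehat{P}_n)\le\rho}\ \mathbb{E}_{X\sim Q}\bigl[\ell(\theta;X)\bigr],
\qquad D\in\{W_1,\ \mathrm{KL}\}.
```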
- A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms [14.33758865948252]
We revisit two famous bandit algorithms, Minimum Empirical Divergence (MED) and Thompson Sampling (TS).
We prove that MED is optimal for all these models, and we also provide a simple regret analysis of some TS algorithms for which optimality is already known.
arXiv Detail & Related papers (2023-03-10T16:43:48Z)
- Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits [88.21288104408556]
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family.
We propose a Thompson sampling algorithm, termed Expulli, which uses a novel sampling distribution to avoid under-estimation of the optimal arm.
arXiv Detail & Related papers (2022-06-07T18:08:21Z)
- Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for methods that solve monotone and structured non-monotone variational inequality problems (VIPs).
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
arXiv Detail & Related papers (2022-06-02T15:21:55Z)
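The clipping operator at the heart of the methods in the entry above is simple to state. The following sketch shows a generic clipped gradient descent-ascent step for a min-max problem; the step size `lr` and clipping level `lam` are illustrative parameters, and the paper's SEG/SGDA variants use their own schedules.

```python
import numpy as np

def clip(g, lam):
    """Rescale g so its Euclidean norm never exceeds lam; this bounds the
    impact of heavy-tailed gradient noise on any single step."""
    norm = np.linalg.norm(g)
    return g if norm <= lam else g * (lam / norm)

def clipped_gda_step(x, y, grad_x, grad_y, lr, lam):
    """One clipped gradient descent-ascent step for min_x max_y f(x, y)."""
    gx, gy = grad_x(x, y), grad_y(x, y)
    return x - lr * clip(gx, lam), y + lr * clip(gy, lam)
```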
- Optimal Fixed-Budget Best Arm Identification using the Augmented Inverse Probability Estimator in Two-Armed Gaussian Bandits with Unknown Variances [27.122181278234617]
We consider the fixed-budget best arm identification problem in two-armed Gaussian bandits with unknown variances.
We propose a strategy comprising a sampling rule with randomized sampling (RS) that follows the estimated target allocation probabilities of arm draws.
We show that the proposed strategy is agnostically optimal when the sample size becomes infinitely large and the gap between the two arms goes to zero.
arXiv Detail & Related papers (2022-01-12T13:38:33Z)
- Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, one gap-based and one based on successive elimination, for best arm identification in sub-Gaussian bandits. Specifically, the sample complexity of the gap-based algorithm is optimal up to constant factors, while that of successive elimination is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z)
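As a reference point for the entry above, here is the textbook successive elimination scheme it builds on, for uncontaminated sub-Gaussian rewards in [0, 1]. The paper's contribution is to make such schemes robust to reward contamination (e.g., by replacing empirical means with robust estimators), which this sketch does not attempt.

```python
import numpy as np

def successive_elimination(pull, n_arms, max_rounds, delta=0.05):
    """Textbook successive elimination: sample every active arm once per
    round, then drop any arm whose upper confidence bound falls below the
    best arm's lower confidence bound (Hoeffding-style radius)."""
    active = list(range(n_arms))
    sums = np.zeros(n_arms)
    for r in range(1, max_rounds + 1):
        for k in active:
            sums[k] += pull(k)                     # one sample per active arm
        means = sums[np.array(active)] / r
        radius = np.sqrt(np.log(4 * n_arms * r * r / delta) / (2 * r))
        best_lcb = means.max() - radius
        active = [k for k, m in zip(active, means) if m + radius >= best_lcb]
        if len(active) == 1:
            break
    return active
```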
- Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
- The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling, called the Simulator.
We prove the first instance-based lower bounds for the top-k problem that incorporate the appropriate log factors.
Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.