Achieving adaptivity and optimality for multi-armed bandits using Exponential Kullback-Leibler Maillard Sampling
- URL: http://arxiv.org/abs/2502.14379v1
- Date: Thu, 20 Feb 2025 09:12:16 GMT
- Title: Achieving adaptivity and optimality for multi-armed bandits using Exponential Kullback-Leibler Maillard Sampling
- Authors: Hao Qin, Kwang-Sung Jun, Chicheng Zhang
- Abstract summary: We study the problem of Multi-Armed Bandits (MAB) with reward distributions belonging to a One-Parameter Exponential Distribution (OPED) family.
In this paper, we design an algorithm, Exponential Kullback-Leibler Maillard Sampling (abbrev. ExpKLMS), that can achieve multiple optimality criteria simultaneously.
- Score: 24.487235945761913
- Abstract: We study the problem of Multi-Armed Bandits (MAB) with reward distributions belonging to a One-Parameter Exponential Distribution (OPED) family. In the literature, several criteria have been proposed to evaluate the performance of such algorithms, including Asymptotic Optimality (A.O.), Minimax Optimality (M.O.), Sub-UCB, and variance-adaptive worst-case regret bound. Thompson Sampling (TS)-based and Upper Confidence Bound (UCB)-based algorithms have been employed to achieve some of these criteria. However, none of these algorithms simultaneously satisfy all the aforementioned criteria. In this paper, we design an algorithm, Exponential Kullback-Leibler Maillard Sampling (abbrev. ExpKLMS), that can achieve multiple optimality criteria simultaneously, including A.O., M.O. with a logarithmic factor, Sub-UCB, and variance-adaptive worst-case regret bound.
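The abstract describes the selection rule only at a high level. Purely as an illustration of the KL-based Maillard sampling idea it builds on, here is a minimal sketch assuming Bernoulli rewards (one member of the OPED family), where each arm is chosen with probability proportional to exp(-N_a * KL(mu_hat_a, max_b mu_hat_b)); the exact sampling distribution of ExpKLMS and the modifications behind its stated guarantees are in the paper.

```python
import numpy as np

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p, q = np.clip(p, eps, 1 - eps), np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def kl_maillard_sampling(means, pulls, rng):
    """Pick an arm with probability proportional to
    exp(-N_a * KL(mu_hat_a, max_b mu_hat_b))."""
    means, pulls = np.asarray(means, float), np.asarray(pulls, float)
    logits = -pulls * bernoulli_kl(means, means.max())  # empirically best arm gets 0
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(len(means), p=probs)

# Toy run: three Bernoulli arms, one forced pull each, then 1000 rounds.
rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.6])
counts = np.ones(3)
sums = rng.binomial(1, true_means).astype(float)
for _ in range(1000):
    a = kl_maillard_sampling(sums / counts, counts, rng)
    sums[a] += rng.binomial(1, true_means[a])
    counts[a] += 1
print("pulls per arm:", counts.astype(int))
```

Unlike UCB, the rule is randomized: the empirically best arm receives the largest probability, while every other arm is explored with probability decaying exponentially in its pull count times its empirical KL gap.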
Related papers
- Continuous K-Max Bandits [54.21533414838677]
We study the $K$-Max multi-armed bandits problem with continuous outcome distributions and weak value-index feedback.
This setting captures critical applications in recommendation systems, distributed computing, server scheduling, etc.
Our key contribution is the computationally efficient algorithm DCK-UCB, which combines adaptive discretization with bias-corrected confidence bounds.
arXiv Detail & Related papers (2025-02-19T06:37:37Z)
- Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits [55.938644481736446]
Indexed Minimum Empirical Divergence (IMED) is a highly effective approach to the multi-armed bandit problem.
It has been observed to empirically outperform UCB-based algorithms and Thompson Sampling.
We present novel linear versions of the IMED algorithm, which we call the family of LinIMED algorithms.
arXiv Detail & Related papers (2024-05-24T04:11:58Z)
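For reference, the classical IMED index that LinIMED extends fits in a few lines. The sketch below assumes unit-variance Gaussian rewards (so the divergence term reduces to a squared gap) and every arm pulled at least once; it is not the linear-bandit variant from the paper.

```python
import numpy as np

def imed_index(means, pulls, sigma=1.0):
    """IMED index for known-variance Gaussian arms:
    N_a * (mu_max - mu_a)^2 / (2 sigma^2) + log N_a  (smaller is better)."""
    means, pulls = np.asarray(means, float), np.asarray(pulls, float)
    gaps = means.max() - means
    return pulls * gaps**2 / (2 * sigma**2) + np.log(pulls)

def imed_pick(means, pulls, sigma=1.0):
    """Play the arm with the smallest IMED index."""
    return int(np.argmin(imed_index(means, pulls, sigma)))

# The empirically best arm has index log(N_a), so a barely-sampled arm
# with a small enough index is explored instead: here arm 2 is picked.
print(imed_pick(means=[0.4, 0.9, 0.7], pulls=[50, 200, 5]))
```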
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Multi-Agent Bayesian Optimization with Coupled Black-Box and Affine Constraints [21.38692458445459]
We study the problem of distributed multi-agent Bayesian optimization with both coupled black-box constraints and known affine constraints.
A primal-dual distributed algorithm is proposed that achieves similar regret/violation bounds as those in the single-agent case.
arXiv Detail & Related papers (2023-10-02T08:07:36Z)
- A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms [14.33758865948252]
We revisit two famous bandit algorithms, Minimum Empirical Divergence (MED) and Thompson Sampling (TS).
We prove that MED is optimal for all these models, but also provide a simple regret analysis of some TS algorithms for which the optimality is already known.
arXiv Detail & Related papers (2023-03-10T16:43:48Z)
- Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits [88.21288104408556]
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family.
We propose a Thompson sampling algorithm, termed Expulli, which uses a novel sampling distribution to avoid under-estimation of the optimal arm.
arXiv Detail & Related papers (2022-06-07T18:08:21Z)
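The entry above contrasts its sampler with standard Thompson Sampling. Purely as the baseline being improved upon, here is a minimal Beta-Bernoulli Thompson Sampling loop; the paper's Expulli sampler replaces the plain posterior draw with a different sampling distribution, which is not reproduced here.

```python
import numpy as np

def thompson_step(successes, failures, rng):
    """Beta-Bernoulli Thompson Sampling with a Beta(1, 1) prior:
    draw one posterior sample per arm, play the arm with the largest draw."""
    theta = rng.beta(successes + 1, failures + 1)
    return int(np.argmax(theta))

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.45, 0.5])
succ, fail = np.zeros(3), np.zeros(3)
for _ in range(2000):
    a = thompson_step(succ, fail, rng)
    r = rng.binomial(1, true_means[a])
    succ[a] += r
    fail[a] += 1 - r
print("pulls per arm:", (succ + fail).astype(int))
```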
- From Optimality to Robustness: Dirichlet Sampling Strategies in Stochastic Bandits [0.0]
We study a generic Dirichlet Sampling (DS) algorithm, based on pairwise comparisons of empirical indices computed with re-sampling of the arms' observations.
We show that different variants of this strategy achieve provably optimal regret guarantees when the distributions are bounded and logarithmic regret for semi-bounded distributions with a mild quantile condition.
arXiv Detail & Related papers (2021-11-18T14:34:21Z)
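The Dirichlet Sampling entry above is summarized in a single sentence. As a rough illustration only, the sketch below shows a pairwise re-sampling comparison of the kind it describes, assuming rewards bounded in [0, B] and a bootstrap form in which one optimistic pseudo-observation at B is appended before drawing Dirichlet weights; the paper's specific variants and their conditions differ.

```python
import numpy as np

def dirichlet_index(observations, upper_bound, rng):
    """Re-weight an arm's reward history with uniform Dirichlet weights,
    after appending one optimistic pseudo-observation at the upper bound."""
    obs = np.append(np.asarray(observations, float), upper_bound)
    weights = rng.dirichlet(np.ones(len(obs)))
    return float(weights @ obs)

def ds_challenge(leader_mean, challenger_obs, upper_bound, rng):
    """Pairwise test: play the challenger if its re-sampled index
    reaches the leader's empirical mean."""
    return dirichlet_index(challenger_obs, upper_bound, rng) >= leader_mean

rng = np.random.default_rng(0)
print(ds_challenge(0.7, [0.4, 0.6, 0.5], upper_bound=1.0, rng=rng))
```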
- Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, the sample complexity of the gap-based algorithm is optimal up to constant factors, while that of the successive-elimination algorithm is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z)
- Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
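To make the mean-variance objective in the last entry concrete, here is a rough Thompson-style sketch. It assumes the criterion rho * mean - variance with risk tolerance rho, a Gaussian draw for each arm's mean, and a plug-in empirical variance; the paper's actual Gaussian and Bernoulli samplers are more careful than this simplification.

```python
import numpy as np

def mean_variance_score(rewards, rho, rng):
    """Sampled value of rho * mu - sigma^2 for one arm, using a Gaussian
    draw for the mean and the plug-in empirical variance."""
    rewards = np.asarray(rewards, float)
    mu_draw = rng.normal(rewards.mean(), np.sqrt(rewards.var() / len(rewards) + 1e-12))
    return rho * mu_draw - rewards.var()

def mean_variance_ts_step(history, rho, rng):
    """Play the arm whose sampled mean-variance score is largest."""
    return int(np.argmax([mean_variance_score(r, rho, rng) for r in history]))

# A risky arm (higher mean, high variance) vs. a stable arm: a small rho
# weights the variance penalty heavily, so the stable arm is favored.
rng = np.random.default_rng(0)
history = [[2.0, 0.0, 1.8, 0.2], [0.5, 0.55, 0.45, 0.5]]
print(mean_variance_ts_step(history, rho=0.1, rng=rng))
```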