UCB for Large-Scale Pure Exploration: Beyond Sub-Gaussianity
- URL: http://arxiv.org/abs/2511.22273v1
- Date: Thu, 27 Nov 2025 09:54:22 GMT
- Title: UCB for Large-Scale Pure Exploration: Beyond Sub-Gaussianity
- Authors: Zaile Li, Weiwei Fan, L. Jeff Hong
- Abstract summary: This paper investigates the performance of upper confidence bound (UCB) algorithms in large-scale pure exploration problems. We show that the meta-UCB algorithm, and therefore a broad class of UCB algorithms, can achieve sample optimality. The results demonstrate the applicability of UCB algorithms to large-scale pure exploration problems with non-sub-Gaussian distributions.
- Score: 0.8283940114367679
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Selecting the best alternative from a finite set represents a broad class of pure exploration problems. Traditional approaches to pure exploration have predominantly relied on Gaussian or sub-Gaussian assumptions on the performance distributions of all alternatives, which limit their applicability to non-sub-Gaussian especially heavy-tailed problems. The need to move beyond sub-Gaussianity may become even more critical in large-scale problems, which tend to be especially sensitive to distributional specifications. In this paper, motivated by the widespread use of upper confidence bound (UCB) algorithms in pure exploration and beyond, we investigate their performance in the large-scale, non-sub-Gaussian settings. We consider the simplest category of UCB algorithms, where the UCB value for each alternative is defined as the sample mean plus an exploration bonus that depends only on its own sample size. We abstract this into a meta-UCB algorithm and propose letting it select the alternative with the largest sample size as the best upon stopping. For this meta-UCB algorithm, we first derive a distribution-free lower bound on the probability of correct selection. Building on this bound, we analyze two general non-sub-Gaussian scenarios: (1) all alternatives follow a common location-scale structure and have bounded variance; and (2) when such a structure does not hold, each alternative has a bounded absolute moment of order $q > 3$. In both settings, we show that the meta-UCB algorithm and therefore a broad class of UCB algorithms can achieve the sample optimality. These results demonstrate the applicability of UCB algorithms for solving large-scale pure exploration problems with non-sub-Gaussian distributions. Numerical experiments support our results and provide additional insights into the comparative behaviors of UCB algorithms within and beyond our meta-UCB framework.
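The meta-UCB procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the specific bonus function (here the classical $\sqrt{2\ln t / n_i}$ form) and all identifiers are assumptions for demonstration; the paper's framework allows any bonus that depends only on the arm's own sample size. The key feature from the abstract is preserved: upon stopping, the alternative with the largest sample size is selected as the best.

```python
import math

def meta_ucb(sample_fns, budget, bonus=lambda t, n: math.sqrt(2 * math.log(t) / n)):
    """Sketch of a meta-UCB pure-exploration loop.

    sample_fns : list of zero-argument callables, one per alternative,
                 each returning one sample of that alternative's performance.
    budget     : total number of samples to take.
    bonus      : exploration bonus depending only on the round t and the
                 arm's own sample count n (an assumed illustrative choice).
    Returns the index of the alternative with the largest sample size
    upon stopping, as proposed for the meta-UCB algorithm.
    """
    k = len(sample_fns)
    counts = [0] * k
    sums = [0.0] * k
    # Initialize with one sample per alternative.
    for i in range(k):
        sums[i] += sample_fns[i]()
        counts[i] = 1
    # Each round, sample the alternative with the largest UCB value:
    # sample mean plus a bonus that depends only on its own sample size.
    for t in range(k + 1, budget + 1):
        ucb = [sums[i] / counts[i] + bonus(t, counts[i]) for i in range(k)]
        i = max(range(k), key=lambda j: ucb[j])
        sums[i] += sample_fns[i]()
        counts[i] += 1
    # Select the most-sampled alternative as the best.
    return max(range(k), key=lambda i: counts[i])
```

With two deterministic alternatives of values 0.2 and 0.8, `meta_ucb([lambda: 0.2, lambda: 0.8], budget=100)` allocates the vast majority of samples to the second alternative and returns index 1.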
Related papers
- Bayesian Optimization with Inexact Acquisition: Is Random Grid Search Sufficient? [12.182108642150103]
We show that inexact BO algorithms can still achieve sublinear cumulative regret. Motivated by such findings, we provide both theoretical justification and numerical validation for random grid search.
arXiv Detail & Related papers (2025-06-13T14:35:39Z) - Achieving adaptivity and optimality for multi-armed bandits using Exponential-Kullback Leibler Maillard Sampling [24.487235945761913]
We study the problem of $K$-armed bandits with reward distributions belonging to a one-parameter exponential distribution family. We propose an algorithm that can achieve multiple optimality criteria simultaneously, including asymptotic optimality, minimax optimality up to a $\sqrt{\ln K}$ factor, Sub-UCB, and a variance-adaptive worst-case regret bound.
arXiv Detail & Related papers (2025-02-20T09:12:16Z) - Generalized Naive Bayes [0.0]
We introduce the so-called Generalized Naive Bayes structure as an extension of the Naive Bayes structure.
We prove that this fits the data at least as well as the probability distribution determined by the classical Naive Bayes (NB)
arXiv Detail & Related papers (2024-08-28T16:36:18Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Generalized Schrödinger Bridge Matching [54.171931505066]
The Generalized Schrödinger Bridge (GSB) problem setup is prevalent in many scientific areas, both within and beyond machine learning.
We propose Generalized Schrödinger Bridge Matching (GSBM), a new matching algorithm inspired by recent advances.
We show that such a generalization can be cast as solving conditional optimal control, for which variational approximations can be used.
arXiv Detail & Related papers (2023-10-03T17:42:11Z) - Simple Modification of the Upper Confidence Bound Algorithm by
Generalized Weighted Averages [0.0]
The multi-armed bandit (MAB) problem is a classical problem that models sequential decision-making under uncertainty in reinforcement learning.
We propose a new generalized upper confidence bound (UCB) algorithm (GWA-UCB1) by extending UCB1, which is a representative algorithm for MAB problems.
GWA-UCB1 outperformed G-UCB1, UCB1-Tuned, and Thompson sampling in most problem settings and can be useful in many situations.
arXiv Detail & Related papers (2023-08-28T06:53:31Z) - Bayesian decision-making under misspecified priors with applications to
meta-learning [64.38020203019013]
Thompson sampling and other sequential decision-making algorithms are popular approaches to tackle explore/exploit trade-offs in contextual bandits.
We show that performance degrades gracefully with misspecified priors.
arXiv Detail & Related papers (2021-07-03T23:17:26Z) - A Closer Look at the Worst-case Behavior of Multi-armed Bandit
Algorithms [8.099977107670918]
Upper Confidence Bound (UCB) is an optimism-based MAB algorithm.
This paper provides new results on the arm-sampling behavior of UCB.
It also provides the first process-level characterization of the MAB problem under UCB.
arXiv Detail & Related papers (2021-06-03T20:52:26Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Private Stochastic Non-Convex Optimization: Adaptive Algorithms and
Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for non-convex optimization.
We demonstrate empirical advantages over standard gradient methods on two popular deep learning tasks.
arXiv Detail & Related papers (2020-06-24T06:01:24Z) - Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.