Why Most Optimism Bandit Algorithms Have the Same Regret Analysis: A Simple Unifying Theorem
- URL: http://arxiv.org/abs/2512.18409v1
- Date: Sat, 20 Dec 2025 16:11:55 GMT
- Title: Why Most Optimism Bandit Algorithms Have the Same Regret Analysis: A Simple Unifying Theorem
- Authors: Vikram Krishnamurthy
- Abstract summary: Several optimism-based bandit algorithms -- including UCB, UCB-V, linear UCB, and finite-arm GP-UCB -- achieve logarithmic regret using proofs that, despite superficial differences, follow essentially the same structure. This note isolates the minimal ingredients behind these analyses: a single high-probability concentration condition on the estimators, after which logarithmic regret follows from two short deterministic lemmas describing radius collapse and optimism-forced deviations. The framework yields unified, near-minimal proofs for these classical algorithms and extends naturally to many contemporary bandit variants.
- Score: 15.493230983626281
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several optimism-based stochastic bandit algorithms -- including UCB, UCB-V, linear UCB, and finite-arm GP-UCB -- achieve logarithmic regret using proofs that, despite superficial differences, follow essentially the same structure. This note isolates the minimal ingredients behind these analyses: a single high-probability concentration condition on the estimators, after which logarithmic regret follows from two short deterministic lemmas describing radius collapse and optimism-forced deviations. The framework yields unified, near-minimal proofs for these classical algorithms and extends naturally to many contemporary bandit variants.
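The optimism template the abstract describes can be illustrated with a minimal sketch: each arm keeps an empirical mean plus a confidence radius that collapses as the arm is pulled, and the algorithm always plays the arm with the highest optimistic index. This is generic UCB1 with Hoeffding radii, written for illustration only; the function name `ucb1` and the Bernoulli test arms are assumptions, not the note's notation or proofs.

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms and return cumulative (pseudo-)regret.

    Illustrates the two ingredients the note isolates: the confidence
    radius sqrt(2 ln t / n_i) collapses as arm i is pulled, and optimism
    forces any suboptimal pull to coincide with either a concentration
    failure or a still-large radius -- which can happen only O(log T) times.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k        # pulls per arm
    sums = [0.0] * k        # cumulative reward per arm
    regret = 0.0
    best = max(means)
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1     # pull each arm once to initialise estimates
        else:
            # optimistic index: empirical mean + confidence radius
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret
```

Running this for increasing horizons shows the logarithmic growth the unified analysis predicts: doubling the horizon adds far less than double the regret.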
Related papers
- Q-Learning with Fine-Grained Gap-Dependent Regret [13.370933509246568]
Existing model-free algorithms achieve minimax worst-case regret, but their gap-dependent bounds remain coarse and fail to fully capture the structure of suboptimality gaps. We establish fine-grained gap-dependent regret bounds for both UCB-based and non-UCB-based algorithms.
arXiv Detail & Related papers (2025-10-08T05:02:16Z)
- Achieving adaptivity and optimality for multi-armed bandits using Exponential-Kullback Leibler Maillard Sampling [24.487235945761913]
We study the problem of $K$-armed bandits with reward distributions belonging to a one-parameter exponential distribution family. We propose an algorithm that can achieve multiple optimality criteria simultaneously, including Asymptotic Optimality, Minimax Optimality with a $\sqrt{\ln K}$ factor, Sub-UCB, and a variance-adaptive worst-case regret bound.
arXiv Detail & Related papers (2025-02-20T09:12:16Z)
- UCB algorithms for multi-armed bandits: Precise regret and adaptive inference [6.349503549199403]
Upper Confidence Bound (UCB) algorithms are a widely-used class of sequential algorithms for the $K$-armed bandit problem. We show that the Lai-Robbins regret formula is exact if and only if the sub-optimality gaps exceed the order $\sigma\sqrt{K\log T/T}$. We also show that its maximal regret deviates from the minimax regret by a logarithmic factor, thereby settling its strict minimax optimality in the negative.
arXiv Detail & Related papers (2024-12-09T01:14:02Z)
- Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits [55.938644481736446]
Indexed Minimum Empirical Divergence (IMED) is a highly effective approach to the multi-armed bandit problem.
It has been observed to empirically outperform UCB-based algorithms and Thompson Sampling.
We present novel linear versions of the IMED algorithm, which we call the family of LinIMED algorithms.
arXiv Detail & Related papers (2024-05-24T04:11:58Z)
- Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z)
- Non-Asymptotic Analysis of a UCB-based Top Two Algorithm [4.18804572788063]
A Top Two sampling rule for bandit identification is a method which selects the next arm to sample from among two candidate arms.
Our proposed UCB-based Top Two simultaneously enjoys non-asymptotic guarantees and competitive empirical performance.
arXiv Detail & Related papers (2022-10-11T13:21:59Z)
- Asymptotically Unbiased Instance-wise Regularized Partial AUC Optimization: Theory and Algorithm [101.44676036551537]
One-way Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC) measure the average performance of a binary classifier.
Most of the existing methods could only optimize PAUC approximately, leading to inevitable biases that are not controllable.
We present a simpler reformulation of the PAUC problem via distributionally robust optimization.
arXiv Detail & Related papers (2022-10-08T08:26:22Z)
- A Closer Look at the Worst-case Behavior of Multi-armed Bandit Algorithms [8.099977107670918]
Upper Confidence Bound (UCB) is an optimism-based MAB algorithm.
This paper provides new results on the arm-sampling behavior of UCB.
It also provides the first process-level characterization of the MAB problem under UCB.
arXiv Detail & Related papers (2021-06-03T20:52:26Z)
- An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
- Approximation Algorithms for Sparse Principal Component Analysis [57.5357874512594]
Principal component analysis (PCA) is a widely used dimension reduction technique in machine learning and statistics.
Various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis.
We present thresholding as a provably accurate, polynomial-time approximation algorithm for the SPCA problem.
arXiv Detail & Related papers (2020-06-23T04:25:36Z)
- Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on methods for solving smooth non-convex strongly-concave min-max problems, which have received increasing attention due to deep learning (e.g., deep AUC maximization).
arXiv Detail & Related papers (2020-06-12T00:32:21Z)
- Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis [102.29671176698373]
We address the problem of policy evaluation in discounted decision processes, and provide Markov-dependent guarantees on the $\ell_\infty$ error under a generative model.
We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
arXiv Detail & Related papers (2020-03-16T17:15:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.