Improved Regret Bounds for Linear Bandits with Heavy-Tailed Rewards
- URL: http://arxiv.org/abs/2506.04775v1
- Date: Thu, 05 Jun 2025 09:07:26 GMT
- Title: Improved Regret Bounds for Linear Bandits with Heavy-Tailed Rewards
- Authors: Artin Tajdini, Jonathan Scarlett, Kevin Jamieson
- Abstract summary: We improve both upper and lower bounds on the minimax regret compared to prior work. We propose a new elimination-based algorithm guided by experimental design. We show that for some geometries, such as $l_p$-norm balls for $p \le 1 + \epsilon$, we can further reduce the dependence on $d$.
- Score: 38.53963338592248
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study stochastic linear bandits with heavy-tailed rewards, where the rewards have a finite $(1+\epsilon)$-absolute central moment bounded by $\upsilon$ for some $\epsilon \in (0,1]$. We improve both upper and lower bounds on the minimax regret compared to prior work. When $\upsilon = \mathcal{O}(1)$, the best previously known regret upper bound is $\tilde{\mathcal{O}}(d T^{\frac{1}{1+\epsilon}})$. While a lower bound with the same scaling has been given, it relies on a construction using $\upsilon = \mathcal{O}(d)$, and adapting the construction to the bounded-moment regime with $\upsilon = \mathcal{O}(1)$ yields only an $\Omega(d^{\frac{\epsilon}{1+\epsilon}} T^{\frac{1}{1+\epsilon}})$ lower bound. This matches the known rate for multi-armed bandits and is generally loose for linear bandits, in particular being a factor of $\sqrt{d}$ below the optimal rate in the finite-variance case ($\epsilon = 1$). We propose a new elimination-based algorithm guided by experimental design, which achieves regret $\tilde{\mathcal{O}}(d^{\frac{1+3\epsilon}{2(1+\epsilon)}} T^{\frac{1}{1+\epsilon}})$, thus improving the dependence on $d$ for all $\epsilon \in (0,1)$ and recovering a known optimal result for $\epsilon = 1$. We also establish a lower bound of $\Omega(d^{\frac{2\epsilon}{1+\epsilon}} T^{\frac{1}{1+\epsilon}})$, which strictly improves upon the multi-armed bandit rate and highlights the hardness of heavy-tailed linear bandit problems. For finite action sets, we derive similarly improved upper and lower bounds for regret. Finally, we provide action-set-dependent regret upper bounds showing that for some geometries, such as $l_p$-norm balls for $p \le 1 + \epsilon$, we can further reduce the dependence on $d$, and we can handle infinite-dimensional settings via the kernel trick, in particular establishing new regret bounds for the Mat\'ern kernel that are the first to be sublinear for all $\epsilon \in (0, 1]$.
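To make the abstract's recipe concrete, here is a minimal sketch of phased elimination with an experimental-design exploration step and truncated rewards. Everything below is an illustrative assumption rather than the authors' exact method: the Frank-Wolfe design routine, the phase lengths, the truncation level, and the elimination threshold are all heuristic stand-ins.

```python
# A minimal sketch, assuming a finite action set (rows of `actions`) and a
# scalar reward oracle `pull`. It combines the two ingredients named in the
# abstract: exploration guided by a (G-optimal) experimental design, and
# robustness to heavy tails, here via simple reward truncation.
import numpy as np

def g_optimal_design(A, iters=200):
    """Frank-Wolfe approximation of a G-optimal design over the rows of A."""
    k, d = A.shape
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        Vinv = np.linalg.inv(A.T @ (w[:, None] * A) + 1e-9 * np.eye(d))
        g = np.einsum('ij,jk,ik->i', A, Vinv, A)  # per-action design variance
        i = int(np.argmax(g))
        step = float(np.clip((g[i] / d - 1.0) / (g[i] - 1.0 + 1e-12), 0.0, 1.0))
        w *= 1.0 - step
        w[i] += step
    return w

def phased_elimination(actions, pull, T, eps=0.5, upsilon=1.0):
    """Keep a shrinking active set; estimate rewards from truncated pulls."""
    d = actions.shape[1]
    active = np.arange(len(actions))
    t, phase = 0, 1
    while t < T and len(active) > 1:
        A = actions[active]
        w = g_optimal_design(A)
        n = min(T - t, 2 ** (phase + 4))              # phase length (heuristic)
        counts = np.maximum(1, np.floor(w * n)).astype(int)
        b = (upsilon * n) ** (1 / (1 + eps))          # truncation level (heuristic)
        V, s = 1e-9 * np.eye(d), np.zeros(d)
        for a, c in zip(A, counts):
            for _ in range(int(c)):
                r = float(np.clip(pull(a), -b, b))    # truncate heavy-tailed reward
                V += np.outer(a, a)
                s += r * a
                t += 1
        est = A @ np.linalg.solve(V, s)               # least-squares estimates
        active = active[est >= est.max() - 2.0 ** -phase]  # eliminate (heuristic gap)
        phase += 1
    return actions[active[0]]
```

For instance, `phased_elimination(np.eye(5), lambda a: float(a @ theta) + rng.standard_t(2), T=20_000)` exercises the sketch with Student-$t$ noise (finite mean, infinite variance), a canonical heavy-tailed regime.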
Related papers
- Efficient Continual Finite-Sum Minimization [52.5238287567572]
We propose a key twist on finite-sum minimization, dubbed continual finite-sum minimization.
Our approach significantly improves upon the $\mathcal{O}(n/\epsilon)$ FOs that $\mathrm{StochasticGradientDescent}$ requires.
We also prove that there is no natural first-order method with $\mathcal{O}\left(n/\epsilon^{\alpha}\right)$ gradient complexity for $\alpha < 1/4$, establishing that the first-order complexity of our method is nearly tight.
arXiv Detail & Related papers (2024-06-07T08:26:31Z) - Low-rank Matrix Bandits with Heavy-tailed Rewards [55.03293214439741]
We study the problem of low-rank matrix bandits with heavy-tailed rewards (LowHTR).
By utilizing truncation of observed payoffs and dynamic exploration, we propose a novel algorithm called LOTUS.
arXiv Detail & Related papers (2024-04-26T21:54:31Z) - Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds [26.277745106128197]
In this work, we address the challenge of such rewards in reinforcement learning with linear function approximation.
We first design an algorithm, Heavy-OFUL, for heavy-tailed linear bandits, achieving an instance-dependent $T$-round regret bound.
Our result is achieved via a novel self-normalized concentration inequality that may be of independent interest in handling heavy-tailed noise in general online regression problems.
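Since the self-normalized analysis targets online regression with heavy-tailed noise, a useful mental model is robust regression with a Huber loss. The sketch below (iteratively reweighted least squares for Huber ridge regression) is a standard tool chosen for illustration, not necessarily the estimator used in the paper.

```python
# Huber ridge regression via iteratively reweighted least squares (IRLS).
# Residuals beyond `tau` are down-weighted, which blunts heavy-tailed noise.
import numpy as np

def huber_ridge(X, y, tau=1.0, lam=1.0, iters=50):
    """Minimize sum_i huber_tau(y_i - x_i @ w) + lam * ||w||^2."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        r = y - X @ w
        # IRLS weights: 1 inside [-tau, tau], tau/|r| outside.
        s = np.where(np.abs(r) <= tau, 1.0, tau / np.maximum(np.abs(r), 1e-12))
        w = np.linalg.solve(X.T @ (s[:, None] * X) + lam * np.eye(d), X.T @ (s * y))
    return w
```

The down-weighting is the whole trick: a single enormous residual contributes at most `tau` in gradient magnitude, so outliers cannot dominate the fit.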
arXiv Detail & Related papers (2023-06-12T02:56:09Z) - Does Sparsity Help in Learning Misspecified Linear Bandits? [32.920577630673804]
We show that algorithms can obtain $O(\varepsilon)$-optimal actions by querying $O(\varepsilon^{-s} d^s)$ actions.
We also show that our upper bound on sample complexity is nearly tight if one demands an error $O(s^{\delta}\varepsilon)$ for $0 < \delta < 1$.
arXiv Detail & Related papers (2023-03-29T19:58:39Z) - Near-Optimal Algorithms for Private Online Optimization in the Realizable Regime [74.52487417350221]
We consider online learning problems in the realizable setting, where there is a zero-loss solution.
We propose new Differentially Private (DP) algorithms that obtain near-optimal regret bounds.
arXiv Detail & Related papers (2023-02-27T21:19:24Z) - Sparse Dimensionality Reduction Revisited [13.170012290527017]
The sparse Johnson-Lindenstrauss transform is one of the central techniques in dimensionality reduction.
We revisit sparse embeddings and identify a loophole in the lower bound.
We also improve the sparsity of the best oblivious subspace embeddings for optimal embedding dimensionality.
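For a concrete reference point, here is a minimal sketch of the textbook sparse JL-style embedding with $s$ nonzeros per input coordinate; the construction and scaling below are the classical ones, not the improved embeddings of the paper.

```python
# Sparse JL-style embedding: each input coordinate touches only s of the m
# output rows, with random signs, scaled by 1/sqrt(s). Illustrative only.
import numpy as np

def sparse_jl_embed(x, m, s, seed=0):
    rng = np.random.default_rng(seed)
    y = np.zeros(m)
    for j, xj in enumerate(x):
        rows = rng.choice(m, size=s, replace=False)  # s target rows per coordinate
        signs = rng.choice([-1.0, 1.0], size=s)
        y[rows] += signs * xj / np.sqrt(s)
    return y

x = np.random.default_rng(1).normal(size=1000)
y = sparse_jl_embed(x, m=200, s=4)
print(np.linalg.norm(x), np.linalg.norm(y))  # the two norms should be close
```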
arXiv Detail & Related papers (2023-02-13T08:01:25Z) - Differentially Private Stochastic Linear Bandits: (Almost) for Free [17.711701190680742]
In the central model, we achieve almost the same regret as the optimal non-private algorithms, which means we get privacy for free.
In the shuffled model, we also achieve a regret of $\tilde{O}(\sqrt{T}+\frac{1}{\epsilon})$ as in the central case, while the best previously known algorithm suffers a regret of $\tilde{O}(\frac{1}{\epsilon}T^{3/5})$.
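One common recipe for central-model DP in linear bandits is to perturb the sufficient statistics before solving the ridge regression; the sketch below shows that pattern with Gaussian noise, with the noise scale left as an unspecified assumption rather than the paper's calibration.

```python
# Privatized ridge estimate: add (symmetrized) Gaussian noise to the Gram
# matrix and the reward vector, then solve as usual. `sigma` must be
# calibrated to the target (epsilon, delta) privacy budget, which this
# illustrative sketch does not do.
import numpy as np

def private_ridge(X, y, sigma, lam=1.0, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    N = rng.normal(scale=sigma, size=(d, d))
    V = X.T @ X + lam * np.eye(d) + (N + N.T) / 2.0  # noisy Gram matrix
    u = X.T @ y + rng.normal(scale=sigma, size=d)    # noisy reward statistics
    return np.linalg.solve(V, u)
```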
arXiv Detail & Related papers (2022-07-07T17:20:57Z) - Low-Rank Approximation with $1/\epsilon^{1/3}$ Matrix-Vector Products [58.05771390012827]
We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm.
Our main result is an algorithm that uses only $\tilde{O}(k/\sqrt{\epsilon})$ matrix-vector products.
arXiv Detail & Related papers (2022-02-10T16:10:41Z) - Linear Bandits on Uniformly Convex Sets [88.3673525964507]
Linear bandit algorithms yield $\tilde{\mathcal{O}}(n\sqrt{T})$ pseudo-regret bounds on compact convex action sets.
Two types of structural assumptions lead to better pseudo-regret bounds.
arXiv Detail & Related papers (2021-03-10T07:33:03Z) - Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e., the learner has access only to a zeroth-order oracle), where the cost/reward functions admit a "pseudo-1d" structure.
We show a lower bound of $\min(\sqrt{dT}, T^{3/4})$ for the regret of any algorithm, where $T$ is the number of rounds.
We propose a new algorithm that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z) - Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs [35.988644745703645]
We analyze linear bandits with heavy-tailed payoffs, where the payoffs admit finite moments of order $1+\epsilon$.
We propose two novel algorithms which enjoy a sublinear regret bound of $\widetilde{O}(d^{\frac{1}{2}}T^{\frac{1}{1+\epsilon}})$.
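A workhorse robust estimator in this literature is median-of-means; the sketch below is illustrative and is not claimed to be the construction inside the paper's two algorithms.

```python
# Median-of-means: split the samples into k groups, average each group, and
# return the median of the group means. Robust to heavy tails.
import numpy as np

def median_of_means(x, k, seed=0):
    rng = np.random.default_rng(seed)
    groups = np.array_split(rng.permutation(np.asarray(x, dtype=float)), k)
    return float(np.median([g.mean() for g in groups]))

# Example: Student-t samples with df=2 have a finite mean but infinite
# variance, i.e., finite (1+eps)-moments only for eps < 1.
rng = np.random.default_rng(0)
samples = rng.standard_t(df=2, size=10_000)
print(median_of_means(samples, k=32))
```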
arXiv Detail & Related papers (2020-04-28T13:01:38Z)