Related papers: Heavy-Tailed Linear Bandits: Huber Regression with One-Pass Update

Heavy-Tailed Linear Bandits: Huber Regression with One-Pass Update

URL: http://arxiv.org/abs/2503.00419v2
Date: Thu, 12 Jun 2025 01:11:27 GMT
Title: Heavy-Tailed Linear Bandits: Huber Regression with One-Pass Update
Authors: Jing Wang, Yu-Jie Zhang, Peng Zhao, Zhi-Hua Zhou,
Abstract summary: Two principled strategies for handling heavy-tailed noise, truncation and median-of-means, have been introduced to heavy-tailed bandits.<n>We propose a emphone-pass algorithm based on the online mirror descent framework.
Score: 62.96781471194877
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the stochastic linear bandits with heavy-tailed noise. Two principled strategies for handling heavy-tailed noise, truncation and median-of-means, have been introduced to heavy-tailed bandits. Nonetheless, these methods rely on specific noise assumptions or bandit structures, limiting their applicability to general settings. The recent work [Huang et al.2024] develops a soft truncation method via the adaptive Huber regression to address these limitations. However, their method suffers undesired computational costs: it requires storing all historical data and performing a full pass over these data at each round. In this paper, we propose a \emph{one-pass} algorithm based on the online mirror descent framework. Our method updates using only current data at each round, reducing the per-round computational cost from $\mathcal{O}(t \log T)$ to $\mathcal{O}(1)$ with respect to current round $t$ and the time horizon $T$, and achieves a near-optimal and variance-aware regret of order $\widetilde{\mathcal{O}}\big(d T^{\frac{1-\epsilon}{2(1+\epsilon)}} \sqrt{\sum_{t=1}^T \nu_t^2} + d T^{\frac{1-\epsilon}{2(1+\epsilon)}}\big)$ where $d$ is the dimension and $\nu_t^{1+\epsilon}$ is the $(1+\epsilon)$-th central moment of reward at round $t$.

Related papers

Sign Operator for Coping with Heavy-Tailed Noise in Non-Convex Optimization: High Probability Bounds Under $(L_0, L_1)$-Smoothness [74.18546828528298]
We show that SignSGD with Majority Voting can robustly work on the whole range of complexity with $kappakappakappakappa-1right, kappakappakappa-1right, kappakappakappa-1right, kappakappakappa-1right, kappakappakappa-1right, kappakappakappa-1right, kappakappakappa-1right, kappa
arXiv Detail & Related papers (2025-02-11T19:54:11Z)
Sample-Efficient Linear Regression with Self-Selection Bias [7.605563562103568]
We consider the problem of linear regression with self-selection bias in the unknown-index setting. We provide a novel and near optimally sample-efficient (in terms of $k$) algorithm to recover $mathbfw_1,ldots,mathbfw_kin. Our algorithm succeeds under significantly relaxed noise assumptions, and therefore also succeeds in the related setting of max-linear regression.
arXiv Detail & Related papers (2024-02-22T02:20:24Z)
Scaling Up Differentially Private LASSO Regularized Logistic Regression via Faster Frank-Wolfe Iterations [51.14495595270775]
We adapt the Frank-Wolfe algorithm for $L_1$ penalized linear regression to be aware of sparse inputs and to use them effectively. Our results demonstrate that this procedure can reduce runtime by a factor of up to $2,200times$, depending on the value of the privacy parameter $epsilon$ and the sparsity of the dataset.
arXiv Detail & Related papers (2023-10-30T19:52:43Z)
Convergence analysis of online algorithms for vector-valued kernel regression [0.42970700836450487]
We consider the problem of approximating the regression function $f_mu:, Omega to Y$ from noisy $mu$-distributed vector-valued data.<n>We provide estimates for the expected squared error in the RKHS norm of the approximations $f(m)in H$ obtained by a standard regularized online approximation algorithm.
arXiv Detail & Related papers (2023-09-14T15:10:47Z)
Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds [26.277745106128197]
In this work, we address the challenge of such rewards in reinforcement learning with linear function approximation. We first design an algorithm, textscHeavy-OFUL, for heavy-tailed linear bandits, achieving an emphinstance-dependent $T$-round regret of $tildeObig. Our result is achieved via a novel self-normalized concentration inequality that may be of independent interest in handling heavytailed noise in general online regression problems.
arXiv Detail & Related papers (2023-06-12T02:56:09Z)
A spectral least-squares-type method for heavy-tailed corrupted regression with unknown covariance \& heterogeneous noise [2.019622939313173]
We revisit heavy-tailed corrupted least-squares linear regression assuming to have a corrupted $n$-sized label-feature sample of at most $epsilon n$ arbitrary outliers. We propose a near-optimal computationally tractable estimator, based on the power method, assuming no knowledge on $(Sigma,Xi) nor the operator norm of $Xi$.
arXiv Detail & Related papers (2022-09-06T23:37:31Z)
Variance-Aware Sparse Linear Bandits [64.70681598741417]
Worst-case minimax regret for sparse linear bandits is $widetildeThetaleft(sqrtdTright)$. In the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve an $widetildemathcal O(1)$ regret. We develop a general framework that converts any variance-aware linear bandit algorithm to a variance-aware algorithm for sparse linear bandits.
arXiv Detail & Related papers (2022-05-26T15:55:44Z)
Computationally Efficient Horizon-Free Reinforcement Learning for Linear Mixture MDPs [111.75736569611159]
We propose the first computationally efficient horizon-free algorithm for linear mixture MDPs. Our algorithm adapts a weighted least square estimator for the unknown transitional dynamic. This also improves upon the best-known algorithms in this setting when $sigma_k2$'s are known.
arXiv Detail & Related papers (2022-05-23T17:59:18Z)
Differentially Private $\ell_1$-norm Linear Regression with Heavy-tailed Data [22.233705161499273]
We focus on the $ell_1$-norm linear regression in the $epsilon$-DP model. We show that it is possible to achieve an upper bound of $tildeO(sqrtfracdnepsilon)$ with high probability. Our algorithms can also be extended to more relaxed cases where only each coordinate of the data has bounded moments.
arXiv Detail & Related papers (2022-01-10T08:17:05Z)
Active Sampling for Linear Regression Beyond the $\ell_2$ Norm [70.49273459706546]
We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector. We show that this dependence on $d$ is optimal, up to logarithmic factors. We also provide the first total sensitivity upper bound $O(dmax1,p/2log2 n)$ for loss functions with at most degree $p$ growth.
arXiv Detail & Related papers (2021-11-09T00:20:01Z)
Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
We study online learning with bandit feedback (i.e. learner has access to only zeroth-order oracle) where cost/reward functions admit a "pseudo-1d" structure. We show a lower bound of $min(sqrtdT, T3/4)$ for the regret of any algorithm, where $T$ is the number of rounds. We propose a new algorithm sbcalg that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively.
arXiv Detail & Related papers (2021-02-15T08:16:51Z)
Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs [35.988644745703645]
We analyze the linear bandits with heavy-tailed payoffs, where the payoffs admit finite $1+epsilon$ moments. We propose two novel algorithms which enjoy a sublinear regret bound of $widetildeO(dfrac12Tfrac11+epsilon)$.
arXiv Detail & Related papers (2020-04-28T13:01:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.