Related papers: Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs

Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs

URL: http://arxiv.org/abs/2004.13465v1
Date: Tue, 28 Apr 2020 13:01:38 GMT
Title: Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs
Authors: Bo Xue, Guanghui Wang, Yimu Wang and Lijun Zhang
Abstract summary: We analyze the linear bandits with heavy-tailed payoffs, where the payoffs admit finite $1+epsilon$ moments. We propose two novel algorithms which enjoy a sublinear regret bound of $widetildeO(dfrac12Tfrac11+epsilon)$.
Score: 35.988644745703645
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we study the problem of stochastic linear bandits with finite action sets. Most of existing work assume the payoffs are bounded or sub-Gaussian, which may be violated in some scenarios such as financial markets. To settle this issue, we analyze the linear bandits with heavy-tailed payoffs, where the payoffs admit finite $1+\epsilon$ moments for some $\epsilon\in(0,1]$. Through median of means and dynamic truncation, we propose two novel algorithms which enjoy a sublinear regret bound of $\widetilde{O}(d^{\frac{1}{2}}T^{\frac{1}{1+\epsilon}})$, where $d$ is the dimension of contextual information and $T$ is the time horizon. Meanwhile, we provide an $\Omega(d^{\frac{\epsilon}{1+\epsilon}}T^{\frac{1}{1+\epsilon}})$ lower bound, which implies our upper bound matches the lower bound up to polylogarithmic factors in the order of $d$ and $T$ when $\epsilon=1$. Finally, we conduct numerical experiments to demonstrate the effectiveness of our algorithms and the empirical results strongly support our theoretical guarantees.

Related papers

Improved Regret Bounds for Linear Bandits with Heavy-Tailed Rewards [38.53963338592248]
We improve both upper and lower bounds on the minimax regret compared to prior work.<n>We propose a new elimination-based algorithm guided by experimental design.<n>We show that for some geometries, such as $l_p$-norm balls for $p le 1 + epsilon$, we can further reduce the dependence on $d$.
arXiv Detail & Related papers (2025-06-05T09:07:26Z)
Heavy-Tailed Linear Bandits: Huber Regression with One-Pass Update [62.96781471194877]
Two principled strategies for handling heavy-tailed noise, truncation and median-of-means, have been introduced to heavy-tailed bandits. We propose a emphone-pass algorithm based on the online mirror descent framework.
arXiv Detail & Related papers (2025-03-01T09:41:45Z)
Sign Operator for Coping with Heavy-Tailed Noise in Non-Convex Optimization: High Probability Bounds Under $(L_0, L_1)$-Smoothness [74.18546828528298]
We show that SignSGD with Majority Voting can robustly work on the whole range of complexity with $kappakappakappakappa-1right, kappakappakappa-1right, kappakappakappa-1right, kappakappakappa-1right, kappakappakappa-1right, kappakappakappa-1right, kappakappakappa-1right, kappa
arXiv Detail & Related papers (2025-02-11T19:54:11Z)
Tangential Randomization in Linear Bandits (TRAiL): Guaranteed Inference and Regret Bounds [1.03590082373586]
We propose and analyze TRAiL, a regret-optimal forced exploration algorithm for linear bandits. TraiL ensures a $Omega(sqrtT)$ growth in the inference quality, measured via the minimum eigenvalue of the design (regressor) matrix. We characterize an $Omega(sqrtT)$ minimax lower bound for any algorithm on the expected regret.
arXiv Detail & Related papers (2024-11-19T01:08:13Z)
Fast Rates for Bandit PAC Multiclass Classification [73.17969992976501]
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct. Our main contribution is in designing a novel learning algorithm for the agnostic $(varepsilon,delta)$PAC version of the problem.
arXiv Detail & Related papers (2024-06-18T08:54:04Z)
Federated Linear Bandits with Finite Adversarial Actions [20.1041278044797]
We study a federated linear bandits model, where $M$ clients communicate with a central server to solve a linear contextual bandits problem. To address the unique challenges of adversarial finite action sets, we propose the FedSupLinUCB algorithm. We prove that FedSupLinUCB achieves a total regret of $tildeO(sqrtd T)$, where $T$ is the total number of arm pulls from all clients, and $d$ is the ambient dimension of the linear model.
arXiv Detail & Related papers (2023-11-02T03:41:58Z)
Efficient Algorithms for Generalized Linear Bandits with Heavy-tailed Rewards [40.99322897009357]
We propose two novel algorithms based on truncation and mean of medians. Our truncation-based algorithm supports online learning, distinguishing it from existing truncation-based approaches. Our algorithms improve the regret bounds by a logarithmic factor compared to existing algorithms when $epsilon=1$.
arXiv Detail & Related papers (2023-10-28T13:01:10Z)
First- and Second-Order Bounds for Adversarial Linear Contextual Bandits [22.367921675238318]
We consider the adversarial linear contextual bandit setting, which allows for the loss functions associated with each of $K$ arms to change over time without restriction. Since $V_T$ or $L_T*$ may be significantly smaller than $T$, these improve over the worst-case regret whenever the environment is relatively benign.
arXiv Detail & Related papers (2023-05-01T14:00:15Z)
Causal Bandits for Linear Structural Equation Models [58.2875460517691]
This paper studies the problem of designing an optimal sequence of interventions in a causal graphical model. It is assumed that the graph's structure is known and has $N$ nodes. Two algorithms are proposed for the frequentist (UCB-based) and Bayesian settings.
arXiv Detail & Related papers (2022-08-26T16:21:31Z)
Variance-Aware Sparse Linear Bandits [64.70681598741417]
Worst-case minimax regret for sparse linear bandits is $widetildeThetaleft(sqrtdTright)$. In the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve an $widetildemathcal O(1)$ regret. We develop a general framework that converts any variance-aware linear bandit algorithm to a variance-aware algorithm for sparse linear bandits.
arXiv Detail & Related papers (2022-05-26T15:55:44Z)
On Submodular Contextual Bandits [92.45432756301231]
We consider the problem of contextual bandits where actions are subsets of a ground set and mean rewards are modeled by an unknown monotone submodular function. We show that our algorithm efficiently randomizes around local optima of estimated functions according to the Inverse Gap Weighting strategy.
arXiv Detail & Related papers (2021-12-03T21:42:33Z)
Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and ReLUs under Gaussian Marginals [49.60752558064027]
We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals. Our lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.
arXiv Detail & Related papers (2020-06-29T17:10:10Z)
Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies. We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB)
arXiv Detail & Related papers (2020-06-17T22:32:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.