$(\epsilon, u)$-Adaptive Regret Minimization in Heavy-Tailed Bandits
- URL: http://arxiv.org/abs/2310.02975v2
- Date: Mon, 12 Feb 2024 10:39:44 GMT
- Title: $(\epsilon, u)$-Adaptive Regret Minimization in Heavy-Tailed Bandits
- Authors: Gianmarco Genalti and Lupo Marsigli and Nicola Gatti and Alberto Maria
Metelli
- Abstract summary: We study the regret minimization problem when $\epsilon$ and $u$ are unknown to the learner.
AdaR-UCB is the first algorithm that enjoys regret guarantees nearly matching those of the non-adaptive heavy-tailed case.
- Score: 29.966828248335972
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Heavy-tailed distributions naturally arise in several settings, from finance
to telecommunications. While regret minimization under subgaussian or bounded
rewards has been widely studied, learning with heavy-tailed distributions only
gained popularity over the last decade. In this paper, we consider the setting
in which the reward distributions have finite absolute raw moments of maximum
order $1+\epsilon$, uniformly bounded by a constant $u<+\infty$, for some
$\epsilon \in (0,1]$. In this setting, we study the regret minimization problem
when $\epsilon$ and $u$ are unknown to the learner, which must adapt to them.
First, we show that adaptation comes at a cost and derive two negative results
proving that the same regret guarantees as in the non-adaptive case cannot be
achieved without further assumptions. Then, we devise and analyze a fully data-driven
trimmed mean estimator and propose a novel adaptive regret minimization
algorithm, AdaR-UCB, that leverages such an estimator. Finally, we show that
AdaR-UCB is the first algorithm that, under a known distributional assumption,
enjoys regret guarantees nearly matching those of the non-adaptive heavy-tailed
case.
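To make the setting concrete, below is a minimal sketch (in Python, not code from the paper) of the classical truncated-empirical-mean UCB index for heavy-tailed rewards, which takes $\epsilon$, $u$, and a confidence level $\delta$ as inputs; AdaR-UCB's contribution is precisely to remove the need for $\epsilon$ and $u$ through a fully data-driven trimmed mean. Function names and the constant in the confidence width are illustrative assumptions, not quantities taken from the paper.

```python
import math


def truncated_mean(rewards, eps, u, delta):
    """Truncated empirical mean for heavy-tailed rewards.

    Assumes the (1 + eps)-th absolute raw moment is bounded by u; samples whose
    magnitude exceeds a growing threshold are discarded rather than averaged.
    """
    n = len(rewards)
    log_term = math.log(1.0 / delta)
    total = 0.0
    for t, x in enumerate(rewards, start=1):
        threshold = (u * t / log_term) ** (1.0 / (1.0 + eps))
        if abs(x) <= threshold:
            total += x
    return total / n


def heavy_tailed_ucb_index(rewards, eps, u, delta):
    """Optimistic index: truncated mean plus a (1 + eps)-moment confidence width."""
    n = len(rewards)
    width = 4.0 * u ** (1.0 / (1.0 + eps)) * (math.log(1.0 / delta) / n) ** (eps / (1.0 + eps))
    return truncated_mean(rewards, eps, u, delta) + width
```

In a standard bandit loop one would pull, at round $t$, the arm maximizing this index with $\delta$ of order $1/t^2$. Since $\epsilon$ and $u$ are unavailable in the paper's setting, the trimming threshold itself must be estimated from data, which is what the authors' estimator is designed to do.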
Related papers
- Rate-Preserving Reductions for Blackwell Approachability [72.03309261614991]
Abernethy et al. (2011) showed that Blackwell approachability and no-regret learning are equivalent.
We show that it is possible to tightly reduce any approachability instance to an instance of a generalized form of regret minimization.
arXiv Detail & Related papers (2024-06-10T23:23:52Z)
- Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as the total variation budget $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted-$\text{OFUL}^+$ and Restarted $\text{SAVE}^+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z)
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z)
- Degeneracy is OK: Logarithmic Regret for Network Revenue Management with Indiscrete Distributions [14.959412298776199]
We study the classical Network Revenue Management (NRM) problem with accept/reject decisions and $T$ IID arrivals.
We develop an online algorithm that achieves $O(\log^2 T)$ regret under this model.
We derive a second result that achieves $O(\log T)$ regret under an additional assumption of second-order growth.
arXiv Detail & Related papers (2022-10-14T17:52:19Z)
- First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach [57.570201404222935]
We show that it is possible to obtain regret scaling as $\mathcal{O}(\sqrt{V_1^\star K})$ in reinforcement learning with large state spaces.
We demonstrate that existing techniques based on least squares estimation are insufficient to obtain this result.
arXiv Detail & Related papers (2021-12-07T00:29:57Z)
- Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z)
- Regret Minimization in Heavy-Tailed Bandits [12.272975892517039]
We revisit the classic regret-minimization problem in the multi-armed bandit setting.
We propose an optimal algorithm that matches the lower bound exactly in the first-order term.
We show that our index concentrates faster than the well-known truncated or trimmed empirical mean estimators.
arXiv Detail & Related papers (2021-02-07T07:46:24Z)
- Minimax Policy for Heavy-tailed Bandits [10.203602318836444]
We modify the minimax policy MOSS, designed for sub-Gaussian reward distributions, by using a saturated empirical mean to design a new algorithm called Robust MOSS (a generic sketch of such a clipped mean is given after this list).
We show that if the moment of order $1+\epsilon$ of the reward distribution exists, then the refined strategy has a worst-case regret matching the lower bound while maintaining a distribution-dependent regret.
arXiv Detail & Related papers (2020-07-20T21:43:57Z)
- Sharp Statistical Guarantees for Adversarially Robust Gaussian Classification [54.22421582955454]
We provide the first optimal minimax guarantees on the excess risk for adversarially robust classification.
Results are stated in terms of the Adversarial Signal-to-Noise Ratio (AdvSNR), which generalizes a similar notion for standard linear classification to the adversarial setting.
arXiv Detail & Related papers (2020-06-29T21:06:52Z)
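As mentioned in the Robust MOSS entry above, a "saturated" empirical mean clips large observations instead of discarding them. The snippet below is a generic illustration of that idea under the same $(\epsilon, u)$ moment assumption; it is an assumed sketch, not the exact estimator or threshold schedule used in Robust MOSS.

```python
import math


def saturated_mean(rewards, eps, u, delta):
    """Generic clipped ("saturated") empirical mean for heavy-tailed rewards.

    Observations larger in magnitude than the threshold are clipped to the
    threshold (keeping their sign) rather than dropped as in truncation.
    """
    n = len(rewards)
    log_term = math.log(1.0 / delta)
    total = 0.0
    for t, x in enumerate(rewards, start=1):
        threshold = (u * t / log_term) ** (1.0 / (1.0 + eps))
        total += math.copysign(min(abs(x), threshold), x)
    return total / n
```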