$(\epsilon, u)$-Adaptive Regret Minimization in Heavy-Tailed Bandits
- URL: http://arxiv.org/abs/2310.02975v2
- Date: Mon, 12 Feb 2024 10:39:44 GMT
- Title: $(\epsilon, u)$-Adaptive Regret Minimization in Heavy-Tailed Bandits
- Authors: Gianmarco Genalti and Lupo Marsigli and Nicola Gatti and Alberto Maria
Metelli
- Abstract summary: We study the regret minimization problem when $\epsilon$ and $u$ are unknown to the learner.
AdaR-UCB is the first algorithm that enjoys regret guarantees nearly matching those of the non-adaptive heavy-tailed case.
- Score: 29.966828248335972
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Heavy-tailed distributions naturally arise in several settings, from finance
to telecommunications. While regret minimization under subgaussian or bounded
rewards has been widely studied, learning with heavy-tailed distributions only
gained popularity over the last decade. In this paper, we consider the setting
in which the reward distributions have finite absolute raw moments of maximum
order $1+\epsilon$, uniformly bounded by a constant $u<+\infty$, for some
$\epsilon \in (0,1]$. In this setting, we study the regret minimization problem
when $\epsilon$ and $u$ are unknown to the learner, which must adapt to them.
First, we show that adaptation comes at a cost and derive two negative results
proving that the same regret guarantees as in the non-adaptive case cannot be
achieved without further assumptions. Then, we devise and analyze a fully data-driven
trimmed mean estimator and propose a novel adaptive regret minimization
algorithm, AdaR-UCB, that leverages such an estimator. Finally, we show that
AdaR-UCB is the first algorithm that, under a known distributional assumption,
enjoys regret guarantees nearly matching those of the non-adaptive heavy-tailed
case.
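To make the setting concrete, below is a minimal sketch (in Python, not code from the paper) of the classical truncated-empirical-mean UCB index for heavy-tailed rewards, which takes $\epsilon$, $u$, and a confidence level $\delta$ as inputs; AdaR-UCB's contribution is precisely to remove the need for $\epsilon$ and $u$ through a fully data-driven trimmed mean. Function names and the constant in the confidence width are illustrative assumptions, not quantities taken from the paper.

```python
import math


def truncated_mean(rewards, eps, u, delta):
    """Truncated empirical mean for heavy-tailed rewards.

    Assumes the (1 + eps)-th absolute raw moment is bounded by u; samples whose
    magnitude exceeds a growing threshold are discarded rather than averaged.
    """
    n = len(rewards)
    log_term = math.log(1.0 / delta)
    total = 0.0
    for t, x in enumerate(rewards, start=1):
        threshold = (u * t / log_term) ** (1.0 / (1.0 + eps))
        if abs(x) <= threshold:
            total += x
    return total / n


def heavy_tailed_ucb_index(rewards, eps, u, delta):
    """Optimistic index: truncated mean plus a (1 + eps)-moment confidence width."""
    n = len(rewards)
    width = 4.0 * u ** (1.0 / (1.0 + eps)) * (math.log(1.0 / delta) / n) ** (eps / (1.0 + eps))
    return truncated_mean(rewards, eps, u, delta) + width
```

In a standard bandit loop one would pull, at round $t$, the arm maximizing this index with $\delta$ of order $1/t^2$. Since $\epsilon$ and $u$ are unavailable in the paper's setting, the trimming threshold itself must be estimated from data, which is what the authors' estimator is designed to do.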
Related papers
- Rate-Preserving Reductions for Blackwell Approachability [72.03309261614991]
Abernethy et al. (2011) showed that Blackwell approachability and no-regret learning are equivalent.
We show that it is possible to tightly reduce any approachability instance to an instance of a generalized form of regret minimization.
arXiv Detail & Related papers (2024-06-10T23:23:52Z)
- Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as the total variation budget $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted-$\text{OFUL}^+$ and Restarted $\text{SAVE}^+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z)
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z)
- Degeneracy is OK: Logarithmic Regret for Network Revenue Management with Indiscrete Distributions [14.959412298776199]
We study the classical Network Revenue Management (NRM) problem with accept/reject decisions and $T$ IID arrivals.
We develop an online algorithm that achieves $O(\log^2 T)$ regret under this model.
We derive a second result that achieves $O(\log T)$ regret under an additional assumption of second-order growth.
arXiv Detail & Related papers (2022-10-14T17:52:19Z)
- First-Order Regret in Reinforcement Learning with Linear Function Approximation: A Robust Estimation Approach [57.570201404222935]
We show that it is possible to obtain regret scaling as $\mathcal{O}(\sqrt{V_1^\star K})$ in reinforcement learning with large state spaces.
We demonstrate that existing techniques based on least squares estimation are insufficient to obtain this result.
arXiv Detail & Related papers (2021-12-07T00:29:57Z)
- Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z)
- Regret Minimization in Heavy-Tailed Bandits [12.272975892517039]
We revisit the classic regret-minimization problem in the multi-armed bandit setting.
We propose an optimal algorithm that matches the lower bound exactly in the first-order term.
We show that our index concentrates faster than the well-known truncated or trimmed empirical mean estimators.
arXiv Detail & Related papers (2021-02-07T07:46:24Z)
- Minimax Policy for Heavy-tailed Bandits [10.203602318836444]
We modify the minimax policy MOSS, designed for sub-Gaussian reward distributions, by using a saturated empirical mean to design a new algorithm called Robust MOSS (a generic sketch of such a clipped mean is given after this list).
We show that if the moment of order $1+\epsilon$ of the reward distribution exists, then the refined strategy has a worst-case regret matching the lower bound while maintaining a distribution-dependent regret.
arXiv Detail & Related papers (2020-07-20T21:43:57Z)
- Sharp Statistical Guarantees for Adversarially Robust Gaussian Classification [54.22421582955454]
We provide the first optimal minimax guarantees on the excess risk for adversarially robust classification.
Results are stated in terms of the Adversarial Signal-to-Noise Ratio (AdvSNR), which generalizes a similar notion for standard linear classification to the adversarial setting.
arXiv Detail & Related papers (2020-06-29T21:06:52Z)
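As mentioned in the Robust MOSS entry above, a "saturated" empirical mean clips large observations instead of discarding them. The snippet below is a generic illustration of that idea under the same $(\epsilon, u)$ moment assumption; it is an assumed sketch, not the exact estimator or threshold schedule used in Robust MOSS.

```python
import math


def saturated_mean(rewards, eps, u, delta):
    """Generic clipped ("saturated") empirical mean for heavy-tailed rewards.

    Observations larger in magnitude than the threshold are clipped to the
    threshold (keeping their sign) rather than dropped as in truncation.
    """
    n = len(rewards)
    log_term = math.log(1.0 / delta)
    total = 0.0
    for t, x in enumerate(rewards, start=1):
        threshold = (u * t / log_term) ** (1.0 / (1.0 + eps))
        total += math.copysign(min(abs(x), threshold), x)
    return total / n
```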