Minimax Policy for Heavy-tailed Bandits
- URL: http://arxiv.org/abs/2007.10493v2
- Date: Wed, 18 Nov 2020 01:34:32 GMT
- Title: Minimax Policy for Heavy-tailed Bandits
- Authors: Lai Wei and Vaibhav Srivastava
- Abstract summary: We modify the minimax policy MOSS for the sub-Gaussian reward distribution by using a saturated empirical mean to design a new algorithm called Robust MOSS.
We show that if the moment of order $1+\epsilon$ for the reward distribution exists, then the refined strategy has a worst-case regret matching the lower bound while maintaining a distribution-dependent logarithmic regret.
- Score: 10.203602318836444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the stochastic Multi-Armed Bandit (MAB) problem under worst-case
regret and heavy-tailed reward distribution. We modify the minimax policy MOSS
for the sub-Gaussian reward distribution by using a saturated empirical mean to
design a new algorithm called Robust MOSS. We show that if the moment of order
$1+\epsilon$ for the reward distribution exists, then the refined strategy has
a worst-case regret matching the lower bound while maintaining a
distribution-dependent logarithmic regret.
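The abstract does not spell out the index, so the following is a minimal sketch, assuming a MOSS-style upper-confidence index in which the raw empirical mean is replaced by a saturated (clipped) one; the saturation level, exploration bonus, and constants are illustrative assumptions rather than the exact quantities used in Robust MOSS.
```python
import numpy as np


def saturated_mean(samples, level):
    """Empirical mean after saturating (clipping) the samples at +/- level."""
    return float(np.clip(samples, -level, level).mean())


def robust_moss_sketch(arms, horizon, eps=0.5):
    """Illustrative MOSS-style index policy with a saturated empirical mean.

    `arms` is a list of callables, each returning one (possibly heavy-tailed)
    reward sample; `eps` is the assumed moment parameter (E|X|^(1+eps) < inf).
    The saturation level and exploration bonus below are illustrative choices,
    not the exact constants of Robust MOSS.
    """
    K, T = len(arms), horizon
    rewards = [[] for _ in range(K)]

    for t in range(T):
        if t < K:
            i = t  # pull every arm once to initialize
        else:
            index = np.empty(K)
            for k in range(K):
                n = len(rewards[k])
                log_term = max(np.log(T / (K * n)), 0.0)
                # saturation level grows with n so the clipping bias vanishes
                level = (n / max(log_term, 1.0)) ** (1.0 / (1.0 + eps))
                bonus = (log_term / n) ** (eps / (1.0 + eps))
                index[k] = saturated_mean(np.array(rewards[k]), level) + bonus
            i = int(np.argmax(index))
        rewards[i].append(arms[i]())

    return [len(r) for r in rewards]  # pull counts per arm


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # two heavy-tailed arms (Pareto rewards); the second has a higher mean
    arms = [lambda: rng.pareto(2.2), lambda: rng.pareto(2.2) + 0.3]
    print(robust_moss_sketch(arms, horizon=5000))
```
In the usage example, two heavy-tailed arms are simulated and the policy reports how often each arm was pulled; with a finite moment of order $1+\epsilon$, saturating the samples keeps occasional extreme rewards from dominating the index.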
Related papers
- Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as the variation budget $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted OFUL$^+$ and Restarted SAVE$^+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z) - Forced Exploration in Bandit Problems [12.13966146283641]
The multi-armed bandit(MAB) is a classical sequential decision problem.
This paper aims to design a multi-armed bandit algorithm that can be implemented without using information about the reward distribution.
arXiv Detail & Related papers (2023-12-12T14:00:29Z) - $(\epsilon, u)$-Adaptive Regret Minimization in Heavy-Tailed Bandits [29.966828248335972]
We study the regret minimization problem when $\epsilon$ and $u$ are unknown to the learner.
AdaR-UCB is the first algorithm that enjoys regret guarantees nearly matching those of the non-adaptive heavy-tailed case.
arXiv Detail & Related papers (2023-10-04T17:11:15Z) - Implicitly normalized forecaster with clipping for linear and non-linear heavy-tailed multi-armed bandits [85.27420062094086]
Implicitly Normalized Forecaster (INF) is considered an optimal solution for adversarial multi-armed bandit (MAB) problems.
We propose a new version of INF called the Implicitly Normalized Forecaster with clipping (INF-clip) for MAB problems in heavy-tailed settings.
We demonstrate that INF-clip is optimal for linear heavy-tailed MAB problems and works well for non-linear ones.
arXiv Detail & Related papers (2023-05-11T12:00:43Z) - Communication-Constrained Bandits under Additive Gaussian Noise [111.06688156723018]
We study a distributed multi-armed bandit where a client supplies the learner with communication-constrained feedback.
We propose a multi-phase bandit algorithm, $\mathtt{UE}\text{-UCB++}$, which matches the corresponding lower bound up to a minor additive factor.
arXiv Detail & Related papers (2023-04-25T09:31:20Z) - Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits [88.21288104408556]
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family.
We propose a Thompson sampling algorithm, termed Expulli, which uses a novel sampling distribution to avoid under-estimation of the optimal arm.
arXiv Detail & Related papers (2022-06-07T18:08:21Z) - Robust Learning of Optimal Auctions [84.13356290199603]
We study the problem of learning revenue-optimal multi-bidder auctions from samples when the samples of bidders' valuations can be adversarially corrupted or drawn from distributions that are adversarially perturbed.
We propose new algorithms that can learn a mechanism whose revenue is nearly optimal simultaneously for all "true distributions" that are $\alpha$-close to the original distribution in Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2021-07-13T17:37:21Z) - Regret Minimization in Heavy-Tailed Bandits [12.272975892517039]
We revisit the classic regret-minimization problem in the multi-armed bandit setting.
We propose an optimal algorithm that matches the lower bound exactly in the first-order term.
We show that our index concentrates faster than the well-known truncated or trimmed empirical mean estimators (an illustrative comparison of these estimators appears after this list).
arXiv Detail & Related papers (2021-02-07T07:46:24Z) - Nonstationary Stochastic Multiarmed Bandits: UCB Policies and Minimax Regret [5.1398743023989555]
We study the nonstationary Multi-Armed Bandit (MAB) problem in which the reward distribution associated with each arm is assumed to be time-varying.
We characterize the performance of the proposed policies in terms of the worst-case regret, which is the supremum of the regret over the set of reward distribution sequences satisfying the variation budget.
arXiv Detail & Related papers (2021-01-22T07:34:09Z) - Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
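As a point of reference for the truncated and trimmed empirical mean estimators mentioned in the heavy-tailed regret-minimization entry above, here is a minimal sketch of both estimators on heavy-tailed samples; the truncation threshold and trim fraction are illustrative choices, not values taken from that paper.
```python
import numpy as np


def truncated_mean(x, threshold):
    """Truncated empirical mean: samples with |x_i| > threshold are replaced by 0."""
    x = np.asarray(x, dtype=float)
    return float(np.where(np.abs(x) <= threshold, x, 0.0).mean())


def trimmed_mean(x, trim_frac=0.1):
    """Trimmed empirical mean: the top and bottom trim_frac of samples are dropped."""
    x = np.sort(np.asarray(x, dtype=float))
    k = int(len(x) * trim_frac)
    return float(x[k:len(x) - k].mean()) if len(x) > 2 * k else float(x.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = rng.pareto(1.5, size=2000)  # heavy-tailed: only moments of order < 1.5 exist
    print("plain mean:    ", samples.mean())
    print("truncated mean:", truncated_mean(samples, threshold=10.0))
    print("trimmed mean:  ", trimmed_mean(samples, trim_frac=0.05))
```
The truncated mean zeroes out samples beyond a threshold (one common convention in heavy-tailed bandit analyses), while the trimmed mean simply discards a fixed fraction of the most extreme samples; both trade a small bias for much lighter tails of the estimation error.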