Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits
with Super Heavy-Tailed Payoffs
- URL: http://arxiv.org/abs/2110.13876v1
- Date: Tue, 26 Oct 2021 17:30:44 GMT
- Title: Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits
with Super Heavy-Tailed Payoffs
- Authors: Han Zhong, Jiayi Huang, Lin F. Yang, Liwei Wang
- Abstract summary: We propose a novel robust statistical estimator, mean of medians, which estimates a random variable by computing the empirical mean of a sequence of empirical medians.
We show that the regret bound is near-optimal even with very heavy-tailed noise.
- Score: 27.636407641546914
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite a large amount of effort devoted to handling heavy-tailed error in machine learning, little is known about the case where moments of the error may not exist at all: the random noise $\eta$ satisfies $\Pr\left[|\eta| > |y|\right] \le 1/|y|^{\alpha}$ for some $\alpha > 0$. We make the first attempt to actively handle such super heavy-tailed noise in bandit learning problems: we propose a novel robust statistical estimator, mean of medians, which estimates a random variable by computing the empirical mean of a sequence of empirical medians. We then present a generic reductionist algorithmic framework for solving bandit learning problems (including the multi-armed and linear bandit problems): the mean-of-medians estimator can be applied to nearly any bandit learning algorithm as a black-box filter for its reward signals, yielding a regret bound similar to the one attainable when rewards are sub-Gaussian. We show that the regret bound is near-optimal even under very heavy-tailed noise. We also empirically demonstrate the effectiveness of the proposed algorithm, which further corroborates our theoretical results.
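The mean-of-medians estimator admits a short implementation. The Python sketch below follows only the abstract's description (split the observed rewards into groups, take each group's empirical median, then average the medians); the group size, the symmetric-Pareto noise in the demo, and the name mean_of_medians are illustrative assumptions rather than the paper's exact construction or tuning.

```python
import numpy as np

def mean_of_medians(samples, group_size):
    """Mean-of-medians estimate: split the samples into consecutive
    groups, take the empirical median of each group, and average the
    medians. (Sketch only; the paper ties the group size to the tail
    exponent alpha and the desired confidence level.)"""
    samples = np.asarray(samples, dtype=float)
    n_groups = len(samples) // group_size
    if n_groups == 0:
        raise ValueError("need at least one full group of samples")
    groups = samples[: n_groups * group_size].reshape(n_groups, group_size)
    return np.median(groups, axis=1).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, alpha = 10_000, 0.5
    # Super heavy-tailed noise: symmetric Pareto tails with exponent 0.5,
    # so not even the first moment of the noise exists.
    noise = rng.pareto(alpha, size=n) * rng.choice([-1.0, 1.0], size=n)
    rewards = 0.5 + noise  # the target reward value (median) is 0.5
    print("empirical mean :", rewards.mean())                  # unstable
    print("mean of medians:", mean_of_medians(rewards, 100))   # close to 0.5
```

In the paper's reduction framework, an estimate of this kind acts as a black-box filter on the reward signals fed to an off-the-shelf bandit algorithm (e.g. a UCB-style index method), so the base learner can be analyzed much as it would be under sub-Gaussian rewards.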
Related papers
- Learning Constant-Depth Circuits in Malicious Noise Models [9.036777309376697]
We extend Linial, Mansour, and Nisan's quasipolynomial-time algorithm for learning constant-depth circuits to malicious noise models.
We attain the best possible dependence on the noise rate and succeed in the harshest possible noise model.
arXiv Detail & Related papers (2024-11-06T00:19:58Z) - Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as the total variation budget $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted$\text{OFUL}^+$ and Restarted $\text{SAVE}^+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z) - Replicability is Asymptotically Free in Multi-armed Bandits [45.729105054410745]
This work is motivated by the growing demand for reproducible machine learning.
In particular, we consider a replicable algorithm that ensures, with high probability, that the algorithm's sequence of actions is not affected by the randomness inherent in the dataset.
arXiv Detail & Related papers (2024-02-12T03:31:34Z) - Zero-Inflated Bandits [11.60342504007264]
We study zero-inflated bandits, where the reward is modeled as a classic semi-parametric distribution called zero-inflated distribution.
We derive the regret bounds under both multi-armed bandits with general reward assumptions and contextual generalized linear bandit with sub-Gaussian rewards.
In many settings, the regret rates of our algorithms are either minimax optimal or state-of-the-art.
arXiv Detail & Related papers (2023-12-25T03:13:21Z) - Communication-Constrained Bandits under Additive Gaussian Noise [111.06688156723018]
We study a distributed multi-armed bandit where a client supplies the learner with communication-constrained feedback.
We propose a multi-phase bandit algorithm, $\mathtt{UE\text{-}UCB\text{++}}$, which matches this lower bound up to a minor additive factor.
arXiv Detail & Related papers (2023-04-25T09:31:20Z) - Bandit Algorithms for Prophet Inequality and Pandora's Box [13.709418181148148]
We study the Prophet Inequality and Pandora's Box problems in the Multi-Armed Bandits model.
Our results give near-optimal $\tilde{O}(\mathsf{poly}(n)\sqrt{T})$ total regret algorithms for both Prophet Inequality and Pandora's Box.
arXiv Detail & Related papers (2022-11-16T00:10:35Z) - Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We present a variance-aware algorithm that is adaptive to the level of adversarial contamination $C$.
arXiv Detail & Related papers (2021-10-25T02:53:24Z) - No Regrets for Learning the Prior in Bandits [30.478188057004175]
$\mathtt{AdaTS}$ is a Thompson sampling algorithm that adapts sequentially to bandit tasks.
$\mathtt{AdaTS}$ is a fully-Bayesian algorithm that can be implemented efficiently in several classes of bandit problems.
arXiv Detail & Related papers (2021-07-13T15:51:32Z) - Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional additive corruption-dependent terms.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies that maximize cumulative reward while satisfying linear cost constraints.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.