Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits
with Super Heavy-Tailed Payoffs
- URL: http://arxiv.org/abs/2110.13876v1
- Date: Tue, 26 Oct 2021 17:30:44 GMT
- Title: Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits
with Super Heavy-Tailed Payoffs
- Authors: Han Zhong, Jiayi Huang, Lin F. Yang, Liwei Wang
- Abstract summary: We propose a novel robust statistical estimator, mean of medians, which estimates a random variable by computing the empirical mean of a sequence of empirical medians.
We show that the regret bound is near-optimal even with very heavy-tailed noise.
- Score: 27.636407641546914
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite a large amount of effort devoted to handling heavy-tailed error in machine learning, little is known about the case where moments of the error may not exist at all: the random noise $\eta$ satisfies $\Pr\left[|\eta| > |y|\right] \le 1/|y|^{\alpha}$ for some $\alpha > 0$. We make the first attempt to actively handle such super heavy-tailed noise in bandit learning problems: we propose a novel robust statistical estimator, mean of medians, which estimates a random variable by computing the empirical mean of a sequence of empirical medians. We then present a generic reductionist algorithmic framework for solving bandit learning problems (including the multi-armed and linear bandit problems): the mean-of-medians estimator can be applied to nearly any bandit learning algorithm as a black-box filter for its reward signals, yielding a regret bound similar to the one attainable when rewards are sub-Gaussian. We show that the regret bound is near-optimal even under very heavy-tailed noise. We also empirically demonstrate the effectiveness of the proposed algorithm, which further corroborates our theoretical results.
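The mean-of-medians estimator admits a short implementation. The Python sketch below follows only the abstract's description (split the observed rewards into groups, take each group's empirical median, then average the medians); the group size, the symmetric-Pareto noise in the demo, and the name mean_of_medians are illustrative assumptions rather than the paper's exact construction or tuning.

```python
import numpy as np

def mean_of_medians(samples, group_size):
    """Mean-of-medians estimate: split the samples into consecutive
    groups, take the empirical median of each group, and average the
    medians. (Sketch only; the paper ties the group size to the tail
    exponent alpha and the desired confidence level.)"""
    samples = np.asarray(samples, dtype=float)
    n_groups = len(samples) // group_size
    if n_groups == 0:
        raise ValueError("need at least one full group of samples")
    groups = samples[: n_groups * group_size].reshape(n_groups, group_size)
    return np.median(groups, axis=1).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, alpha = 10_000, 0.5
    # Super heavy-tailed noise: symmetric Pareto tails with exponent 0.5,
    # so not even the first moment of the noise exists.
    noise = rng.pareto(alpha, size=n) * rng.choice([-1.0, 1.0], size=n)
    rewards = 0.5 + noise  # the target reward value (median) is 0.5
    print("empirical mean :", rewards.mean())                  # unstable
    print("mean of medians:", mean_of_medians(rewards, 100))   # close to 0.5
```

In the paper's reduction framework, an estimate of this kind acts as a black-box filter on the reward signals fed to an off-the-shelf bandit algorithm (e.g. a UCB-style index method), so the base learner can be analyzed much as it would be under sub-Gaussian rewards.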
Related papers
- Learning Constant-Depth Circuits in Malicious Noise Models [9.036777309376697]
We extend Linial, Mansour, and Nisan's quasipolynomial-time algorithm for learning constant-depth circuits to malicious noise models.
We attain the best possible dependence on the noise rate and succeed in the harshest possible noise model.
arXiv Detail & Related papers (2024-11-06T00:19:58Z) - Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as the total variation budget $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted$\text{OFUL}^+$ and Restarted $\text{SAVE}^+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z) - Replicability is Asymptotically Free in Multi-armed Bandits [45.729105054410745]
This work is motivated by the growing demand for reproducible machine learning.
In particular, we consider a replicable algorithm that ensures, with high probability, that the algorithm's sequence of actions is not affected by the randomness inherent in the dataset.
arXiv Detail & Related papers (2024-02-12T03:31:34Z) - Zero-Inflated Bandits [11.60342504007264]
We study zero-inflated bandits, where the reward is modeled as a classic semi-parametric distribution called zero-inflated distribution.
We derive the regret bounds under both multi-armed bandits with general reward assumptions and contextual generalized linear bandit with sub-Gaussian rewards.
In many settings, the regret rates of our algorithms are either minimax optimal or state-of-the-art.
arXiv Detail & Related papers (2023-12-25T03:13:21Z) - Communication-Constrained Bandits under Additive Gaussian Noise [111.06688156723018]
We study a distributed multi-armed bandit where a client supplies the learner with communication-constrained feedback.
We propose a multi-phase bandit algorithm, $\mathtt{UE\text{-}UCB\text{++}}$, which matches this lower bound up to a minor additive factor.
arXiv Detail & Related papers (2023-04-25T09:31:20Z) - Bandit Algorithms for Prophet Inequality and Pandora's Box [13.709418181148148]
We study the Prophet Inequality and Pandora's Box problems in the Multi-Armed Bandits model.
Our results give near-optimal $\tilde{O}(\mathsf{poly}(n)\sqrt{T})$ total regret algorithms for both Prophet Inequality and Pandora's Box.
arXiv Detail & Related papers (2022-11-16T00:10:35Z) - Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We present a variance-aware algorithm that is adaptive to the level of adversarial contamination $C$.
arXiv Detail & Related papers (2021-10-25T02:53:24Z) - No Regrets for Learning the Prior in Bandits [30.478188057004175]
$\mathtt{AdaTS}$ is a Thompson sampling algorithm that adapts sequentially to bandit tasks.
$\mathtt{AdaTS}$ is a fully-Bayesian algorithm that can be implemented efficiently in several classes of bandit problems.
arXiv Detail & Related papers (2021-07-13T15:51:32Z) - Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional additive corruption-dependent terms.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies that maximize cumulative reward while satisfying linear cost constraints.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.