Bandits Corrupted by Nature: Lower Bounds on Regret and Robust
Optimistic Algorithm
- URL: http://arxiv.org/abs/2203.03186v2
- Date: Tue, 21 Mar 2023 10:04:25 GMT
- Title: Bandits Corrupted by Nature: Lower Bounds on Regret and Robust
Optimistic Algorithm
- Authors: Debabrota Basu, Odalric-Ambrym Maillard, Timothée Mathieu
- Abstract summary: We study the corrupted bandit problem, i.e. a multi-armed bandit problem with $k$ unknown reward distributions.
We propose a novel UCB-type algorithm for corrupted bandits, namely HubUCB, that builds on Huber's estimator for robust mean estimation.
We experimentally illustrate the efficiency of HubUCB and SeqHubUCB in solving corrupted bandits for different reward distributions and different levels of corruption.
- Score: 14.214707836697823
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the corrupted bandit problem, i.e. a stochastic multi-armed bandit
problem with $k$ unknown reward distributions, which are heavy-tailed and
corrupted by a history-independent adversary or Nature. To be specific, the
reward obtained by playing an arm comes from the corresponding heavy-tailed reward
distribution with probability $1-\varepsilon \in (0.5,1]$ and an arbitrary
corruption distribution of unbounded support with probability $\varepsilon \in
[0,0.5)$.
First, we provide $\textit{a problem-dependent lower bound on the regret}$ of
any corrupted bandit algorithm. This lower bound indicates that the corrupted
bandit problem is harder than the classical stochastic bandit problem with
sub-Gaussian or heavy-tail rewards.
Following that, we propose a novel UCB-type algorithm for corrupted bandits,
namely HubUCB, that builds on Huber's estimator for robust mean estimation.
Leveraging a novel concentration inequality of Huber's estimator, we prove that
HubUCB achieves a near-optimal regret upper bound.
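For background, Huber's estimator of location is the M-estimator whose influence function clips large residuals, which bounds the effect of corrupted samples on the mean estimate. Below is a common iteratively-reweighted sketch of it, not the paper's exact computation; the threshold parameter and iteration budget are assumptions.

```python
import numpy as np

def huber_mean(x, threshold, n_iter=50, tol=1e-8):
    """Huber M-estimate of location via iteratively reweighted averaging.

    Approximately solves sum_i psi((x_i - mu) / threshold) = 0, where psi is
    the Huber influence function (identity near zero, clipped beyond +-1),
    so far-away, possibly corrupted, points are down-weighted. This is a
    standard approximation, not necessarily how the paper computes the
    exact estimator.
    """
    x = np.asarray(x, dtype=float)
    mu = float(np.median(x))                 # robust initialization
    for _ in range(n_iter):
        r = np.abs(x - mu) / threshold
        w = np.minimum(1.0, 1.0 / np.maximum(r, 1e-12))  # Huber weights
        new_mu = float(np.sum(w * x) / np.sum(w))
        if abs(new_mu - mu) < tol:
            break
        mu = new_mu
    return mu
```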
Since computing Huber's estimator has quadratic complexity, we further
introduce a sequential version of Huber's estimator that exhibits linear
complexity. We leverage this sequential estimator to design SeqHubUCB that
enjoys similar regret guarantees while reducing the computational burden.
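To show where such an estimator slots into an optimistic algorithm, here is a generic, hypothetical UCB-style loop that ranks arms by a robust mean plus a confidence width; the `width` function is a placeholder, not the concentration-based index that HubUCB and SeqHubUCB actually derive.

```python
import numpy as np

def robust_ucb(pull, n_arms, horizon, robust_mean, width):
    """Generic optimistic loop with a pluggable robust mean estimator.

    pull(a)         -> one (possibly corrupted) reward from arm a
    robust_mean(xs) -> robust location estimate, e.g. huber_mean above
    width(t, n)     -> confidence width after n pulls at round t
                       (placeholder; the paper's width comes from its
                        concentration inequality for Huber's estimator
                        and includes a corruption-dependent term)
    """
    rewards = [[] for _ in range(n_arms)]
    for a in range(n_arms):                  # pull each arm once
        rewards[a].append(pull(a))
    for t in range(n_arms, horizon):
        index = [robust_mean(rewards[a]) + width(t, len(rewards[a]))
                 for a in range(n_arms)]
        a = int(np.argmax(index))
        rewards[a].append(pull(a))
    return rewards
```

A standard sub-Gaussian width such as `lambda t, n: np.sqrt(2 * np.log(t + 1) / n)` makes the sketch runnable, but in the corrupted heavy-tailed setting the width, and typically an additional corruption-dependent bias term, must come from the robust concentration bound instead.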
Finally, we experimentally illustrate the efficiency of HubUCB and SeqHubUCB
in solving corrupted bandits for different reward distributions and different
levels of corruption.
Related papers
- Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted$\text{OFUL}+$ and Restarted $\text{SAVE}+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z) - CRIMED: Lower and Upper Bounds on Regret for Bandits with Unbounded
Stochastic Corruption [21.709840199725015]
We investigate the regret-minimisation problem in a multi-armed bandit setting with arbitrary corruptions.
We introduce CRIMED, an algorithm that achieves the exact lower bound on regret for bandits with known variance.
Notably, CRIMED can effectively handle corruptions with $\varepsilon$ values as high as $\frac{1}{2}$.
arXiv Detail & Related papers (2023-09-28T16:19:53Z) - Robust Lipschitz Bandits to Adversarial Corruptions [61.85150061213987]
The Lipschitz bandit is a variant of bandits that deals with a continuous arm set defined on a metric space.
In this paper, we introduce a new problem of Lipschitz bandits in the presence of adversarial corruptions.
Our work presents the first line of robust Lipschitz bandit algorithms that can achieve sub-linear regret under both types of adversary.
arXiv Detail & Related papers (2023-05-29T18:16:59Z) - Communication-Constrained Bandits under Additive Gaussian Noise [111.06688156723018]
We study a distributed multi-armed bandit where a client supplies the learner with communication-constrained feedback.
We propose a multi-phase bandit algorithm, $\mathtt{UE}\text{-UCB++}$, which matches this lower bound to a minor additive factor.
arXiv Detail & Related papers (2023-04-25T09:31:20Z) - Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits
with Super Heavy-Tailed Payoffs [27.636407641546914]
We propose a novel robust statistical estimator, mean of medians, which estimates a random variable by computing the empirical mean of a sequence of empirical medians.
We show that the regret bound is near-optimal even with very heavy-tailed noise.
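As a sketch of that construction under one plausible reading (block-wise medians averaged together; the blocking scheme is an assumption, and the paper's exact sequence of medians may differ):

```python
import numpy as np

def mean_of_medians(x, block_size):
    """Average the empirical medians of consecutive blocks of samples.

    One plausible reading of 'mean of medians': split the samples into
    blocks, take each block's median (robust to extreme payoffs), then
    average the medians. Block size is a tuning choice, not the paper's.
    """
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // block_size
    if n_blocks == 0:
        return float(np.median(x))
    blocks = x[: n_blocks * block_size].reshape(n_blocks, block_size)
    return float(np.mean(np.median(blocks, axis=1)))
```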
arXiv Detail & Related papers (2021-10-26T17:30:44Z) - Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We present a variance-aware algorithm that is adaptive to the level of adversarial contamination $C$.
arXiv Detail & Related papers (2021-10-25T02:53:24Z) - Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while each incurring an additional additive term under corruption.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.