CRIMED: Lower and Upper Bounds on Regret for Bandits with Unbounded
Stochastic Corruption
- URL: http://arxiv.org/abs/2309.16563v1
- Date: Thu, 28 Sep 2023 16:19:53 GMT
- Title: CRIMED: Lower and Upper Bounds on Regret for Bandits with Unbounded
Stochastic Corruption
- Authors: Shubhada Agrawal, Timothée Mathieu, Debabrota Basu, Odalric-Ambrym
Maillard
- Abstract summary: We investigate the regret-minimisation problem in a multi-armed bandit setting with arbitrary corruptions.
We introduce CRIMED, an algorithm that achieves the exact lower bound on regret for bandits with known variance.
Notably, CRIMED can effectively handle corruptions with $\varepsilon$ values as high as $\frac{1}{2}$.
- Score: 21.709840199725015
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the regret-minimisation problem in a multi-armed bandit
setting with arbitrary corruptions. Similar to the classical setup, the agent
receives rewards generated independently from the distribution of the arm
chosen at each time. However, these rewards are not directly observed. Instead,
with a fixed $\varepsilon\in (0,\frac{1}{2})$, the agent observes a sample from
the chosen arm's distribution with probability $1-\varepsilon$, or from an
arbitrary corruption distribution with probability $\varepsilon$. Importantly,
we impose no assumptions on these corruption distributions, which can be
unbounded. In this setting, accommodating potentially unbounded corruptions, we
establish a problem-dependent lower bound on regret for a given family of arm
distributions. We introduce CRIMED, an asymptotically-optimal algorithm that
achieves the exact lower bound on regret for bandits with Gaussian
distributions with known variance. Additionally, we provide a finite-sample
analysis of CRIMED's regret performance. Notably, CRIMED can effectively handle
corruptions with $\varepsilon$ values as high as $\frac{1}{2}$. Furthermore, we
develop a tight concentration result for medians in the presence of arbitrary
corruptions, even with $\varepsilon$ values up to $\frac{1}{2}$, which may be
of independent interest. We also discuss an extension of the algorithm for
handling misspecification in the Gaussian model.
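As a concrete illustration of the observation model above, the following is a minimal sketch assuming Gaussian arms and a heavy-tailed corruption distribution chosen purely for illustration; the function names and constants are not from the paper, and this is not the CRIMED algorithm itself, only the $\varepsilon$-contaminated feedback together with a plain empirical median computed from it.

```python
import numpy as np

def corrupted_sample(rng, arm_mean, sigma, eps, corruption_sampler):
    """Draw one observation under the eps-contamination model:
    with probability 1 - eps, a genuine N(arm_mean, sigma^2) reward;
    with probability eps, a sample from an arbitrary (possibly unbounded)
    corruption distribution."""
    if rng.random() < eps:
        return corruption_sampler(rng)   # adversary-chosen corruption
    return rng.normal(arm_mean, sigma)   # genuine reward

# Illustrative run; all numbers are made up and not from the paper.
rng = np.random.default_rng(0)
eps, sigma, true_mean = 0.3, 1.0, 0.5
corruption = lambda g: g.standard_cauchy() * 1e3   # unbounded, heavy-tailed
obs = np.array([corrupted_sample(rng, true_mean, sigma, eps, corruption)
                for _ in range(2000)])
print(np.mean(obs), np.median(obs))  # the mean is typically wrecked; the median stays close to 0.5
```

Because the corruption distribution is unconstrained, the empirical mean can be pushed arbitrarily far, which is why median-style statistics (and the paper's median concentration result) are the natural tool in this setting.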
Related papers
- Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted$\text{OFUL}^+$ and Restarted $\text{SAVE}^+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z) - Fixed-Budget Differentially Private Best Arm Identification [62.36929749450298]
We study best arm identification (BAI) in linear bandits in the fixed-budget regime under differential privacy constraints.
We derive a minimax lower bound on the error probability, and demonstrate that the lower and the upper bounds decay exponentially in $T$.
arXiv Detail & Related papers (2024-01-17T09:23:25Z) - $(\epsilon, u)$-Adaptive Regret Minimization in Heavy-Tailed Bandits [29.966828248335972]
We study the regret minimization problem when $\epsilon$ and $u$ are unknown to the learner.
AdaR-UCB is the first algorithm that enjoys regret guarantees nearly matching those of the non-adaptive heavy-tailed case.
arXiv Detail & Related papers (2023-10-04T17:11:15Z) - Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs [7.125769932993104]
We consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion.
At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the corresponding reward.
We apply the Thompson Sampling algorithm for the disjoint model, and provide a comprehensive regret analysis for a variant of the proposed algorithm.
arXiv Detail & Related papers (2022-06-24T18:48:35Z) - Bandits Corrupted by Nature: Lower Bounds on Regret and Robust
Optimistic Algorithm [14.214707836697823]
We study the corrupted bandit problem, i.e. a multi-armed bandit problem with $k$ unknown reward distributions.
We propose a novel UCB-type algorithm for corrupted bandits, namely HubUCB, that builds on Huber's estimator for robust mean estimation (a generic sketch of Huber's estimator is given after this list).
We experimentally illustrate the efficiency of HubUCB and SeqHubUCB in solving corrupted bandits for different reward distributions and different levels of corruptions.
arXiv Detail & Related papers (2022-03-07T07:44:05Z) - Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We present a variance-aware algorithm that is adaptive to the level of adversarial contamination $C$.
arXiv Detail & Related papers (2021-10-25T02:53:24Z) - Robust Learning of Optimal Auctions [84.13356290199603]
We study the problem of learning revenue-optimal multi-bidder auctions from samples when the samples of bidders' valuations can be adversarially corrupted or drawn from distributions that are adversarially perturbed.
We propose new algorithms that can learn a mechanism whose revenue is nearly optimal simultaneously for all "true distributions" that are $\alpha$-close to the original distribution in Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2021-07-13T17:37:21Z) - Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional additive corruption-dependent terms.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
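The HubUCB entry above relies on Huber's M-estimator for robust mean estimation. As a point of reference, here is a minimal sketch of that classical estimator computed by iterative re-weighting; the threshold `delta`, the iteration budget, and the function name are illustrative choices, and this is a generic routine rather than the HubUCB index from that paper.

```python
import numpy as np

def huber_mean(x, delta=1.0, iters=50, tol=1e-8):
    """Huber's M-estimator of location via iteratively re-weighted averaging.
    Points within delta of the current estimate keep full weight; points
    farther away are down-weighted, so a few wild samples cannot drag the
    estimate arbitrarily far, unlike the empirical mean."""
    x = np.asarray(x, dtype=float)
    mu = float(np.median(x))  # robust starting point
    for _ in range(iters):
        r = np.abs(x - mu)
        w = np.minimum(1.0, delta / np.maximum(r, 1e-12))
        mu_new = float(np.sum(w * x) / np.sum(w))
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu
```

The choice of `delta` trades off statistical efficiency on clean data against robustness to outliers.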