Linear Contextual Bandits with Adversarial Corruptions
- URL: http://arxiv.org/abs/2110.12615v1
- Date: Mon, 25 Oct 2021 02:53:24 GMT
- Title: Linear Contextual Bandits with Adversarial Corruptions
- Authors: Heyang Zhao and Dongruo Zhou and Quanquan Gu
- Abstract summary: We study the linear contextual bandit problem in the presence of adversarial corruption.
We present a variance-aware algorithm that is adaptive to the level of adversarial contamination $C$.
- Score: 91.38793800392108
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the linear contextual bandit problem in the presence of adversarial
corruption, where the interaction between the player and a possibly infinite
decision set is contaminated by an adversary that can corrupt the reward up to
a corruption level $C$ measured by the sum of the largest alteration on rewards
in each round. We present a variance-aware algorithm that is adaptive to the
level of adversarial contamination $C$. The key algorithmic design includes (1)
a multi-level partition scheme of the observed data, (2) a cascade of
confidence sets that are adaptive to the level of the corruption, and (3) a
variance-aware confidence set construction that can take advantage of
low-variance reward. We further prove that the regret of the proposed algorithm
is $\tilde{O}(C^2d\sqrt{\sum_{t = 1}^T \sigma_t^2} + C^2R\sqrt{dT})$, where $d$
is the dimension of context vectors, $T$ is the number of rounds, $R$ is the
range of noise and $\sigma_t^2,t=1\ldots,T$ are the variances of instantaneous
reward. We also prove a gap-dependent regret bound for the proposed algorithm,
which is instance-dependent and thus leads to better performance on good
practical instances. To the best of our knowledge, this is the first
variance-aware corruption-robust algorithm for contextual bandits. Experiments
on synthetic data corroborate our theory.
Related papers
- Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM)
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $tilde Obig(dsqrtsum_t=1Tsigma_t2 + dbig)$.
Our regret bound naturally aligns with the intuitive expectation in scenarios where the comparison is deterministic, the algorithm only suffers from an $tilde O(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z) - Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear
Contextual Bandits and Markov Decision Processes [59.61248760134937]
We propose an efficient algorithm to achieve a regret of $tildeO(sqrtT+zeta)$.
The proposed algorithm relies on the recently developed uncertainty-weighted least-squares regression from linear contextual bandit.
We generalize our algorithm to the episodic MDP setting and first achieve an additive dependence on the corruption level $zeta$.
arXiv Detail & Related papers (2022-12-12T15:04:56Z) - Minimax Rates for Robust Community Detection [19.229475414802213]
We study the problem of community detection in the block model with adversarial node corruptions.
Our main result is an efficient algorithm that can tolerate an $epsilon$-fraction of corruptions and unbounded error $O(epsilon) + e-fracC2 (1 pm o(1))$ where $C = (sqrta - sqrtb)2$ is the signal-to-noise ratio.
We show that our algorithms are doubly-robust in the sense that they work in an even more
arXiv Detail & Related papers (2022-07-25T04:45:16Z) - Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial
Corruptions [98.75618795470524]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We propose a new algorithm based on the principle of optimism in the face of uncertainty.
arXiv Detail & Related papers (2022-05-13T17:58:58Z) - A Robust Phased Elimination Algorithm for Corruption-Tolerant Gaussian
Process Bandits [118.22458816174144]
We propose a novel robust elimination-type algorithm that runs in epochs, combines exploration with infrequent switching to select a small subset of actions, and plays each action for multiple time instants.
Our algorithm, GP Robust Phased Elimination (RGP-PE), successfully balances robustness to corruptions with exploration and exploitation.
We perform the first empirical study of robustness in the corrupted GP bandit setting, and show that our algorithm is robust against a variety of adversarial attacks.
arXiv Detail & Related papers (2022-02-03T21:19:36Z) - Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional additive terms respectively.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.