Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial
Corruptions
- URL: http://arxiv.org/abs/2205.06811v1
- Date: Fri, 13 May 2022 17:58:58 GMT
- Title: Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial
Corruptions
- Authors: Jiafan He and Dongruo Zhou and Tong Zhang and Quanquan Gu
- Abstract summary: We study the linear contextual bandit problem in the presence of adversarial corruption.
We propose a new algorithm based on the principle of optimism in the face of uncertainty.
- Score: 98.75618795470524
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the linear contextual bandit problem in the presence of adversarial
corruption, where the reward at each round is corrupted by an adversary, and
the corruption level (i.e., the sum of corruption magnitudes over the horizon)
is $C\geq 0$. The best-known algorithms in this setting are limited in that
they are either computationally inefficient, require a strong assumption on
the corruption, or incur regret at least $C$ times worse than the regret
without corruption. In this paper, to overcome these limitations, we propose a
new algorithm based on the principle of optimism in the face of uncertainty. At
the core of our algorithm is a weighted ridge regression where the weight of
each chosen action depends on its confidence up to some threshold. We show that
for both the known-$C$ and unknown-$C$ cases, our algorithm with a proper
choice of hyperparameter achieves a regret that nearly matches the lower bounds. Thus,
our algorithm is nearly optimal up to logarithmic factors for both cases.
Notably, our algorithm achieves the near-optimal regret for both corrupted and
uncorrupted cases ($C=0$) simultaneously.
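The core of the method can be made concrete. The following is a minimal sketch in Python/numpy, not the paper's exact algorithm: the class name `WeightedRidgeOFUL` and the hyperparameters `lam` (ridge parameter), `alpha` (weight threshold), and `beta` (confidence radius) are illustrative placeholders, not the paper's tuned values. Each observation's weight in the ridge regression is capped once the chosen action's confidence width exceeds the threshold, and actions are selected optimistically.

```python
import numpy as np

class WeightedRidgeOFUL:
    """Sketch of optimism with confidence-capped observation weights."""

    def __init__(self, dim, lam=1.0, alpha=0.5, beta=1.0):
        self.alpha, self.beta = alpha, beta
        self.Sigma = lam * np.eye(dim)  # weighted Gram matrix (ridge prior)
        self.b = np.zeros(dim)          # weighted sum of reward * feature

    def _width(self, x):
        # Elliptical confidence width ||x||_{Sigma^{-1}} of action features x.
        return float(np.sqrt(x @ np.linalg.solve(self.Sigma, x)))

    def select(self, arms):
        # Optimism in the face of uncertainty: estimate plus exploration bonus.
        theta = np.linalg.solve(self.Sigma, self.b)
        return int(np.argmax([x @ theta + self.beta * self._width(x)
                              for x in arms]))

    def update(self, x, reward):
        # Cap the weight of highly uncertain observations: an adversary can
        # place large corruptions where the learner is most uncertain, so
        # such rounds are discounted in the regression.
        w = min(1.0, self.alpha / max(self._width(x), 1e-12))
        self.Sigma += w * np.outer(x, x)
        self.b += w * reward * x
```

A learner would run in a loop: call `select()` with the current round's arm feature vectors, play the returned arm, observe the (possibly corrupted) reward, and call `update()`. Capping each weight by the action's uncertainty limits how much any single corrupted round can distort the estimator, which is the intuition behind the additive dependence on $C$.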
Related papers
- Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear
Contextual Bandits and Markov Decision Processes [59.61248760134937]
We propose an efficient algorithm that achieves a regret of $\tilde{O}(\sqrt{T}+\zeta)$.
The proposed algorithm relies on the recently developed uncertainty-weighted least-squares regression from linear contextual bandits (the same reweighting idea sketched above).
We generalize our algorithm to the episodic MDP setting and are the first to achieve an additive dependence on the corruption level $\zeta$.
arXiv Detail & Related papers (2022-12-12T15:04:56Z)
- A Robust Phased Elimination Algorithm for Corruption-Tolerant Gaussian Process Bandits [118.22458816174144]
We propose a novel robust elimination-type algorithm that runs in epochs, combines exploration with infrequent switching to select a small subset of actions, and plays each action for multiple time instants; a schematic sketch of this epoch structure appears after this list.
Our algorithm, GP Robust Phased Elimination (RGP-PE), successfully balances robustness to corruptions with exploration and exploitation.
We perform the first empirical study of robustness in the corrupted GP bandit setting, and show that our algorithm is robust against a variety of adversarial attacks.
arXiv Detail & Related papers (2022-02-03T21:19:36Z)
- Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We present a variance-aware algorithm that is adaptive to the level of adversarial contamination $C$.
arXiv Detail & Related papers (2021-10-25T02:53:24Z)
- On Optimal Robustness to Adversarial Corruption in Online Decision Problems [27.68461396741871]
We show that optimal robustness can be expressed by a square-root dependency on the amount of corruption.
For the multi-armed bandit problem, we also provide a nearly tight lower bound up to a logarithmic factor.
arXiv Detail & Related papers (2021-09-22T18:26:45Z)
- Improved Corruption Robust Algorithms for Episodic Reinforcement Learning [43.279169081740726]
We study episodic reinforcement learning under unknown adversarial corruptions in both the rewards and the transition probabilities of the underlying system.
We propose new algorithms which, compared to the existing results, achieve strictly better regret bounds in terms of total corruptions.
Our results follow from a general algorithmic framework that combines corruption-robust policy-elimination meta-algorithms with plug-in reward-free exploration sub-algorithms.
arXiv Detail & Related papers (2021-02-13T07:04:23Z)
- Corralling Stochastic Bandit Algorithms [54.10645564702416]
We show that the regret of the corralling algorithms is no worse than that of the best algorithm containing the arm with the highest reward.
The resulting bound additionally depends on the gap between the highest reward and the rewards of the other arms.
arXiv Detail & Related papers (2020-06-16T15:33:12Z)
- Corruption-Tolerant Gaussian Process Bandit Optimization [130.60115798580136]
We consider the problem of optimizing an unknown (typically non-convex) function with a bounded norm.
We introduce Fast-Slow GP-UCB, an algorithm that combines a "fast but non-robust" instance with a "slow but robust" one.
We argue that certain dependencies on the corruption level cannot be improved.
arXiv Detail & Related papers (2020-03-04T09:46:58Z)
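As a companion to the RGP-PE entry above, here is a schematic sketch of the phased-elimination pattern in the same Python/numpy style, simplified to finitely many independent arms. The function name `phased_elimination`, the doubling play schedule, and the $1/\sqrt{\text{plays}}$ confidence width are illustrative assumptions; the actual RGP-PE algorithm operates in the kernelized (GP) setting with corruption-aware confidence bounds.

```python
import numpy as np

def phased_elimination(pull, n_arms, n_epochs=10, base_plays=8):
    """Keep only arms whose empirical mean stays close to the best one."""
    active = list(range(n_arms))
    for epoch in range(n_epochs):
        plays = base_plays * 2 ** epoch  # play each surviving arm many times
        means = {a: float(np.mean([pull(a) for _ in range(plays)]))
                 for a in active}
        best = max(means.values())
        width = 1.0 / np.sqrt(plays)  # schematic confidence width
        # Averaging many plays per arm spreads a bounded corruption budget
        # thin, so corrupted rounds alone cannot eliminate a good arm.
        active = [a for a in active if means[a] >= best - 2.0 * width]
        if len(active) == 1:
            break
    return active
```

Playing each surviving arm many times per epoch means a bounded corruption budget is diluted across observations, which is why elimination-style algorithms tolerate adversarial corruption at the cost of infrequent policy switching.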