A Robust Phased Elimination Algorithm for Corruption-Tolerant Gaussian
Process Bandits
- URL: http://arxiv.org/abs/2202.01850v1
- Date: Thu, 3 Feb 2022 21:19:36 GMT
- Title: A Robust Phased Elimination Algorithm for Corruption-Tolerant Gaussian
Process Bandits
- Authors: Ilija Bogunovic, Zihan Li, Andreas Krause, Jonathan Scarlett
- Abstract summary: We propose a novel robust elimination-type algorithm that runs in epochs, combines exploration with infrequent switching to select a small subset of actions, and plays each action for multiple time instants.
Our algorithm, GP Robust Phased Elimination (RGP-PE), successfully balances robustness to corruptions with exploration and exploitation.
We perform the first empirical study of robustness in the corrupted GP bandit setting, and show that our algorithm is robust against a variety of adversarial attacks.
- Score: 118.22458816174144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the sequential optimization of an unknown, continuous, and
expensive to evaluate reward function, from noisy and adversarially corrupted
observed rewards. When the corruption attacks are subject to a suitable budget
$C$ and the function lives in a Reproducing Kernel Hilbert Space (RKHS), the
problem can be posed as corrupted Gaussian process (GP) bandit optimization. We
propose a novel robust elimination-type algorithm that runs in epochs, combines
exploration with infrequent switching to select a small subset of actions, and
plays each action for multiple time instants. Our algorithm, Robust GP Phased
Elimination (RGP-PE), successfully balances robustness to corruptions with
exploration and exploitation such that its performance degrades minimally in
the presence (or absence) of adversarial corruptions. When $T$ is the number of
samples and $\gamma_T$ is the maximal information gain, the
corruption-dependent term in our regret bound is $O(C \gamma_T^{3/2})$, which
is significantly tighter than the existing $O(C \sqrt{T \gamma_T})$ for several
commonly-considered kernels. We perform the first empirical study of robustness
in the corrupted GP bandit setting, and show that our algorithm is robust
against a variety of adversarial attacks.
Related papers
- Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear
Contextual Bandits and Markov Decision Processes [59.61248760134937]
We propose an efficient algorithm to achieve a regret of $tildeO(sqrtT+zeta)$.
The proposed algorithm relies on the recently developed uncertainty-weighted least-squares regression from linear contextual bandit.
We generalize our algorithm to the episodic MDP setting and first achieve an additive dependence on the corruption level $zeta$.
arXiv Detail & Related papers (2022-12-12T15:04:56Z) - Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial
Corruptions [98.75618795470524]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We propose a new algorithm based on the principle of optimism in the face of uncertainty.
arXiv Detail & Related papers (2022-05-13T17:58:58Z) - Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We present a variance-aware algorithm that is adaptive to the level of adversarial contamination $C$.
arXiv Detail & Related papers (2021-10-25T02:53:24Z) - On Optimal Robustness to Adversarial Corruption in Online Decision
Problems [27.68461396741871]
We show that optimal robustness can be expressed by a square-root dependency on the amount of corruption.
For the multi-armed bandit problem, we also provide a nearly tight lower bound up to a logarithmic factor.
arXiv Detail & Related papers (2021-09-22T18:26:45Z) - Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional additive terms respectively.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z) - Corruption-Tolerant Gaussian Process Bandit Optimization [130.60115798580136]
We consider the problem of optimizing an unknown (typically non-producing) function with a bounded norm.
We introduce an algorithm based on Fast-Slow GP-UCB based on "fast but non-robust" and "slow"
We argue that certain dependencies cannot be required depending on the corruption level.
arXiv Detail & Related papers (2020-03-04T09:46:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.