Robust Lipschitz Bandits to Adversarial Corruptions
- URL: http://arxiv.org/abs/2305.18543v2
- Date: Sun, 8 Oct 2023 19:45:06 GMT
- Title: Robust Lipschitz Bandits to Adversarial Corruptions
- Authors: Yue Kang, Cho-Jui Hsieh, Thomas C. M. Lee
- Abstract summary: Lipschitz bandit is a variant of bandits that deals with a continuous arm set defined on a metric space.
In this paper, we introduce a new problem of Lipschitz bandits in the presence of adversarial corruptions.
Our work presents the first line of robust Lipschitz bandit algorithms that can achieve sub-linear regret under both types of adversary.
- Score: 61.85150061213987
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lipschitz bandit is a variant of stochastic bandits that deals with a
continuous arm set defined on a metric space, where the reward function is
subject to a Lipschitz constraint. In this paper, we introduce a new problem of
Lipschitz bandits in the presence of adversarial corruptions where an adaptive
adversary corrupts the stochastic rewards up to a total budget $C$. The budget
is measured by the sum of corruption levels across the time horizon $T$. We
consider both weak and strong adversaries, where the weak adversary is unaware
of the current action before the attack, while the strong one can observe it.
Our work presents the first line of robust Lipschitz bandit algorithms that can
achieve sub-linear regret under both types of adversary, even when the total
budget of corruption $C$ is unrevealed to the agent. We provide a lower bound
under each type of adversary, and show that our algorithm is optimal under the
strong case. Finally, we conduct experiments to illustrate the effectiveness of
our algorithms against two classic kinds of attacks.
Related papers
- Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear
Contextual Bandits and Markov Decision Processes [59.61248760134937]
We propose an efficient algorithm to achieve a regret of $tildeO(sqrtT+zeta)$.
The proposed algorithm relies on the recently developed uncertainty-weighted least-squares regression from linear contextual bandit.
We generalize our algorithm to the episodic MDP setting and first achieve an additive dependence on the corruption level $zeta$.
arXiv Detail & Related papers (2022-12-12T15:04:56Z) - Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial
Corruptions [98.75618795470524]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We propose a new algorithm based on the principle of optimism in the face of uncertainty.
arXiv Detail & Related papers (2022-05-13T17:58:58Z) - A Robust Phased Elimination Algorithm for Corruption-Tolerant Gaussian
Process Bandits [118.22458816174144]
We propose a novel robust elimination-type algorithm that runs in epochs, combines exploration with infrequent switching to select a small subset of actions, and plays each action for multiple time instants.
Our algorithm, GP Robust Phased Elimination (RGP-PE), successfully balances robustness to corruptions with exploration and exploitation.
We perform the first empirical study of robustness in the corrupted GP bandit setting, and show that our algorithm is robust against a variety of adversarial attacks.
arXiv Detail & Related papers (2022-02-03T21:19:36Z) - Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We present a variance-aware algorithm that is adaptive to the level of adversarial contamination $C$.
arXiv Detail & Related papers (2021-10-25T02:53:24Z) - When Are Linear Stochastic Bandits Attackable? [47.25702824488642]
This paper studies the attackability of a $k$-armed linear bandit environment.
We propose a two-stage attack method against LinUCB and Robust Phase Elimination.
arXiv Detail & Related papers (2021-10-18T04:12:09Z) - Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional additive terms respectively.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.