On Optimal Robustness to Adversarial Corruption in Online Decision
Problems
- URL: http://arxiv.org/abs/2109.10963v1
- Date: Wed, 22 Sep 2021 18:26:45 GMT
- Title: On Optimal Robustness to Adversarial Corruption in Online Decision
Problems
- Authors: Shinji Ito
- Abstract summary: We show that optimal robustness can be expressed by a square-root dependency on the amount of corruption.
For the multi-armed bandit problem, we also provide a nearly tight lower bound up to a logarithmic factor.
- Score: 27.68461396741871
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper considers two fundamental sequential decision-making problems: the
problem of prediction with expert advice and the multi-armed bandit problem. We
focus on stochastic regimes in which an adversary may corrupt losses, and we
investigate what level of robustness can be achieved against adversarial
corruptions. The main contribution of this paper is to show that optimal
robustness can be expressed by a square-root dependency on the amount of
corruption. More precisely, we show that two classes of algorithms, anytime
Hedge with decreasing learning rate and algorithms with second-order regret
bounds, achieve $O( \frac{\log N}{\Delta} + \sqrt{ \frac{C \log N }{\Delta} }
)$-regret, where $N, \Delta$, and $C$ represent the number of experts, the gap
parameter, and the corruption level, respectively. We further provide a
matching lower bound, which means that this regret bound is tight up to a
constant factor. For the multi-armed bandit problem, we also provide a nearly
tight lower bound up to a logarithmic factor.
Related papers
- Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLM)
We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose an algorithm namely robust contextual dueling bandit (algo), which is based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear
Contextual Bandits and Markov Decision Processes [59.61248760134937]
We propose an efficient algorithm to achieve a regret of $tildeO(sqrtT+zeta)$.
The proposed algorithm relies on the recently developed uncertainty-weighted least-squares regression from linear contextual bandit.
We generalize our algorithm to the episodic MDP setting and first achieve an additive dependence on the corruption level $zeta$.
arXiv Detail & Related papers (2022-12-12T15:04:56Z) - Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial
Corruptions [98.75618795470524]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We propose a new algorithm based on the principle of optimism in the face of uncertainty.
arXiv Detail & Related papers (2022-05-13T17:58:58Z) - Versatile Dueling Bandits: Best-of-both-World Analyses for Online
Learning from Preferences [28.79598714109439]
We study the problem of $K$-armed dueling bandit for both and adversarial environments.
We first propose a novel reduction from any (general) dueling bandits to multi-armed bandits.
Our algorithm is also the first to achieve an optimal $O(sum_i = 1K fraclog TDelta_i)$ regret bound against the Condorcet-winner benchmark.
arXiv Detail & Related papers (2022-02-14T13:37:23Z) - A Robust Phased Elimination Algorithm for Corruption-Tolerant Gaussian
Process Bandits [118.22458816174144]
We propose a novel robust elimination-type algorithm that runs in epochs, combines exploration with infrequent switching to select a small subset of actions, and plays each action for multiple time instants.
Our algorithm, GP Robust Phased Elimination (RGP-PE), successfully balances robustness to corruptions with exploration and exploitation.
We perform the first empirical study of robustness in the corrupted GP bandit setting, and show that our algorithm is robust against a variety of adversarial attacks.
arXiv Detail & Related papers (2022-02-03T21:19:36Z) - Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We present a variance-aware algorithm that is adaptive to the level of adversarial contamination $C$.
arXiv Detail & Related papers (2021-10-25T02:53:24Z) - Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional additive terms respectively.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.