Corruption-Robust Linear Bandits: Minimax Optimality and Gap-Dependent Misspecification
- URL: http://arxiv.org/abs/2410.07533v3
- Date: Thu, 17 Oct 2024 18:29:04 GMT
- Title: Corruption-Robust Linear Bandits: Minimax Optimality and Gap-Dependent Misspecification
- Authors: Haolin Liu, Artin Tajdini, Andrew Wagenmaker, Chen-Yu Wei,
- Abstract summary: In linear bandits, how can a learner effectively learn when facing corrupted rewards?
We compare two types of corruptions commonly considered: strong corruption, where the corruption level depends on the action chosen by the learner, and weak corruption, where the corruption level does not depend on the action chosen by the learner.
For linear bandits, we fully characterize the gap between the minimax regret under strong and weak corruptions.
- Score: 17.288347876319126
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In linear bandits, how can a learner effectively learn when facing corrupted rewards? While significant work has explored this question, a holistic understanding across different adversarial models and corruption measures is lacking, as is a full characterization of the minimax regret bounds. In this work, we compare two types of corruptions commonly considered: strong corruption, where the corruption level depends on the action chosen by the learner, and weak corruption, where the corruption level does not depend on the action chosen by the learner. We provide a unified framework to analyze these corruptions. For stochastic linear bandits, we fully characterize the gap between the minimax regret under strong and weak corruptions. We also initiate the study of corrupted adversarial linear bandits, obtaining upper and lower bounds with matching dependencies on the corruption level. Next, we reveal a connection between corruption-robust learning and learning with gap-dependent mis-specification, a setting first studied by Liu et al. (2023a), where the misspecification level of an action or policy is proportional to its suboptimality. We present a general reduction that enables any corruption-robust algorithm to handle gap-dependent misspecification. This allows us to recover the results of Liu et al. (2023a) in a black-box manner and significantly generalize them to settings like linear MDPs, yielding the first results for gap-dependent misspecification in reinforcement learning. However, this general reduction does not attain the optimal rate for gap-dependent misspecification. Motivated by this, we develop a specialized algorithm that achieves optimal bounds for gap-dependent misspecification in linear bandits, thus answering an open question posed by Liu et al. (2023a).
Related papers
- Enhancing Infrared Small Target Detection Robustness with Bi-Level
Adversarial Framework [61.34862133870934]
We propose a bi-level adversarial framework to promote the robustness of detection in the presence of distinct corruptions.
Our scheme remarkably improves 21.96% IOU across a wide array of corruptions and notably promotes 4.97% IOU on the general benchmark.
arXiv Detail & Related papers (2023-09-03T06:35:07Z) - Corruptions of Supervised Learning Problems: Typology and Mitigations [11.294508617469905]
We develop a general theory of corruption from an information-theoretic perspective.
We will focus here on changes in probability distributions.
This work sheds light on complexities arising from joint and dependent corruptions on both labels and attributes.
arXiv Detail & Related papers (2023-07-17T16:57:01Z) - Robust Lipschitz Bandits to Adversarial Corruptions [61.85150061213987]
Lipschitz bandit is a variant of bandits that deals with a continuous arm set defined on a metric space.
In this paper, we introduce a new problem of Lipschitz bandits in the presence of adversarial corruptions.
Our work presents the first line of robust Lipschitz bandit algorithms that can achieve sub-linear regret under both types of adversary.
arXiv Detail & Related papers (2023-05-29T18:16:59Z) - Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear
Contextual Bandits and Markov Decision Processes [59.61248760134937]
We propose an efficient algorithm to achieve a regret of $tildeO(sqrtT+zeta)$.
The proposed algorithm relies on the recently developed uncertainty-weighted least-squares regression from linear contextual bandit.
We generalize our algorithm to the episodic MDP setting and first achieve an additive dependence on the corruption level $zeta$.
arXiv Detail & Related papers (2022-12-12T15:04:56Z) - Soft Diffusion: Score Matching for General Corruptions [84.26037497404195]
We propose a new objective called Soft Score Matching that provably learns the score function for any linear corruption process.
We show that our objective learns the gradient of the likelihood under suitable regularity conditions for the family of corruption processes.
Our method achieves state-of-the-art FID score $1.85$ on CelebA-64, outperforming all previous linear diffusion models.
arXiv Detail & Related papers (2022-09-12T17:45:03Z) - Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial
Corruptions [98.75618795470524]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We propose a new algorithm based on the principle of optimism in the face of uncertainty.
arXiv Detail & Related papers (2022-05-13T17:58:58Z) - On Optimal Robustness to Adversarial Corruption in Online Decision
Problems [27.68461396741871]
We show that optimal robustness can be expressed by a square-root dependency on the amount of corruption.
For the multi-armed bandit problem, we also provide a nearly tight lower bound up to a logarithmic factor.
arXiv Detail & Related papers (2021-09-22T18:26:45Z) - Using the Overlapping Score to Improve Corruption Benchmarks [6.445605125467574]
We propose a metric called corruption overlapping score, which can be used to reveal flaws in corruption benchmarks.
We argue that taking into account overlappings between corruptions can help to improve existing benchmarks or build better ones.
arXiv Detail & Related papers (2021-05-26T06:42:54Z) - Improved Corruption Robust Algorithms for Episodic Reinforcement
Learning [43.279169081740726]
We study episodic reinforcement learning under unknown adversarial corruptions in both the rewards and the transition probabilities of the underlying system.
We propose new algorithms which, compared to the existing results, achieve strictly better regret bounds in terms of total corruptions.
Our results follow from a general algorithmic framework that combines corruption-robust policy elimination meta-algorithms, and plug-in reward-free exploration sub-algorithms.
arXiv Detail & Related papers (2021-02-13T07:04:23Z) - Corruption-robust exploration in episodic reinforcement learning [76.19192549843727]
We study multi-stage episodic reinforcement learning under adversarial corruptions in both the rewards and the transition probabilities of the underlying system.
Our framework yields efficient algorithms which attain near-optimal regret in the absence of corruptions.
Notably, our work provides the first sublinear regret guarantee which any deviation from purely i.i.d. transitions in the bandit-feedback model for episodic reinforcement learning.
arXiv Detail & Related papers (2019-11-20T03:49:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.