Prediction with Corrupted Expert Advice
- URL: http://arxiv.org/abs/2002.10286v2
- Date: Tue, 20 Oct 2020 20:17:11 GMT
- Title: Prediction with Corrupted Expert Advice
- Authors: Idan Amir, Idan Attias, Tomer Koren, Roi Livni, Yishay Mansour
- Abstract summary: We prove that a variant of the classical Multiplicative Weights algorithm with decreasing step sizes achieves constant regret in a benign stochastic environment, even when the observed feedback is adversarially corrupted.
Our results reveal a surprising disparity between the often comparable Follow the Regularized Leader (FTRL) and Online Mirror Descent (OMD) frameworks.
- Score: 67.67399390910381
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We revisit the fundamental problem of prediction with expert advice, in a
setting where the environment is benign and generates losses stochastically,
but the feedback observed by the learner is subject to a moderate adversarial
corruption. We prove that a variant of the classical Multiplicative Weights
algorithm with decreasing step sizes achieves constant regret in this setting
and performs optimally in a wide range of environments, regardless of the
magnitude of the injected corruption. Our results reveal a surprising disparity
between the often comparable Follow the Regularized Leader (FTRL) and Online
Mirror Descent (OMD) frameworks: we show that for experts in the corrupted
stochastic regime, the regret performance of OMD is in fact strictly inferior
to that of FTRL.
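To make the algorithmic claim concrete, below is a minimal sketch of Multiplicative Weights with decreasing step sizes in its FTRL form, where the current step size is applied to the cumulative losses. The step-size schedule, the full-information feedback loop, and the toy data are illustrative assumptions, not the paper's exact algorithm or analysis.

```python
import numpy as np

def decreasing_step_mw(loss_matrix, eta_schedule=None):
    """Multiplicative Weights (FTRL form) with a decreasing step size.

    At round t the distribution over experts is proportional to
    exp(-eta_t * cumulative_loss): the current step size multiplies the
    cumulative losses (FTRL), rather than entering a per-round multiplicative
    update (OMD). The default schedule sqrt(log(n) / (t + 1)) is a generic
    decreasing choice for illustration, not necessarily the paper's schedule.
    """
    T, n = loss_matrix.shape
    cum_loss = np.zeros(n)
    total_loss = 0.0
    for t in range(T):
        eta = np.sqrt(np.log(n) / (t + 1)) if eta_schedule is None else eta_schedule(t)
        logits = -eta * cum_loss
        weights = np.exp(logits - logits.max())
        weights /= weights.sum()                 # play this distribution over experts
        total_loss += weights @ loss_matrix[t]   # expected loss suffered this round
        cum_loss += loss_matrix[t]               # full-information (possibly corrupted) feedback
    best_expert_loss = loss_matrix.sum(axis=0).min()
    return total_loss - best_expert_loss         # regret against the best single expert

# Toy usage: stochastic losses with one clearly best expert.
rng = np.random.default_rng(0)
losses = rng.uniform(size=(2000, 5))
losses[:, 0] *= 0.5                              # expert 0 is best on average
print(decreasing_step_mw(losses))
```

The OMD form would instead fold each round's step size into a per-round multiplicative update of the previous weights; per the abstract, that variant is strictly inferior in the corrupted stochastic regime.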
Related papers
- Generalized Gaussian Temporal Difference Error for Uncertainty-aware Reinforcement Learning [0.19418036471925312]
We introduce a novel framework for generalized Gaussian error modeling in deep reinforcement learning.
Our framework enhances the flexibility of error distribution modeling by incorporating additional higher-order moments, particularly kurtosis (see the sketch after this entry).
arXiv Detail & Related papers (2024-08-05T08:12:25Z)
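As a point of reference for the entry above, the sketch below writes out the negative log-likelihood of a zero-mean generalized Gaussian (exponential power) distribution, whose shape parameter beta controls tail heaviness and hence kurtosis. Treating a TD error as a sample from this family is the only link made here; the function name, the parameterisation, and how the paper actually fits or uses these parameters are assumptions, not the paper's formulation.

```python
import math
import numpy as np

def generalized_gaussian_nll(td_error, alpha=1.0, beta=2.0):
    """Negative log-likelihood of a zero-mean generalized Gaussian.

    pdf(x) = beta / (2 * alpha * Gamma(1/beta)) * exp(-(|x| / alpha) ** beta)
    beta = 2 recovers a Gaussian; beta < 2 gives heavier tails (higher kurtosis).
    """
    x = np.asarray(td_error, dtype=float)
    log_norm = math.log(beta) - math.log(2.0 * alpha) - math.lgamma(1.0 / beta)
    return (np.abs(x) / alpha) ** beta - log_norm

# Toy usage: a heavy-tailed fit (beta=1.2) vs. a Gaussian fit (beta=2) on the same TD errors.
errors = np.array([-2.0, -0.1, 0.0, 0.3, 4.0])
print(generalized_gaussian_nll(errors, beta=2.0).mean(),
      generalized_gaussian_nll(errors, beta=1.2).mean())
```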
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- Robust Generalization against Photon-Limited Corruptions via Worst-Case Sharpness Minimization [89.92932924515324]
Robust generalization aims to tackle the most challenging data distributions, which are rare in the training set and contain severe noise.
Common solutions such as distributionally robust optimization (DRO) focus on the worst-case empirical risk to ensure low training error.
We propose SharpDRO, which penalizes the sharpness of the worst-case distribution, i.e., how much the loss changes in a neighborhood of the learned parameters.
We show that SharpDRO exhibits a strong generalization ability against severe corruptions and exceeds well-known baseline methods with large performance gains.
arXiv Detail & Related papers (2023-03-23T07:58:48Z)
- Your Policy Regularizer is Secretly an Adversary [13.625408555732752]
We show how robustness arises from hedging against worst-case perturbations of the reward function.
We characterize this robust set of adversarial reward perturbations under KL and alpha-divergence regularization.
We provide detailed discussion of the worst-case reward perturbations, and present intuitive empirical examples to illustrate this robustness.
arXiv Detail & Related papers (2022-03-23T17:54:20Z)
- Contrastive Learning for Debiased Candidate Generation in Large-Scale Recommender Systems [84.3996727203154]
We show that a popular choice of contrastive loss is equivalent to reducing the exposure bias via inverse propensity weighting.
We further improve upon CLRec and propose Multi-CLRec for accurate multi-intention-aware bias reduction (see the sketch after this entry).
Our methods have been successfully deployed in Taobao, where online A/B tests and offline analyses spanning at least four months demonstrate substantial improvements.
arXiv Detail & Related papers (2020-05-20T08:15:23Z)
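To illustrate the inverse-propensity connection stated in the entry above, the sketch below applies a log-propensity ("logQ") correction inside a sampled-softmax style contrastive loss, which down-weights over-exposed items. The function, its inputs, and the correction form are generic assumptions for illustration, not the exact CLRec or Multi-CLRec objective.

```python
import numpy as np

def propensity_corrected_softmax_loss(scores, propensities, pos_index=0):
    """Sampled-softmax / contrastive loss with a log-propensity correction.

    scores: model scores for one positive item (at pos_index) plus sampled negatives.
    propensities: estimated exposure / sampling probability of each candidate.
    Subtracting log(propensity) from each score counteracts exposure bias in the
    same spirit as inverse propensity weighting.
    """
    scores = np.asarray(scores, dtype=float)
    corrected = scores - np.log(np.asarray(propensities, dtype=float))
    log_z = np.logaddexp.reduce(corrected)        # log of the softmax normaliser
    return -(corrected[pos_index] - log_z)        # cross-entropy on the positive item

# Toy usage: the positive item is heavily over-exposed, so its score is discounted.
print(propensity_corrected_softmax_loss(scores=[2.0, 1.5, 0.3],
                                        propensities=[0.5, 0.1, 0.05]))
```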
- Understanding and Mitigating the Tradeoff Between Robustness and Accuracy [88.51943635427709]
Adversarial training augments the training set with perturbations to improve the robust error.
We show that the standard error can increase even when the augmented perturbations are labeled noiselessly by the optimal linear predictor (see the sketch after this entry).
arXiv Detail & Related papers (2020-02-25T08:03:01Z)
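To make the augmentation in the entry above concrete, the sketch below builds FGSM-style l_inf perturbations for a linear regressor and labels the perturbed points noiselessly with the optimal linear predictor. The variable names, the perturbation budget, and the toy data are assumptions for illustration, not the paper's construction or proof setting.

```python
import numpy as np

def augment_with_adversarial_examples(X, w_star, w_model, eps=0.1):
    """Adversarial data augmentation for a linear regressor (illustrative sketch).

    For squared loss (w_model @ x - y) ** 2, the loss-maximising l_inf perturbation
    of size eps is  delta = eps * sign(residual) * sign(w_model)  (an FGSM-style step).
    Augmented points are labelled noiselessly by the optimal predictor w_star.
    """
    y = X @ w_star                                    # noiseless labels
    residual = X @ w_model - y
    delta = eps * np.sign(residual)[:, None] * np.sign(w_model)[None, :]
    X_adv = X + delta
    y_adv = X_adv @ w_star                            # perturbed points, still labelled by w_star
    return np.vstack([X, X_adv]), np.concatenate([y, y_adv])

# Toy usage: augment with perturbations computed against a slightly-off current model.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
w_star = np.array([1.0, -2.0, 0.5])                   # optimal linear predictor
w_model = w_star + rng.normal(scale=0.3, size=3)      # current (imperfect) model
X_aug, y_aug = augment_with_adversarial_examples(X, w_star, w_model)
print(X_aug.shape, y_aug.shape)
```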
- Corruption-robust exploration in episodic reinforcement learning [76.19192549843727]
We study multi-stage episodic reinforcement learning under adversarial corruptions in both the rewards and the transition probabilities of the underlying system.
Our framework yields efficient algorithms which attain near-optimal regret in the absence of corruptions.
Notably, our work provides the first sublinear regret guarantee that accommodates any deviation from purely i.i.d. transitions in the bandit-feedback model for episodic reinforcement learning.
arXiv Detail & Related papers (2019-11-20T03:49:13Z)