A Regret-Variance Trade-Off in Online Learning
- URL: http://arxiv.org/abs/2206.02656v1
- Date: Mon, 6 Jun 2022 14:50:19 GMT
- Title: A Regret-Variance Trade-Off in Online Learning
- Authors: Dirk van der Hoeven, Nikita Zhivotovskiy, Nicolò Cesa-Bianchi
- Abstract summary: We show how variance of predictions can be exploited in learning.
In online prediction with corrupted losses, we show that the effect of corruption on the regret can be compensated by a large variance.
We extend our results to the setting of online linear regression.
- Score: 14.41667013062914
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider prediction with expert advice for strongly convex and bounded
losses, and investigate trade-offs between regret and "variance" (i.e., squared
difference of learner's predictions and best expert predictions). With $K$
experts, the Exponentially Weighted Average (EWA) algorithm is known to achieve
$O(\log K)$ regret. We prove that a variant of EWA either achieves a negative
regret (i.e., the algorithm outperforms the best expert), or guarantees an
$O(\log K)$ bound on both variance and regret. Building on this result, we show
several examples of how variance of predictions can be exploited in learning.
In the online-to-batch analysis, we show that a large empirical variance allows us
to stop the online-to-batch conversion early and outperform the risk of the
best predictor in the class. We also recover the optimal rate of model
selection aggregation when we do not consider early stopping. In online
prediction with corrupted losses, we show that the effect of corruption on the
regret can be compensated by a large variance. In online selective sampling, we
design an algorithm that samples less when the variance is large, while
guaranteeing the optimal regret bound in expectation. In online learning with
abstention, we use a term similar to the variance to derive the first
high-probability $O(\log K)$ regret bound in this setting. Finally, we extend
our results to the setting of online linear regression.
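For concreteness, here is a minimal sketch of the standard EWA forecaster on squared (hence strongly convex) losses, with the abstract's "variance" (the squared distance between the learner's predictions and the best expert's predictions) tracked alongside the regret. This is a generic illustration, not the paper's modified variant; the function name, the choice of squared loss, and the learning rate `eta` are assumptions made for the example.

```python
import numpy as np

def ewa_with_variance(expert_preds, outcomes, eta=1.0):
    """Exponentially Weighted Average (EWA) over K experts with squared loss.

    expert_preds: array of shape (T, K), per-round predictions of each expert.
    outcomes:     array of shape (T,), per-round observed outcomes.
    Returns the cumulative regret against the best expert and the empirical
    "variance" sum_t (learner_pred_t - best_expert_pred_t)^2.
    """
    T, K = expert_preds.shape
    log_w = np.zeros(K)                    # log-weights, kept for numerical stability
    learner_preds = np.empty(T)

    for t in range(T):
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        learner_preds[t] = w @ expert_preds[t]           # weighted-average prediction
        losses = (expert_preds[t] - outcomes[t]) ** 2    # squared loss of each expert
        log_w -= eta * losses                            # exponential weight update

    expert_cum_losses = ((expert_preds - outcomes[:, None]) ** 2).sum(axis=0)
    best = expert_cum_losses.argmin()
    regret = ((learner_preds - outcomes) ** 2).sum() - expert_cum_losses[best]
    variance = ((learner_preds - expert_preds[:, best]) ** 2).sum()
    return regret, variance
```

In the abstract's terms, the guarantee for the paper's variant is that either `regret` is negative, or both `regret` and `variance` are bounded by $O(\log K)$.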
Related papers
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
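As a generic illustration of the reject-option paradigm described above (not the paper's density-ratio construction), a classifier can simply abstain whenever its confidence falls below a threshold; the function name and the threshold `tau` are hypothetical.

```python
import numpy as np

def predict_with_rejection(probs, tau=0.8):
    """Chow-style reject option: abstain when the top-class probability is
    below the confidence threshold tau; otherwise predict the argmax class.

    probs: array of shape (N, C) of per-class probabilities.
    Returns an array of class indices, with -1 marking abstentions.
    """
    conf = probs.max(axis=1)
    preds = probs.argmax(axis=1)
    return np.where(conf >= tau, preds, -1)
```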
arXiv Detail & Related papers (2024-05-29T01:32:17Z) - Mind the Gap: A Causal Perspective on Bias Amplification in Prediction & Decision-Making [58.06306331390586]
We introduce the notion of a margin complement, which measures how much a prediction score $S$ changes due to a thresholding operation.
We show that under suitable causal assumptions, the influences of $X$ on the prediction score $S$ are equal to the influences of $X$ on the true outcome $Y$.
arXiv Detail & Related papers (2024-05-24T11:22:19Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - No-Regret Learning with Unbounded Losses: The Case of Logarithmic
Pooling [12.933990572597583]
We focus on the fundamental and practical aggregation method known as logarithmic pooling.
We consider the problem of learning the best set of parameters in an online adversarial setting.
We present an algorithm that learns expert weights in a way that attains $O(\sqrt{T} \log T)$ expected regret.
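For reference, the logarithmic pooling rule itself combines expert forecasts multiplicatively, $p(x) \propto \prod_i p_i(x)^{w_i}$; the paper's contribution is learning the weights $w_i$ online, which the sketch below does not attempt. Function and argument names are illustrative.

```python
import numpy as np

def logarithmic_pool(expert_probs, weights):
    """Logarithmic pooling of K expert forecasts over a finite outcome space.

    expert_probs: array of shape (K, M); each row is a strictly positive
                  probability vector over M outcomes.
    weights:      array of shape (K,) of nonnegative expert weights.
    Returns the normalized pooled forecast proportional to prod_i p_i^{w_i}.
    """
    log_pool = weights @ np.log(expert_probs)  # weighted sum of log-probabilities
    log_pool -= log_pool.max()                 # shift for numerical stability
    pooled = np.exp(log_pool)
    return pooled / pooled.sum()
```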
arXiv Detail & Related papers (2022-02-22T22:27:25Z) - Can Q-Learning be Improved with Advice? [27.24260290748049]
This paper addresses the question of whether worst-case lower bounds for regret can be circumvented in online learning of Markov decision processes (MDPs).
We show that when predictions about the optimal $Q$-value function satisfy a reasonably weak condition we call distillation, then we can improve regret bounds by replacing the full set of state-action pairs with only those state-action pairs on which the predictions are grossly inaccurate.
Our work extends a recent line of work on algorithms with predictions, which has typically focused on simple online problems such as caching and scheduling, to the more complex and general problem of reinforcement learning.
arXiv Detail & Related papers (2021-10-25T15:44:20Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Online Multivalid Learning: Means, Moments, and Prediction Intervals [16.75129633574157]
We present a technique for providing contextual predictions that are "multivalid" in various senses.
The resulting estimates correctly predict various statistics of the labels $y$, not just marginally.
Because our algorithms handle adversarially chosen examples, they can equally well be used to predict statistics of the residuals of arbitrary point prediction methods.
arXiv Detail & Related papers (2021-01-05T19:08:11Z) - Leveraging Predictions in Smoothed Online Convex Optimization via
Gradient-based Algorithms [18.64335888217192]
We consider online convex optimization with time-varying stage costs and additional switching costs.
Since the switching costs introduce coupling across all stages, long-term predictions tend to suffer from lower quality.
We introduce a gradient-based online algorithm, Receding Horizon Inexact Gradient (RHIG), and analyze its performance by dynamic regrets.
arXiv Detail & Related papers (2020-11-25T06:25:51Z) - Almost Tight L0-norm Certified Robustness of Top-k Predictions against
Adversarial Perturbations [78.23408201652984]
Top-k predictions are used in many real-world applications such as machine learning as a service, recommender systems, and web searches.
Our work is based on randomized smoothing, which builds a provably robust classifier via randomizing an input.
For instance, our method can build a classifier that achieves a certified top-3 accuracy of 69.2% on ImageNet when an attacker can arbitrarily perturb 5 pixels of a testing image.
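The method above builds on randomized smoothing; as a generic, heavily simplified sketch (Gaussian noise and a plain majority vote, not the paper's $\ell_0$-specific randomization, and without the statistical certification step), smoothing a top-$k$ prediction looks roughly like this. All names and parameters are illustrative.

```python
import numpy as np

def smoothed_topk(classify, x, k=3, sigma=0.5, n_samples=1000, seed=None):
    """Classify many noisy copies of x and return the k most-voted classes.

    classify: function mapping an input array to a single class index.
    """
    rng = np.random.default_rng(seed)
    votes = {}
    for _ in range(n_samples):
        noisy = x + sigma * rng.standard_normal(x.shape)  # randomized input
        c = classify(noisy)
        votes[c] = votes.get(c, 0) + 1
    return sorted(votes, key=votes.get, reverse=True)[:k]
```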
arXiv Detail & Related papers (2020-11-15T21:34:44Z) - Malicious Experts versus the multiplicative weights algorithm in online
prediction [85.62472761361107]
We consider a prediction problem with two experts and a forecaster.
We assume that one of the experts is honest and makes a correct prediction with probability $\mu$ at each round.
The other one is malicious, who knows true outcomes at each round and makes predictions in order to maximize the loss of the forecaster.
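The forecaster in question is the standard multiplicative weights algorithm; a minimal sketch with 0-1 losses follows. The honest/malicious expert structure and the paper's analysis are not modeled here, and `eta` and the function name are assumptions.

```python
import numpy as np

def multiplicative_weights(expert_preds, outcomes, eta=0.5):
    """Multiplicative weights forecaster with 0-1 losses.

    expert_preds: array of shape (T, K) of binary expert predictions.
    outcomes:     array of shape (T,) of binary true outcomes.
    Returns the forecaster's total number of mistakes.
    """
    T, K = expert_preds.shape
    w = np.ones(K)
    mistakes_total = 0
    for t in range(T):
        p = w / w.sum()
        forecast = int(p @ expert_preds[t] >= 0.5)        # weighted-majority vote
        mistakes_total += int(forecast != outcomes[t])
        wrong = (expert_preds[t] != outcomes[t]).astype(float)
        w *= (1.0 - eta) ** wrong                         # penalize wrong experts
    return mistakes_total
```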
arXiv Detail & Related papers (2020-03-18T20:12:08Z) - Debiased Off-Policy Evaluation for Recommendation Systems [8.63711086812655]
A/B tests are reliable, but they are costly in time and money and entail a risk of failure.
We develop an alternative method, which predicts the performance of algorithms given historical data.
Our method produces smaller mean squared errors than state-of-the-art methods.
arXiv Detail & Related papers (2020-02-20T02:30:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.