A Regret-Variance Trade-Off in Online Learning
- URL: http://arxiv.org/abs/2206.02656v1
- Date: Mon, 6 Jun 2022 14:50:19 GMT
- Title: A Regret-Variance Trade-Off in Online Learning
- Authors: Dirk van der Hoeven, Nikita Zhivotovskiy, Nicolò Cesa-Bianchi
- Abstract summary: We show how variance of predictions can be exploited in learning.
In online prediction with corrupted losses, we show that the effect of corruption on the regret can be compensated by a large variance.
We extend our results to the setting of online linear regression.
- Score: 14.41667013062914
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider prediction with expert advice for strongly convex and bounded
losses, and investigate trade-offs between regret and "variance" (i.e., squared
difference of learner's predictions and best expert predictions). With $K$
experts, the Exponentially Weighted Average (EWA) algorithm is known to achieve
$O(\log K)$ regret. We prove that a variant of EWA either achieves a negative
regret (i.e., the algorithm outperforms the best expert), or guarantees an
$O(\log K)$ bound on both variance and regret. Building on this result, we show
several examples of how variance of predictions can be exploited in learning.
In the online-to-batch analysis, we show that a large empirical variance allows us
to stop the online-to-batch conversion early and outperform the risk of the
best predictor in the class. We also recover the optimal rate of model
selection aggregation when we do not consider early stopping. In online
prediction with corrupted losses, we show that the effect of corruption on the
regret can be compensated by a large variance. In online selective sampling, we
design an algorithm that samples less when the variance is large, while
guaranteeing the optimal regret bound in expectation. In online learning with
abstention, we use a term similar to the variance to derive the first
high-probability $O(\log K)$ regret bound in this setting. Finally, we extend
our results to the setting of online linear regression.
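For concreteness, here is a minimal sketch of the standard EWA forecaster on squared (hence strongly convex) losses, with the abstract's "variance" (the squared distance between the learner's predictions and the best expert's predictions) tracked alongside the regret. This is a generic illustration, not the paper's modified variant; the function name, the choice of squared loss, and the learning rate `eta` are assumptions made for the example.

```python
import numpy as np

def ewa_with_variance(expert_preds, outcomes, eta=1.0):
    """Exponentially Weighted Average (EWA) over K experts with squared loss.

    expert_preds: array of shape (T, K), per-round predictions of each expert.
    outcomes:     array of shape (T,), per-round observed outcomes.
    Returns the cumulative regret against the best expert and the empirical
    "variance" sum_t (learner_pred_t - best_expert_pred_t)^2.
    """
    T, K = expert_preds.shape
    log_w = np.zeros(K)                    # log-weights, kept for numerical stability
    learner_preds = np.empty(T)

    for t in range(T):
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        learner_preds[t] = w @ expert_preds[t]           # weighted-average prediction
        losses = (expert_preds[t] - outcomes[t]) ** 2    # squared loss of each expert
        log_w -= eta * losses                            # exponential weight update

    expert_cum_losses = ((expert_preds - outcomes[:, None]) ** 2).sum(axis=0)
    best = expert_cum_losses.argmin()
    regret = ((learner_preds - outcomes) ** 2).sum() - expert_cum_losses[best]
    variance = ((learner_preds - expert_preds[:, best]) ** 2).sum()
    return regret, variance
```

In the abstract's terms, the guarantee for the paper's variant is that either `regret` is negative, or both `regret` and `variance` are bounded by $O(\log K)$.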
Related papers
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
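As a generic illustration of the reject-option paradigm described above (not the paper's density-ratio construction), a classifier can simply abstain whenever its confidence falls below a threshold; the function name and the threshold `tau` are hypothetical.

```python
import numpy as np

def predict_with_rejection(probs, tau=0.8):
    """Chow-style reject option: abstain when the top-class probability is
    below the confidence threshold tau; otherwise predict the argmax class.

    probs: array of shape (N, C) of per-class probabilities.
    Returns an array of class indices, with -1 marking abstentions.
    """
    conf = probs.max(axis=1)
    preds = probs.argmax(axis=1)
    return np.where(conf >= tau, preds, -1)
```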
arXiv Detail & Related papers (2024-05-29T01:32:17Z) - Mind the Gap: A Causal Perspective on Bias Amplification in Prediction & Decision-Making [58.06306331390586]
We introduce the notion of a margin complement, which measures how much a prediction score $S$ changes due to a thresholding operation.
We show that under suitable causal assumptions, the influences of $X$ on the prediction score $S$ are equal to the influences of $X$ on the true outcome $Y$.
arXiv Detail & Related papers (2024-05-24T11:22:19Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - No-Regret Learning with Unbounded Losses: The Case of Logarithmic
Pooling [12.933990572597583]
We focus on the fundamental and practical aggregation method known as logarithmic pooling.
We consider the problem of learning the best set of parameters in an online adversarial setting.
We present an algorithm that learns expert weights in a way that attains $O(\sqrt{T} \log T)$ expected regret.
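For reference, the logarithmic pooling rule itself combines expert forecasts multiplicatively, $p(x) \propto \prod_i p_i(x)^{w_i}$; the paper's contribution is learning the weights $w_i$ online, which the sketch below does not attempt. Function and argument names are illustrative.

```python
import numpy as np

def logarithmic_pool(expert_probs, weights):
    """Logarithmic pooling of K expert forecasts over a finite outcome space.

    expert_probs: array of shape (K, M); each row is a strictly positive
                  probability vector over M outcomes.
    weights:      array of shape (K,) of nonnegative expert weights.
    Returns the normalized pooled forecast proportional to prod_i p_i^{w_i}.
    """
    log_pool = weights @ np.log(expert_probs)  # weighted sum of log-probabilities
    log_pool -= log_pool.max()                 # shift for numerical stability
    pooled = np.exp(log_pool)
    return pooled / pooled.sum()
```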
arXiv Detail & Related papers (2022-02-22T22:27:25Z) - Can Q-Learning be Improved with Advice? [27.24260290748049]
This paper addresses the question of whether worst-case lower bounds for regret can be circumvented in online learning of Markov decision processes (MDPs).
We show that when predictions about the optimal $Q$-value function satisfy a reasonably weak condition we call distillation, then we can improve regret bounds by replacing the full set of state-action pairs with only those state-action pairs on which the predictions are grossly inaccurate.
Our work extends a recent line of work on algorithms with predictions, which has typically focused on simple online problems such as caching and scheduling, to the more complex and general problem of reinforcement learning.
arXiv Detail & Related papers (2021-10-25T15:44:20Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Online Multivalid Learning: Means, Moments, and Prediction Intervals [16.75129633574157]
We present a technique for providing contextual predictions that are "multivalid" in various senses.
The resulting estimates correctly predict various statistics of the labels $y$, not just marginally.
Because our algorithms handle adversarially chosen examples, they can equally well be used to predict statistics of the residuals of arbitrary point prediction methods.
arXiv Detail & Related papers (2021-01-05T19:08:11Z) - Leveraging Predictions in Smoothed Online Convex Optimization via
Gradient-based Algorithms [18.64335888217192]
We consider online convex optimization with time-varying stage costs and additional switching costs.
Since the switching costs introduce coupling across all stages, long-term predictions tend to suffer from lower quality.
We introduce a gradient-based online algorithm, Receding Horizon Inexact Gradient (RHIG), and analyze its performance by dynamic regrets.
arXiv Detail & Related papers (2020-11-25T06:25:51Z) - Almost Tight L0-norm Certified Robustness of Top-k Predictions against
Adversarial Perturbations [78.23408201652984]
Top-k predictions are used in many real-world applications such as machine learning as a service, recommender systems, and web searches.
Our work is based on randomized smoothing, which builds a provably robust classifier via randomizing an input.
For instance, our method can build a classifier that achieves a certified top-3 accuracy of 69.2% on ImageNet when an attacker can arbitrarily perturb 5 pixels of a testing image.
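The method above builds on randomized smoothing; as a generic, heavily simplified sketch (Gaussian noise and a plain majority vote, not the paper's $\ell_0$-specific randomization, and without the statistical certification step), smoothing a top-$k$ prediction looks roughly like this. All names and parameters are illustrative.

```python
import numpy as np

def smoothed_topk(classify, x, k=3, sigma=0.5, n_samples=1000, seed=None):
    """Classify many noisy copies of x and return the k most-voted classes.

    classify: function mapping an input array to a single class index.
    """
    rng = np.random.default_rng(seed)
    votes = {}
    for _ in range(n_samples):
        noisy = x + sigma * rng.standard_normal(x.shape)  # randomized input
        c = classify(noisy)
        votes[c] = votes.get(c, 0) + 1
    return sorted(votes, key=votes.get, reverse=True)[:k]
```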
arXiv Detail & Related papers (2020-11-15T21:34:44Z) - Malicious Experts versus the multiplicative weights algorithm in online
prediction [85.62472761361107]
We consider a prediction problem with two experts and a forecaster.
We assume that one of the experts is honest and makes a correct prediction with probability $\mu$ at each round.
The other one is malicious, who knows true outcomes at each round and makes predictions in order to maximize the loss of the forecaster.
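The forecaster in question is the standard multiplicative weights algorithm; a minimal sketch with 0-1 losses follows. The honest/malicious expert structure and the paper's analysis are not modeled here, and `eta` and the function name are assumptions.

```python
import numpy as np

def multiplicative_weights(expert_preds, outcomes, eta=0.5):
    """Multiplicative weights forecaster with 0-1 losses.

    expert_preds: array of shape (T, K) of binary expert predictions.
    outcomes:     array of shape (T,) of binary true outcomes.
    Returns the forecaster's total number of mistakes.
    """
    T, K = expert_preds.shape
    w = np.ones(K)
    mistakes_total = 0
    for t in range(T):
        p = w / w.sum()
        forecast = int(p @ expert_preds[t] >= 0.5)        # weighted-majority vote
        mistakes_total += int(forecast != outcomes[t])
        wrong = (expert_preds[t] != outcomes[t]).astype(float)
        w *= (1.0 - eta) ** wrong                         # penalize wrong experts
    return mistakes_total
```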
arXiv Detail & Related papers (2020-03-18T20:12:08Z) - Debiased Off-Policy Evaluation for Recommendation Systems [8.63711086812655]
A/B tests are reliable, but they are costly in time and money and entail a risk of failure.
We develop an alternative method, which predicts the performance of algorithms given historical data.
Our method produces smaller mean squared errors than state-of-the-art methods.
arXiv Detail & Related papers (2020-02-20T02:30:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.