Gradient Equilibrium in Online Learning: Theory and Applications
- URL: http://arxiv.org/abs/2501.08330v3
- Date: Tue, 18 Feb 2025 16:39:54 GMT
- Title: Gradient Equilibrium in Online Learning: Theory and Applications
- Authors: Anastasios N. Angelopoulos, Michael I. Jordan, Ryan J. Tibshirani
- Abstract summary: Gradient equilibrium is achieved by standard online learning methods.
Gradient equilibrium translates into an interpretable and meaningful property in online prediction problems.
We show that the gradient equilibrium framework can be used to develop a debiasing scheme for black-box predictions.
- Score: 56.02856551198923
- Abstract: We present a new perspective on online learning that we refer to as gradient equilibrium: a sequence of iterates achieves gradient equilibrium if the average of gradients of losses along the sequence converges to zero. In general, this condition is not implied by, nor implies, sublinear regret. It turns out that gradient equilibrium is achievable by standard online learning methods such as gradient descent and mirror descent with constant step sizes (rather than decaying step sizes, as is usually required for no regret). Further, as we show through examples, gradient equilibrium translates into an interpretable and meaningful property in online prediction problems spanning regression, classification, quantile estimation, and others. Notably, we show that the gradient equilibrium framework can be used to develop a debiasing scheme for black-box predictions under arbitrary distribution shift, based on simple post hoc online descent updates. We also show that post hoc gradient updates can be used to calibrate predicted quantiles under distribution shift, and that the framework leads to unbiased Elo scores for pairwise preference prediction.
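The abstract describes a debiasing scheme built on post hoc online gradient updates with a constant step size: tracking a scalar offset to a black-box predictor so that the running average of loss gradients, and hence the average residual, tends to zero. The following is a minimal sketch of that idea, not the authors' code; the bias magnitude, step size, and noise model are illustrative assumptions.

```python
import random

def debias_online(preds, targets, step=0.1):
    """Post hoc debiasing sketch: online gradient descent on a scalar offset
    theta with a CONSTANT step size (no decay), using the squared loss
    l_t(theta) = (y_t - (f_t + theta))^2 / 2. Gradient equilibrium means the
    average gradient -> 0, which here means the average residual -> 0, i.e.
    the debiased predictions are unbiased on average."""
    theta = 0.0
    debiased = []
    grad_sum = 0.0
    for f, y in zip(preds, targets):
        debiased.append(f + theta)          # predict before seeing y_t
        grad = -(y - (f + theta))           # d/dtheta of the squared loss
        grad_sum += grad
        theta -= step * grad                # constant step size
    return debiased, grad_sum / len(preds)

random.seed(0)
# Hypothetical black-box predictor with a systematic bias of +2 plus noise.
targets = [random.gauss(0.0, 1.0) for _ in range(5000)]
preds = [y + 2.0 + random.gauss(0.0, 0.5) for y in targets]

debiased, avg_grad = debias_online(preds, targets)
avg_resid = sum(y - d for y, d in zip(targets, debiased)) / len(targets)
print(avg_grad, avg_resid)  # both close to zero despite the +2 bias
```

Because the update telescopes (theta changes by step times the residual each round), the average gradient is exactly (theta_0 - theta_final) / (step * n), so it vanishes as long as theta stays bounded, with no decaying step size needed. This mirrors the paper's point that gradient equilibrium holds under constant step sizes where no-regret guarantees would typically require decay.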
Related papers
- Parallel Momentum Methods Under Biased Gradient Estimations [11.074080383657453]
Parallel gradient methods are gaining prominence in solving large-scale machine learning problems that involve data distributed across multiple nodes.
However, obtaining unbiased gradient estimates, which have been the focus of most theoretical research, is challenging in many machine learning applications.
In this paper we work out the implications for special cases where gradient estimates are biased, e.g. in meta-learning and when gradients are compressed or clipped.
arXiv Detail & Related papers (2024-02-29T18:03:03Z) - Estimator Meets Equilibrium Perspective: A Rectified Straight Through Estimator for Binary Neural Networks Training [35.090598013305275]
Binarization of neural networks is a dominant paradigm in neural network compression.
We propose the Rectified Straight Through Estimator (ReSTE) to balance the estimating error and the gradient stability.
ReSTE has excellent performance and surpasses state-of-the-art methods without any auxiliary modules or losses.
arXiv Detail & Related papers (2023-08-13T05:38:47Z) - The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks [117.93273337740442]
We show that gradient descent converges to a uniform margin classifier on the training data with an $\exp(-\Omega(\log^2 t))$ convergence rate.
We also show that batch normalization has an implicit bias towards a patch-wise uniform margin.
arXiv Detail & Related papers (2023-06-20T16:58:00Z) - The Equalization Losses: Gradient-Driven Training for Long-tailed Object Recognition [84.51875325962061]
We propose a gradient-driven training mechanism to tackle the long-tail problem.
We introduce a new family of gradient-driven loss functions, namely equalization losses.
Our method consistently outperforms the baseline models.
arXiv Detail & Related papers (2022-10-11T16:00:36Z) - On the influence of roundoff errors on the convergence of the gradient descent method with low-precision floating-point computation [0.0]
We propose a new rounding scheme that trades the zero bias property with a larger probability to preserve small gradients.
Our method yields constant rounding bias that, at each iteration, lies in a descent direction.
arXiv Detail & Related papers (2022-02-24T18:18:20Z) - Coupled Gradient Estimators for Discrete Latent Variables [41.428359609999326]
Training models with discrete latent variables is challenging due to the high variance of unbiased gradient estimators.
We introduce a novel derivation of their estimator based on importance sampling and statistical couplings.
We show that our proposed categorical gradient estimators provide state-of-the-art performance.
arXiv Detail & Related papers (2021-06-15T11:28:44Z) - Implicit Gradient Regularization [18.391141066502644]
Gradient descent can be surprisingly good at optimizing deep neural networks without overfitting and without explicit regularization.
We call this Implicit Gradient Regularization (IGR) and we use backward error analysis to calculate the size of this regularization.
arXiv Detail & Related papers (2020-09-23T14:17:53Z) - Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy [71.25689267025244]
We show how the transition is controlled by the relationship between the initialization scale and how accurately we minimize the training loss.
Our results indicate that some limit behaviors of gradient descent only kick in at ridiculous training accuracies.
arXiv Detail & Related papers (2020-07-13T23:49:53Z) - A Study of Gradient Variance in Deep Learning [56.437755740715396]
We introduce a method, Gradient Clustering, to minimize the variance of average mini-batch gradient with stratified sampling.
We measure the gradient variance on common deep learning benchmarks and observe that, contrary to common assumptions, gradient variance increases during training.
arXiv Detail & Related papers (2020-07-09T03:23:10Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.