Doubly Stochastic Models: Learning with Unbiased Label Noises and
Inference Stability
- URL: http://arxiv.org/abs/2304.00320v1
- Date: Sat, 1 Apr 2023 14:09:07 GMT
- Title: Doubly Stochastic Models: Learning with Unbiased Label Noises and
Inference Stability
- Authors: Haoyi Xiong, Xuhong Li, Boyang Yu, Zhanxing Zhu, Dongrui Wu, Dejing
Dou
- Abstract summary: We investigate the implicit regularization effects of label noise under the mini-batch sampling settings of stochastic gradient descent.
We find that such an implicit regularizer favors convergence points that stabilize model outputs against perturbations of the parameters.
Our work does not assume that SGD behaves as an Ornstein-Uhlenbeck-like process and achieves a more general result, with the convergence of the approximation proven.
- Score: 85.1044381834036
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Random label noises (or observational noises) widely exist in practical
machine learning settings. While previous studies primarily focus on the
effects of label noise on learning performance, our work investigates the
implicit regularization effects of label noise under
mini-batch sampling settings of stochastic gradient descent (SGD), with
assumptions that label noises are unbiased. Specifically, we analyze the
learning dynamics of SGD over the quadratic loss with unbiased label noises,
where we model the dynamics of SGD as a stochastic differential equation
(SDE) with two diffusion terms (namely, a Doubly Stochastic Model). While the
first diffusion term is caused by mini-batch sampling over the
(label-noiseless) loss gradients, as in many other works on SGD, our model
investigates the second noise term of SGD dynamics, which is caused by
mini-batch sampling over the label noises, as an implicit regularizer. Our
theoretical analysis finds that such an implicit regularizer favors
convergence points that stabilize model outputs against perturbations of the
parameters (namely, inference stability). Though similar phenomena have been
investigated, our work does not assume that SGD behaves as an
Ornstein-Uhlenbeck-like process and achieves a more general result, with the
convergence of the approximation proven. To validate our analysis, we design two sets of empirical studies to
analyze the implicit regularizer of SGD with unbiased random label noises for
deep neural networks training and linear regression.
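As a minimal, hypothetical sketch (not the paper's code), the doubly stochastic view can be illustrated on linear regression: the mini-batch gradient of the quadratic loss splits into one stochastic term driven by sampling over the noiseless residuals and a second term driven by sampling over the unbiased label noises, yet SGD still converges near the true parameters because the label noise has zero mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative synthetic setup (dimensions and noise scale are assumptions).
n, d, batch = 1000, 5, 32
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
eps = rng.normal(scale=0.5, size=n)   # unbiased label noise, E[eps] = 0
y = X @ w_star + eps                  # observed noisy labels

w = np.zeros(d)
lr = 0.01
for step in range(2000):
    idx = rng.choice(n, size=batch, replace=False)
    Xb = X[idx]
    # The mini-batch quadratic-loss gradient decomposes into two noise sources:
    residual_clean = Xb @ w - Xb @ w_star  # mini-batch sampling over clean residuals
    label_noise = -eps[idx]                # mini-batch sampling over label noises
    grad = Xb.T @ (residual_clean + label_noise) / batch
    w -= lr * grad

# The label noise contributes a second diffusion term but no bias,
# so the iterate ends up close to w_star.
print(np.linalg.norm(w - w_star))
```

This toy run only demonstrates the decomposition of the gradient noise; the paper's analysis concerns which convergence points the second diffusion term favors.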
Related papers
- Latent Class-Conditional Noise Model [54.56899309997246]
We introduce a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework.
We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us to efficiently infer the latent true labels.
Our approach safeguards the stable update of the noise transition, avoiding the arbitrary tuning from a mini-batch of samples seen in previous work.
arXiv Detail & Related papers (2023-02-19T15:24:37Z) - Uncertainty-Aware Learning Against Label Noise on Imbalanced Datasets [23.4536532321199]
Inspired by our observations, we propose an Uncertainty-aware Label Correction framework to handle label noise on imbalanced datasets.
arXiv Detail & Related papers (2022-07-12T11:35:55Z) - Computing the Variance of Shuffling Stochastic Gradient Algorithms via
Power Spectral Density Analysis [6.497816402045099]
Two common alternatives to stochastic gradient descent (SGD) with theoretical benefits are random reshuffling (SGDRR) and shuffle-once (SGD-SO).
We study the stationary variances of SGD, SGDRR and SGD-SO, whose leading terms decrease in this order, and obtain simple approximations.
arXiv Detail & Related papers (2022-06-01T17:08:04Z) - The effective noise of Stochastic Gradient Descent [9.645196221785694]
Stochastic Gradient Descent (SGD) is the workhorse algorithm of deep learning technology.
We characterize the parameters of SGD and a recently-introduced variant, persistent SGD, in a neural network model.
We find that noisier algorithms lead to wider decision boundaries of the corresponding constraint satisfaction problem.
arXiv Detail & Related papers (2021-12-20T20:46:19Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - On Minibatch Noise: Discrete-Time SGD, Overparametrization, and Bayes [2.6763498831034043]
Noise in stochastic gradient descent (SGD) caused by minibatch sampling remains poorly understood.
Motivated by the observation that minibatch sampling does not always cause a fluctuation, we set out to find the conditions that cause minibatch noise to emerge.
arXiv Detail & Related papers (2021-02-10T10:38:55Z) - Tackling Instance-Dependent Label Noise via a Universal Probabilistic
Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z) - A Second-Order Approach to Learning with Instance-Dependent Label Noise [58.555527517928596]
The presence of label noise often misleads the training of deep neural networks.
We show that the errors in human-annotated labels are more likely to be dependent on the difficulty levels of tasks.
arXiv Detail & Related papers (2020-12-22T06:36:58Z) - Shape Matters: Understanding the Implicit Bias of the Noise Covariance [76.54300276636982]
Noise in stochastic gradient descent provides a crucial implicit regularization effect for training overparameterized models.
We show that parameter-dependent noise -- induced by mini-batches or label perturbation -- is far more effective than Gaussian noise.
Our analysis reveals that parameter-dependent noise introduces a bias towards local minima with smaller noise variance, whereas spherical Gaussian noise does not.
arXiv Detail & Related papers (2020-06-15T18:31:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.