Label noise (stochastic) gradient descent implicitly solves the Lasso
for quadratic parametrisation
- URL: http://arxiv.org/abs/2206.09841v1
- Date: Mon, 20 Jun 2022 15:24:42 GMT
- Title: Label noise (stochastic) gradient descent implicitly solves the Lasso
for quadratic parametrisation
- Authors: Loucas Pillaud-Vivien, Julien Reygner, Nicolas Flammarion
- Abstract summary: We study the role of the label noise in the training dynamics of a quadratically parametrised model through its continuous time version.
Our findings highlight that structured noise can induce better generalisation and help explain the superior performance of stochastic dynamics observed in practice.
- Score: 14.244787327283335
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding the implicit bias of training algorithms is of crucial
importance in order to explain the success of overparametrised neural networks.
In this paper, we study the role of the label noise in the training dynamics of
a quadratically parametrised model through its continuous time version. We
explicitly characterise the solution chosen by the stochastic flow and prove
that it implicitly solves a Lasso program. To fully complete our analysis, we
provide nonasymptotic convergence guarantees for the dynamics as well as
conditions for support recovery. We also give experimental results which
support our theoretical claims. Our findings highlight that structured
noise can induce better generalisation and help explain the superior
performance of stochastic dynamics observed in practice.
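As a rough numerical illustration of the abstract's claim (not the paper's exact setting; all dimensions, step sizes, and noise levels below are illustrative assumptions), SGD with artificially noised labels on a quadratically parametrised linear model w = u ⊙ u drifts towards a sparse, Lasso-like interpolator:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 40                        # overparametrised: more features than samples
X = rng.standard_normal((n, d))
w_star = np.zeros(d)
w_star[:3] = [1.0, 0.8, 0.5]         # sparse, nonnegative ground truth
y = X @ w_star                       # clean labels; noise is injected during training

u = np.full(d, 0.5)                  # quadratic parametrisation: w = u**2 (nonnegative weights)
eta, sigma = 1e-2, 0.5
for _ in range(200_000):
    i = rng.integers(n)
    noisy_label = y[i] + sigma * rng.choice([-1.0, 1.0])  # injected label noise
    resid = u**2 @ X[i] - noisy_label
    u -= eta * resid * 2 * u * X[i]  # chain rule through w = u**2

w_hat = u**2
# Off-support coordinates are driven towards zero, mimicking an l1 penalty,
# while on-support coordinates stay close to w_star.
```

The multiplicative structure of the noise in `u` is what produces the shrinkage: each off-support coordinate is repeatedly rescaled by factors with negative log-mean, so it decays geometrically while the signal holds the support in place.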
Related papers
- Stochastic Gradient Flow Dynamics of Test Risk and its Exact Solution for Weak Features [8.645858565518155]
We provide a formula for computing the difference between the test risk curves of pure gradient flow and stochastic gradient flow.
We explicitly compute the corrections brought about by the added term in the dynamics.
The analytical results are compared to simulations of discrete-time gradient descent and show good agreement.
arXiv Detail & Related papers (2024-02-12T13:11:11Z) - Role of Momentum in Smoothing Objective Function and Generalizability of Deep Neural Networks [0.6906005491572401]
We show that the noise in stochastic gradient descent (SGD) with momentum smooths the objective function, to a degree determined by the learning rate, the batch size, the momentum factor, and the upper bound of the gradient norm.
We also provide experimental results supporting our assertion that model generalizability depends on the noise level.
arXiv Detail & Related papers (2024-02-04T02:48:28Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - A Free Lunch from the Noise: Provable and Practical Exploration for
Representation Learning [55.048010996144036]
We show that, under a noise assumption, we can obtain the linear spectral feature of the corresponding Markov transition operator in closed form for free.
We propose Spectral Dynamics Embedding (SPEDE), which breaks the trade-off and completes optimistic exploration for representation learning by exploiting the structure of the noise.
arXiv Detail & Related papers (2021-11-22T19:24:57Z) - Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of
Stochasticity [24.428843425522107]
We study the dynamics of gradient descent over diagonal linear networks through its continuous time version, namely gradient flow.
We show that the convergence speed of the training loss controls the magnitude of the biasing effect: the slower the convergence, the better the bias.
arXiv Detail & Related papers (2021-06-17T14:16:04Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth stochastic convex optimization have complexity bounds that depend unfavourably on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
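The entry above concerns gradient clipping under heavy-tailed noise. A generic clipped-SGD step looks like the following (a common textbook sketch, not the paper's specific stepsize rules; all names are illustrative):

```python
import numpy as np

def clipped_sgd_step(w, grad, lr, clip_level):
    """One SGD step with norm clipping: a standard guard against
    heavy-tailed gradient noise (illustrative, not the paper's exact rule)."""
    g_norm = float(np.linalg.norm(grad))
    if g_norm > clip_level:
        grad = grad * (clip_level / g_norm)  # rescale to norm clip_level
    return w - lr * grad

# A heavy-tailed gradient sample is tamed by the clip:
w = np.zeros(3)
huge_grad = np.array([1e6, 0.0, 0.0])
w_new = clipped_sgd_step(w, huge_grad, lr=0.1, clip_level=1.0)
```

Clipping bounds each step by `lr * clip_level`, which is what makes high-probability guarantees possible even when the raw gradient noise has no finite variance.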
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Leveraging Global Parameters for Flow-based Neural Posterior Estimation [90.21090932619695]
Inferring the parameters of a model based on experimental observations is central to the scientific method.
A particularly challenging setting is when the model is strongly indeterminate, i.e., when distinct sets of parameters yield identical observations.
We present a method for cracking such indeterminacy by exploiting additional information conveyed by an auxiliary set of observations sharing global parameters.
arXiv Detail & Related papers (2021-02-12T12:23:13Z) - Noisy Recurrent Neural Networks [45.94390701863504]
We study recurrent neural networks (RNNs) trained by injecting noise into hidden states as discretizations of differential equations driven by input data.
We find that, under reasonable assumptions, this implicit regularization promotes flatter minima; it biases towards models with more stable dynamics; and, in classification tasks, it favors models with larger classification margin.
Our theory is supported by empirical results which demonstrate improved robustness with respect to various input perturbations, while maintaining state-of-the-art performance.
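A minimal sketch of the noise-injection idea described above, viewed as an Euler-Maruyama discretisation of an SDE on the hidden state (all shapes, weights, and constants here are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_rnn_step(h, x, W_h, W_x, sigma, dt=0.1):
    # Euler-Maruyama step: deterministic RNN drift plus Gaussian
    # noise injected directly into the hidden state.
    drift = np.tanh(W_h @ h + W_x @ x)
    diffusion = sigma * np.sqrt(dt) * rng.standard_normal(h.shape)
    return h + dt * drift + diffusion

d_h, d_x = 8, 4
W_h = 0.1 * rng.standard_normal((d_h, d_h))
W_x = 0.1 * rng.standard_normal((d_h, d_x))
h = np.zeros(d_h)
for x in rng.standard_normal((5, d_x)):  # unroll over a short input sequence
    h = noisy_rnn_step(h, x, W_h, W_x, sigma=0.05)
```

Setting `sigma=0` recovers a standard discretised RNN, so the injected term isolates the implicit regularisation effect the entry describes.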
arXiv Detail & Related papers (2021-02-09T15:20:50Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the role noise plays in its success is still unclear.
We show that heavy-tailed behaviour commonly arises in the parameters due to multiplicative noise and its variance.
A detailed analysis is conducted describing how key factors, including the step size and the data, shape this behaviour, with consistent results on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.