Imitating Deep Learning Dynamics via Locally Elastic Stochastic
Differential Equations
- URL: http://arxiv.org/abs/2110.05960v1
- Date: Mon, 11 Oct 2021 17:17:20 GMT
- Title: Imitating Deep Learning Dynamics via Locally Elastic Stochastic
Differential Equations
- Authors: Jiayao Zhang, Hua Wang, Weijie J. Su
- Abstract summary: We study the evolution of features during deep learning training using a set of stochastic differential equations (SDEs) that each corresponds to a training sample.
Our results shed light on the decisive role of local elasticity in the training dynamics of neural networks.
- Score: 20.066631203802302
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Understanding the training dynamics of deep learning models is perhaps a
necessary step toward demystifying the effectiveness of these models. In
particular, how do data from different classes gradually become separable in
their feature spaces when training neural networks using stochastic gradient
descent? In this study, we model the evolution of features during deep learning
training using a set of stochastic differential equations (SDEs) that each
corresponds to a training sample. As a crucial ingredient in our modeling
strategy, each SDE contains a drift term that reflects the impact of
backpropagation at an input on the features of all samples. Our main finding
uncovers a sharp phase transition phenomenon regarding the intra-class impact:
if the SDEs are locally elastic in the sense that the impact is more
significant on samples from the same class as the input, the features of the
training data become linearly separable, meaning vanishing training loss;
otherwise, the features are not separable, regardless of how long the training
time is. Moreover, in the presence of local elasticity, an analysis of our SDEs
shows the emergence of a simple geometric structure called neural
collapse of the features. Taken together, our results shed light on the
decisive role of local elasticity in the training dynamics of neural networks.
We corroborate our theoretical analysis with experiments on a synthesized
dataset of geometric shapes and CIFAR-10.
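To make the modeling strategy concrete, below is a minimal Euler-Maruyama simulation of a simplified two-class system of locally elastic SDEs. The drift form, the parameters `E_intra`, `E_inter`, and `sigma`, and the per-class target directions are illustrative assumptions, not the paper's exact equations; the sketch only mimics the qualitative phase transition described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) setup: two classes, 2-D features, Euler-Maruyama discretization.
n_per_class, d, T, dt = 20, 2, 5.0, 0.01
sigma = 0.1                    # diffusion strength
E_intra, E_inter = 1.0, 0.3    # intra-/inter-class impact; E_intra > E_inter means "locally elastic"

labels = np.repeat([0, 1], n_per_class)                # class label of each sample
mu = np.array([[1.0, 0.0], [-1.0, 0.0]])               # hypothetical per-class target directions
x = rng.normal(scale=0.1, size=(2 * n_per_class, d))   # initial features

for _ in range(int(T / dt)):
    # Impact matrix: an update at sample j affects sample i more strongly if they share a class.
    same = (labels[:, None] == labels[None, :]).astype(float)
    H = E_intra * same + E_inter * (1.0 - same)
    # Drift: each sample's pull toward its class direction, averaged with impact weights.
    pull = mu[labels] - x
    drift = (H @ pull) / len(labels)
    x += drift * dt + sigma * np.sqrt(dt) * rng.normal(size=x.shape)

print("class-0 mean:", x[labels == 0].mean(axis=0))
print("class-1 mean:", x[labels == 1].mean(axis=0))
```

With `E_intra > E_inter` the two class means drift apart and the features become separable; setting `E_intra == E_inter` gives every sample the same drift, so the classes stay mixed, loosely mirroring the separable and non-separable regimes of the phase transition.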
Related papers
- Training Dynamics of Nonlinear Contrastive Learning Model in the High Dimensional Limit [1.7597525104451157]
An empirical distribution of the model weights converges to a deterministic measure governed by a McKean-Vlasov nonlinear partial differential equation (PDE)
Under L2 regularization, this PDE reduces to a closed set of low-dimensional ordinary differential equations (ODEs)
We analyze the fixed-point locations of the ODEs and their stability, unveiling several interesting findings.
arXiv Detail & Related papers (2024-06-11T03:07:41Z) - Automatic Differentiation is Essential in Training Neural Networks for Solving Differential Equations [7.890817997914349]
Neural network-based approaches have recently shown significant promise in solving partial differential equations (PDEs) in science and engineering.
One advantage of neural network methods for PDEs lies in their use of automatic differentiation (AD).
In this paper, we quantitatively demonstrate the advantage of AD in training neural networks.
arXiv Detail & Related papers (2024-05-23T02:01:05Z) - Tipping Points of Evolving Epidemiological Networks: Machine
Learning-Assisted, Data-Driven Effective Modeling [0.0]
We study the tipping point collective dynamics of an adaptive susceptible-infected-susceptible (SIS) epidemiological network in a data-driven, machine learning-assisted manner.
We identify a complex effective stochastic differential equation (eSDE) in terms of physically meaningful coarse mean-field variables.
We study the statistics of rare events both through repeated brute force simulations and by using established mathematical/computational tools.
arXiv Detail & Related papers (2023-11-01T19:33:03Z) - Latent State Models of Training Dynamics [51.88132043461152]
We train models with different random seeds and compute a variety of metrics throughout training.
We then fit a hidden Markov model (HMM) over the resulting sequences of metrics.
We use the HMM representation to study phase transitions and identify latent "detour" states that slow down convergence (a rough sketch of this metric-sequence HMM pipeline appears after this list).
arXiv Detail & Related papers (2023-08-18T13:20:08Z) - Robust Learning with Progressive Data Expansion Against Spurious
Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z) - Capturing Actionable Dynamics with Structured Latent Ordinary
Differential Equations [68.62843292346813]
We propose a structured latent ODE model that captures system input variations within its latent representation.
Building on a static variable specification, our model learns factors of variation for each input to the system, thus separating the effects of the system inputs in the latent space.
arXiv Detail & Related papers (2022-02-25T20:00:56Z) - Stochastic Physics-Informed Neural Networks (SPINN): A Moment-Matching
Framework for Learning Hidden Physics within Stochastic Differential
Equations [4.482886054198202]
We propose a framework for training deep neural networks to learn equations that represent hidden physics within stochastic differential equations (SDEs).
The proposed framework relies on uncertainty propagation and moment-matching techniques along with state-of-the-art deep learning strategies.
arXiv Detail & Related papers (2021-09-03T16:59:12Z) - Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z) - Supervised Learning in the Presence of Concept Drift: A modelling
framework [5.22609266390809]
We present a modelling framework for the investigation of supervised learning in non-stationary environments.
We model two example types of learning systems: prototype-based Learning Vector Quantization (LVQ) for classification and shallow, layered neural networks for regression tasks.
arXiv Detail & Related papers (2020-05-21T09:13:58Z) - Stochasticity in Neural ODEs: An Empirical Study [68.8204255655161]
Regularization of neural networks (e.g. dropout) is a widespread technique in deep learning that allows for better generalization.
We show that data augmentation during training improves the performance of both deterministic and stochastic versions of the same model.
However, the improvements obtained by data augmentation completely eliminate the empirical gains from stochastic regularization, making the performance difference between neural ODEs and neural SDEs negligible.
arXiv Detail & Related papers (2020-02-22T22:12:56Z) - Learning Stochastic Behaviour from Aggregate Data [52.012857267317784]
Learning nonlinear dynamics from aggregate data is a challenging problem because the full trajectory of each individual is not available.
We propose a novel method using the weak form of the Fokker-Planck equation (FPE) to describe the density evolution of data in a sampled form.
In such a sample-based framework we are able to learn the nonlinear dynamics from aggregate data without explicitly solving the partial differential equation (PDE) FPE.
arXiv Detail & Related papers (2020-02-10T03:20:13Z)