Toward Equation of Motion for Deep Neural Networks: Continuous-time
Gradient Descent and Discretization Error Analysis
- URL: http://arxiv.org/abs/2210.15898v1
- Date: Fri, 28 Oct 2022 05:13:50 GMT
- Title: Toward Equation of Motion for Deep Neural Networks: Continuous-time
Gradient Descent and Discretization Error Analysis
- Authors: Taiki Miyagawa
- Abstract summary: We derive and solve an ``Equation of Motion'' (EoM) for deep neural networks (DNNs).
EoM is a continuous differential equation that precisely describes the discrete learning dynamics of GD.
- Score: 5.71097144710995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We derive and solve an ``Equation of Motion'' (EoM) for deep neural networks
(DNNs), a differential equation that precisely describes the discrete learning
dynamics of DNNs. Differential equations are continuous but have played a
prominent role even in the study of discrete optimization (gradient descent
(GD) algorithms). However, there still exist gaps between differential
equations and the actual learning dynamics of DNNs due to discretization error.
In this paper, we start from gradient flow (GF) and derive a counter term that
cancels the discretization error between GF and GD. As a result, we obtain EoM,
a continuous differential equation that precisely describes the discrete
learning dynamics of GD. We also derive discretization error to show to what
extent EoM is precise. In addition, we apply EoM to two specific cases: scale-
and translation-invariant layers. EoM highlights differences between
continuous-time and discrete-time GD, indicating the importance of the counter
term for a better description of the discrete learning dynamics of GD. Our
experimental results support our theoretical findings.
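The gap between gradient flow and gradient descent that the abstract describes can already be seen on a one-dimensional quadratic loss, where gradient flow has a closed-form solution. The sketch below is only an illustrative toy calculation of the O(eta) discretization error; it does not implement the paper's EoM or counter term, and all names in it are our own.

```python
import numpy as np

# Quadratic loss L(theta) = 0.5 * lam * theta^2.
# Gradient flow (GF):    theta(t)    = theta0 * exp(-lam * t)   (exact solution)
# Gradient descent (GD): theta_{k+1} = (1 - eta * lam) * theta_k
lam, theta0, T = 1.0, 1.0, 1.0

def gd_gf_gap(eta):
    """|GD iterate - GF solution| at time T, using k = T / eta GD steps."""
    k = int(round(T / eta))
    gd = theta0 * (1.0 - eta * lam) ** k
    gf = theta0 * np.exp(-lam * T)
    return abs(gd - gf)

# Halving the learning rate roughly halves the gap: the leading error is
# O(eta), which is the discretization error a counter term must cancel.
print(gd_gf_gap(0.1), gd_gf_gap(0.05))
```

The roughly factor-of-two shrinkage when eta is halved is the first-order discretization error; the paper's counter term is designed to remove exactly this kind of gap so that a continuous equation tracks the discrete iterates precisely.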
Related papers
- SEGNO: Generalizing Equivariant Graph Neural Networks with Physical
Inductive Biases [66.61789780666727]
We show how the second-order continuity can be incorporated into GNNs while maintaining the equivariant property.
We also offer theoretical insights into SEGNO, highlighting that it can learn a unique trajectory between adjacent states.
Our model yields a significant improvement over the state-of-the-art baselines.
arXiv Detail & Related papers (2023-08-25T07:15:58Z)
- Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift.
Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures.
We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation.
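To make the time-discretization and finite-particle errors mentioned above concrete, here is a minimal sketch of the Euler-Maruyama discretization of plain (not mean-field) Langevin dynamics for a quadratic potential; the step size and particle count are illustrative choices of ours, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
# Euler-Maruyama discretization of the Langevin dynamics
#   dX = -grad U(X) dt + sqrt(2 / beta) dW,  with U(x) = x^2 / 2, beta = 1.
# Its stationary law is N(0, 1/beta), up to a time-discretization bias and a
# finite-particle sampling error -- the two error sources the paper controls.
dt, beta, n_particles, n_steps = 0.01, 1.0, 50_000, 500
x = rng.standard_normal(n_particles)
for _ in range(n_steps):
    x = x - dt * x + np.sqrt(2.0 * dt / beta) * rng.standard_normal(n_particles)
print(np.var(x))  # close to 1/beta, biased at order dt
```

For this linear drift, the discretized chain's stationary variance is 1 / (1 - dt / 2) rather than exactly 1 / beta, so the residual bias vanishes as dt shrinks, mirroring the discretization analysis in the summary.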
arXiv Detail & Related papers (2023-06-12T16:28:11Z)
- Learning Discretized Neural Networks under Ricci Flow [51.36292559262042]
We study Discretized Neural Networks (DNNs) composed of low-precision weights and activations.
DNNs suffer from either infinite or zero gradients due to the non-differentiable discrete function during training.
arXiv Detail & Related papers (2023-02-07T10:51:53Z)
- Continuous Depth Recurrent Neural Differential Equations [0.0]
We propose continuous depth recurrent neural differential equations (CDR-NDE) to generalize RNN models.
CDR-NDE considers two separate differential equations, one over the temporal dimension and one over depth, and models the evolution in both directions.
We also propose the CDR-NDE-heat model based on partial differential equations which treats the computation of hidden states as solving a heat equation over time.
arXiv Detail & Related papers (2022-12-28T06:34:32Z)
- Momentum Diminishes the Effect of Spectral Bias in Physics-Informed Neural Networks [72.09574528342732]
Physics-informed neural network (PINN) algorithms have shown promising results in solving a wide range of problems involving partial differential equations (PDEs).
They often fail to converge to desirable solutions when the target function contains high-frequency features, due to a phenomenon known as spectral bias.
In the present work, we exploit neural tangent kernels (NTKs) to investigate the training dynamics of PINNs evolving under stochastic gradient descent with momentum (SGDM).
arXiv Detail & Related papers (2022-06-29T19:03:10Z)
- Linearization and Identification of Multiple-Attractors Dynamical System through Laplacian Eigenmaps [8.161497377142584]
We propose a Graph-based spectral clustering method that takes advantage of a velocity-augmented kernel to connect data-points belonging to the same dynamics.
We prove that there always exists a set of 2-dimensional embedding spaces in which the sub-dynamics are linear and an n-dimensional embedding space in which they are quasi-linear.
We learn a diffeomorphism from the Laplacian embedding space to the original space and show that the Laplacian embedding leads to good reconstruction accuracy and a faster training time.
arXiv Detail & Related papers (2022-02-18T12:43:25Z)
- The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion [29.489737359897312]
We study the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD).
We show that the key ingredient driving these dynamics is not the original training loss, but rather the combination of a modified loss, which implicitly regularizes the velocity, and probability currents, which cause oscillations in phase space.
arXiv Detail & Related papers (2021-07-19T20:18:57Z)
- Incorporating NODE with Pre-trained Neural Differential Operator for Learning Dynamics [73.77459272878025]
We propose to enhance the supervised signal in learning dynamics by pre-training a neural differential operator (NDO).
NDO is pre-trained on a class of symbolic functions, and it learns the mapping between the trajectory samples of these functions to their derivatives.
We provide a theoretical guarantee that the output of the NDO can closely approximate the ground-truth derivatives by properly tuning the complexity of the library.
arXiv Detail & Related papers (2021-06-08T08:04:47Z)
- Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent [3.0079490585515343]
Stochastic gradient descent (SGD) is relatively well understood in the vanishing-learning-rate regime.
We propose to study the basic properties of SGD and its variants in the non-vanishing learning rate regime.
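One thing a non-vanishing learning rate changes is the size of the stationary fluctuations of the iterates. The linear toy model below is our own hedged illustration, not the paper's analysis: for SGD on a quadratic with additive gradient noise, the stationary variance scales with the learning rate eta and vanishes only in the eta -> 0 limit.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, sigma, eta = 1.0, 1.0, 0.1

# SGD on L(theta) = 0.5 * lam * theta^2 with additive gradient noise:
#   theta_{k+1} = theta_k - eta * (lam * theta_k + sigma * xi_k)
theta, samples = 0.0, []
for k in range(100_000):
    theta -= eta * (lam * theta + sigma * rng.standard_normal())
    if k >= 1_000:                       # discard burn-in iterates
        samples.append(theta)

empirical = np.var(samples)
# Exact stationary variance of this linear recursion; it is proportional to
# eta, so the noise floor only disappears in the vanishing-learning-rate limit.
exact = eta * sigma**2 / (lam * (2.0 - eta * lam))
print(empirical, exact)
```

The empirical variance of the iterates matches the closed-form value, making explicit why finite-learning-rate SGD has qualitatively different (noisy) limiting behavior than its vanishing-learning-rate idealization.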
arXiv Detail & Related papers (2020-12-07T12:31:43Z)
- Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.