Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics
- URL: http://arxiv.org/abs/2012.04728v2
- Date: Mon, 29 Mar 2021 16:02:08 GMT
- Title: Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics
- Authors: Daniel Kunin, Javier Sagastuy-Brena, Surya Ganguli, Daniel L.K. Yamins, Hidenori Tanaka
- Abstract summary: Understanding the dynamics of neural network parameters during training is one of the key challenges in building a theoretical foundation for deep learning.
We show that any such symmetry imposes stringent geometric constraints on gradients and Hessians, leading to an associated conservation law.
We apply tools from finite difference methods to derive modified gradient flow, a differential equation that better approximates the numerical trajectory taken by SGD at finite learning rates.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the dynamics of neural network parameters during training is
one of the key challenges in building a theoretical foundation for deep
learning. A central obstacle is that the motion of a network in
high-dimensional parameter space undergoes discrete finite steps along complex
stochastic gradients derived from real-world datasets. We circumvent this
obstacle through a unifying theoretical framework based on intrinsic symmetries
embedded in a network's architecture that are present for any dataset. We show
that any such symmetry imposes stringent geometric constraints on gradients and
Hessians, leading to an associated conservation law in the continuous-time
limit of stochastic gradient descent (SGD), akin to Noether's theorem in
physics. We further show that finite learning rates used in practice can
actually break these symmetry induced conservation laws. We apply tools from
finite difference methods to derive modified gradient flow, a differential
equation that better approximates the numerical trajectory taken by SGD at
finite learning rates. We combine modified gradient flow with our framework of
symmetries to derive exact integral expressions for the dynamics of certain
parameter combinations. We empirically validate our analytic expressions for
learning dynamics on VGG-16 trained on Tiny ImageNet. Overall, by exploiting
symmetry, our work demonstrates that we can analytically describe the learning
dynamics of various parameter combinations at finite learning rates and batch
sizes for state of the art architectures trained on any dataset.
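To illustrate the abstract's claim that a symmetry constrains the gradient and yields a conservation law that finite learning rates then break, here is a minimal pure-Python sketch of the simplest case, scale symmetry. It assumes a hypothetical two-parameter loss that depends on w only through its direction w/|w| (the loss, the constants U and C, and all helper names are illustrative, not taken from the paper). Differentiating L(a·w) = L(w) at a = 1 gives w · ∇L(w) = 0, so under gradient flow d|w|²/dt = -2 w · ∇L(w) = 0 and |w|² is conserved. One step of gradient descent with learning rate η instead gives |w - ηg|² = |w|² + η²|g|², so after k steps |w|² has grown by exactly η² Σ_j |g_j|².

```python
import math

# Hypothetical scale-invariant loss: it depends on w only through w / |w|,
# so loss(a * w) == loss(w) for every a > 0.
U = (0.6, 0.8)  # fixed unit vector (illustrative assumption)
C = 0.3         # target cosine between w and U (illustrative assumption)


def loss(w):
    norm = math.hypot(*w)
    s = (w[0] * U[0] + w[1] * U[1]) / norm
    return (s - C) ** 2


def grad(w):
    # Analytic gradient: 2 (s - C) * (U / |w| - (w . U) w / |w|^3)
    norm = math.hypot(*w)
    wu = w[0] * U[0] + w[1] * U[1]
    s = wu / norm
    return [2 * (s - C) * (U[i] / norm - wu * w[i] / norm ** 3) for i in range(2)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def run_gd(w0, lr, steps):
    """Plain gradient descent; returns final w and lr**2 * sum_k |g_k|^2."""
    w = list(w0)
    acc = 0.0
    for _ in range(steps):
        g = grad(w)
        acc += lr ** 2 * dot(g, g)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w, acc


w0 = [2.0, -1.0]
g0 = grad(w0)
print("w . grad L(w) =", dot(w0, g0))  # ~0: the gradient is orthogonal to w

for lr, steps in [(0.5, 200), (0.005, 20000)]:  # same total time lr * steps = 100
    w, acc = run_gd(w0, lr, steps)
    drift = dot(w, w) - dot(w0, w0)
    print(f"lr={lr}: |w|^2 drift = {drift:.6e}, lr^2 * sum|g|^2 = {acc:.6e}")
```

Running it should print a dot product at the level of floating-point round-off and, for each learning rate, a |w|² drift equal to η² Σ|g_j|² up to round-off. At the same total time (learning rate times number of steps equal to 100), the drift is much smaller for the smaller learning rate, consistent with the conservation law being recovered in the continuous-time limit.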
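The abstract's "modified gradient flow" can be made concrete on a one-dimensional quadratic, where every trajectory has a closed form. This sketch assumes the form of modified gradient flow obtained by backward error analysis of plain gradient descent, dθ/dt = -∇L̃(θ) with L̃ = L + (η/4)|∇L|², equivalently dθ/dt = -∇L - (η/2)H∇L; the curvature a, learning rate η, and starting point θ0 are arbitrary illustrative values. For L(θ) = aθ²/2, gradient descent gives θ_k = (1 - ηa)^k θ0, gradient flow gives θ0·exp(-a t), and the modified flow gives θ0·exp(-(a + ηa²/2) t), all compared at t = kη.

```python
import math


def gd(theta0, a, lr, k):
    # Exact k-th iterate of gradient descent on L(theta) = a * theta**2 / 2
    return (1 - lr * a) ** k * theta0


def gradient_flow(theta0, a, t):
    # Solution of d theta / dt = -a * theta
    return theta0 * math.exp(-a * t)


def modified_flow(theta0, a, lr, t):
    # Solution of d theta / dt = -(a + lr * a**2 / 2) * theta, i.e. gradient
    # flow on the modified loss L~ = L + (lr / 4) * |dL/dtheta|^2
    return theta0 * math.exp(-(a + lr * a ** 2 / 2) * t)


a, lr, theta0 = 1.0, 0.1, 1.0  # illustrative values (assumptions)
for k in (5, 10, 20):
    t = k * lr
    exact = gd(theta0, a, lr, k)
    err_gf = abs(gradient_flow(theta0, a, t) - exact)
    err_mf = abs(modified_flow(theta0, a, lr, t) - exact)
    print(f"k={k:2d}  GD={exact:.5f}  |GF-GD|={err_gf:.5f}  |MGF-GD|={err_mf:.5f}")
```

With a = 1 and η = 0.1, the modified flow's error is roughly an order of magnitude smaller than gradient flow's at each printed step. The reason shows up in the rates: gradient descent contracts at the continuous rate -ln(1 - ηa)/η = a + ηa²/2 + O(η²), which the modified flow matches to first order in η while plain gradient flow does not.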
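The "parameter combinations" mentioned in the abstract can be illustrated with a second kind of symmetry, rescale symmetry. A minimal sketch, assuming a hypothetical scalar two-layer linear model f(x) = w2·w1·x fit to a single data point (x, y) by squared error (X, Y, and the helper names are illustrative, not from the paper): the loss is unchanged under w1 → a·w1, w2 → w2/a, which forces w1·∂L/∂w1 = w2·∂L/∂w2. Under gradient flow, d(w1² - w2²)/dt = -2(w1·g1 - w2·g2) = 0, so w1² - w2² is conserved; one gradient descent step with learning rate η changes it by exactly η²(g1² - g2²), because the first-order cross terms cancel by the symmetry.

```python
# Hypothetical scalar two-layer linear model f(x) = w2 * w1 * x fit to one
# data point (X, Y) with L = r**2 / 2, r = w2 * w1 * X - Y. L is unchanged under
# w1 -> a * w1, w2 -> w2 / a (rescale symmetry), so w1 * dL/dw1 == w2 * dL/dw2.
X, Y = 1.5, 2.0  # illustrative data point (assumption)


def grads(w1, w2):
    r = w2 * w1 * X - Y            # residual
    return r * w2 * X, r * w1 * X  # dL/dw1, dL/dw2


def run_gd(w1, w2, lr, steps):
    """Gradient descent; returns the change in w1**2 - w2**2 and the
    accumulated lr**2 * sum_k (g1_k**2 - g2_k**2)."""
    q0 = w1 ** 2 - w2 ** 2
    acc = 0.0
    for _ in range(steps):
        g1, g2 = grads(w1, w2)
        acc += lr ** 2 * (g1 ** 2 - g2 ** 2)
        w1, w2 = w1 - lr * g1, w2 - lr * g2
    return (w1 ** 2 - w2 ** 2) - q0, acc


for lr, steps in [(0.05, 200), (0.0005, 20000)]:  # same total time lr * steps = 10
    change, acc = run_gd(0.5, 1.0, lr, steps)
    print(f"lr={lr}: change in w1^2 - w2^2 = {change:.3e}, "
          f"lr^2 * sum(g1^2 - g2^2) = {acc:.3e}")
```

For both learning rates the reported change should match the accumulated η²(g1² - g2²) up to round-off, and at fixed total time it shrinks as the learning rate decreases.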
Related papers
- Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs).
Our approach develops within a recently introduced framework aimed at learning neural network-based surrogate models for Lattice Boltzmann collision operators.
Our work opens the way toward practical use of machine learning-augmented Lattice Boltzmann CFD in real-world simulations.
Date: 2024-05-22T17:23:15Z
- Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks
Physics-informed neural networks (PINNs) have been shown to be effective at solving forward and inverse differential equation problems.
However, PINNs can fail to train when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
Date: 2023-03-03T08:17:47Z
- Learning Discretized Neural Networks under Ricci Flow
We study Discretized Neural Networks (DNNs) composed of low-precision weights and activations.
DNNs suffer from either infinite or zero gradients due to the non-differentiable discrete function during training.
Date: 2023-02-07T10:51:53Z
- Oracle-Preserving Latent Flows
We develop a methodology for the simultaneous discovery of multiple nontrivial continuous symmetries across an entire labelled dataset.
The symmetry transformations and the corresponding generators are modeled with fully connected neural networks trained with a specially constructed loss function.
The two new elements in this work are the use of a reduced-dimensionality latent space and the generalization to transformations invariant with respect to high-dimensional oracles.
Date: 2023-02-02T00:13:32Z
- Designing Universal Causal Deep Learning Models: The Case of Infinite-Dimensional Dynamical Systems from Stochastic Analysis
Causal operators (COs) play a central role in contemporary analysis.
There is still no canonical framework for designing Deep Learning (DL) models capable of approximating COs.
This paper proposes a "geometry-aware" solution to this open problem by introducing a DL model-design framework.
Date: 2022-10-24T14:43:03Z
- Guaranteed Conservation of Momentum for Learning Particle-based Fluid Dynamics
We present a novel method for guaranteeing conservation of linear momentum in learned physics simulations.
We enforce conservation of momentum with a hard constraint, which we realize via antisymmetrical continuous convolutional layers.
In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially.
Date: 2022-10-12T09:12:59Z
- Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations
We study the evolution of features during deep learning training using a set of stochastic differential equations (SDEs), each corresponding to a training sample.
Our results shed light on the decisive role of local elasticity in the training dynamics of neural networks.
Date: 2021-10-11T17:17:20Z
- The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion
We study the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD).
We show that the key ingredient driving these dynamics is not the original training loss, but rather the combination of a modified loss, which implicitly regularizes the velocity, and probability currents, which cause oscillations in phase space.
Date: 2021-07-19T20:18:57Z
- GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning
Offline reinforcement learning approaches can be divided into proximal and uncertainty-aware methods.
In this work, we demonstrate the benefit of combining the two in a latent variational model.
Our proposed metrics measure both the quality of out-of-distribution samples and the discrepancy of examples in the data.
Date: 2021-02-22T19:42:40Z
- Incorporating Symmetry into Deep Dynamics Models for Improved Generalization
We propose to improve accuracy and generalization by incorporating symmetries into convolutional neural networks.
Our models are theoretically and experimentally robust to distributional shift by symmetry group transformations.
Compared with image or text applications, our work is a significant step towards applying equivariant neural networks to high-dimensional systems.
Date: 2020-02-08T01:28:17Z
- 'Place-cell' emergence and learning of invariant data with restricted Boltzmann machines: breaking and dynamical restoration of continuous symmetries in the weight space
We study the learning dynamics of Restricted Boltzmann Machines (RBM), a neural network paradigm for representation learning.
As learning proceeds from a random configuration of the network weights, we show the existence of a symmetry-breaking phenomenon.
This symmetry-breaking phenomenon takes place only if the amount of data available for training exceeds some critical value.
Date: 2019-12-30T14:37:14Z