Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics
- URL: http://arxiv.org/abs/2012.04728v2
- Date: Mon, 29 Mar 2021 16:02:08 GMT
- Title: Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics
- Authors: Daniel Kunin, Javier Sagastuy-Brena, Surya Ganguli, Daniel L.K.
Yamins, Hidenori Tanaka
- Abstract summary: Understanding the dynamics of neural network parameters during training is one of the key challenges in building a theoretical foundation for deep learning.
We show that any such symmetry imposes stringent geometric constraints on gradients and Hessians, leading to an associated conservation law.
We apply tools from finite difference methods to derive modified gradient flow, a differential equation that better approximates the numerical trajectory taken by SGD at finite learning rates.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the dynamics of neural network parameters during training is
one of the key challenges in building a theoretical foundation for deep
learning. A central obstacle is that the motion of a network in
high-dimensional parameter space undergoes discrete finite steps along complex
stochastic gradients derived from real-world datasets. We circumvent this
obstacle through a unifying theoretical framework based on intrinsic symmetries
embedded in a network's architecture that are present for any dataset. We show
that any such symmetry imposes stringent geometric constraints on gradients and
Hessians, leading to an associated conservation law in the continuous-time
limit of stochastic gradient descent (SGD), akin to Noether's theorem in
physics. We further show that finite learning rates used in practice can
actually break these symmetry-induced conservation laws. We apply tools from
finite difference methods to derive modified gradient flow, a differential
equation that better approximates the numerical trajectory taken by SGD at
finite learning rates. We combine modified gradient flow with our framework of
symmetries to derive exact integral expressions for the dynamics of certain
parameter combinations. We empirically validate our analytic expressions for
learning dynamics on VGG-16 trained on Tiny ImageNet. Overall, by exploiting
symmetry, our work demonstrates that we can analytically describe the learning
dynamics of various parameter combinations at finite learning rates and batch
sizes for state-of-the-art architectures trained on any dataset.
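
To make the abstract's first claim concrete, the following minimal sketch (my own construction, not code from the paper) uses a toy scale-invariant loss, i.e. one satisfying L(alpha*w) = L(w) for all alpha > 0, as arises for weights feeding directly into a normalization layer. Differentiating the invariance at alpha = 1 gives the geometric constraint w . grad L(w) = 0, so |w|^2 is conserved under gradient flow; a discrete gradient-descent step of size eta instead gives |w_{t+1}|^2 = |w_t|^2 + eta^2 |g_t|^2, which is the broken conservation law:

    import numpy as np

    def loss(w, target):
        # Scale-invariant toy loss: depends on w only through its direction w / |w|,
        # so loss(alpha * w) == loss(w) for any alpha > 0.
        u = w / np.linalg.norm(w)
        return 0.5 * np.sum((u - target) ** 2)

    def grad(w, target):
        # Analytic gradient of the loss above: (I - u u^T)(u - target) / |w|.
        n = np.linalg.norm(w)
        u = w / n
        r = u - target
        return (r - u * (u @ r)) / n

    rng = np.random.default_rng(0)
    w = rng.normal(size=10)
    target = rng.normal(size=10)
    eta = 0.1

    # Scale invariance of the loss: rescaling w does not change its value.
    print("L(w) =", loss(w, target), " L(3.7*w) =", loss(3.7 * w, target))

    # Symmetry-induced geometric constraint: the gradient is orthogonal to the weights.
    print("w . grad =", w @ grad(w, target))          # ~ 0 up to floating-point error

    # Broken conservation law at finite learning rate:
    # |w_{t+1}|^2 = |w_t|^2 + eta^2 * |g_t|^2 exactly, because w_t . g_t = 0.
    predicted_sq_norm = w @ w
    for _ in range(50):
        g = grad(w, target)
        predicted_sq_norm += eta ** 2 * (g @ g)       # accumulate the predicted drift
        w = w - eta * g                               # plain gradient-descent step
    print("actual    |w|^2:", w @ w)
    print("predicted |w|^2:", predicted_sq_norm)      # the two agree to machine precision

In the gradient-flow limit (eta -> 0) the eta^2 term vanishes and |w|^2 is exactly conserved, the Noether-like statement; at any finite eta the squared norm grows monotonically by the accumulated eta^2 |g_t|^2 terms.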
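A second toy example (again my own, included only to illustrate why the abstract says finite learning rates *can* break the conservation laws) uses the translation symmetry of a softmax output layer: adding the same constant to every logit leaves the loss unchanged, so the logit gradients sum to zero and the sum of the output biases is the associated conserved quantity. Because that constraint is linear in the update, plain SGD preserves the quantity exactly even at a large finite step size, whereas quadratic quantities such as |w|^2 above pick up the eta^2 correction:

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    rng = np.random.default_rng(1)
    W = 0.1 * rng.normal(size=(5, 8))   # last-layer weights: 5 classes, 8 features
    b = rng.normal(size=5)              # biases feeding the softmax
    x = rng.normal(size=8)
    y = 2                               # index of the true class
    eta = 0.5                           # deliberately large learning rate

    initial_bias_sum = b.sum()
    for _ in range(100):
        p = softmax(W @ x + b)
        dz = p.copy()
        dz[y] -= 1.0                    # cross-entropy gradient w.r.t. logits; dz.sum() == 0
        W -= eta * np.outer(dz, x)      # plain SGD step
        b -= eta * dz                   # moves b only orthogonally to the all-ones direction
    print("sum of biases before:", initial_bias_sum)
    print("sum of biases after :", b.sum())   # unchanged up to floating-point rounding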
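Finally, the modified gradient flow mentioned in the abstract can be previewed on a problem where everything has a closed form. Standard backward-error (finite-difference) analysis of the recursion w_{t+1} = w_t - eta * grad L(w_t) says that, to first order in eta, the iterates track the flow of a modified loss L + (eta/4)|grad L|^2 rather than of L itself; the sketch below assumes that textbook first-order form rather than the paper's own expansion, and uses a diagonal quadratic loss so that the gradient-descent iterates, the gradient flow, and the modified flow can all be written down exactly and compared:

    import numpy as np

    # Diagonal quadratic loss L(w) = 0.5 * sum(lam_i * w_i^2), so all trajectories are closed-form.
    lam = np.array([1.0, 0.5, 0.2, 0.05])
    w0 = np.ones(4)
    eta = 0.2
    steps = 50
    t = eta * steps

    gd = (1.0 - eta * lam) ** steps * w0                        # exact gradient-descent iterates
    flow = np.exp(-lam * t) * w0                                # gradient flow   dw/dt = -lam * w
    modified = np.exp(-(lam + 0.5 * eta * lam ** 2) * t) * w0   # modified flow   dw/dt = -(lam + eta*lam^2/2) * w

    print("||GD - gradient flow|| :", np.linalg.norm(gd - flow))
    print("||GD - modified flow|| :", np.linalg.norm(gd - modified))   # noticeably smaller

The modified trajectory tracks the discrete iterates much more closely in this example; combining such modified flows with the symmetry-induced constraints is what the abstract describes as deriving exact integral expressions for the dynamics of certain parameter combinations.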
Related papers
- Optimal Equivariant Architectures from the Symmetries of Matrix-Element Likelihoods [0.0]
The Matrix-Element Method (MEM) has long been a cornerstone of data analysis in high-energy physics.
Geometric deep learning has enabled neural network architectures that incorporate known symmetries directly into their design.
This paper presents a novel approach that combines MEM-inspired symmetry considerations with equivariant neural network design for particle physics analysis.
arXiv Detail & Related papers (2024-10-24T08:56:37Z) - The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof [50.49582712378289]
We investigate the impact of neural parameter symmetries by introducing new neural network architectures.
We develop two methods, with some provable guarantees, of modifying standard neural networks to reduce parameter space symmetries.
Our experiments reveal several interesting observations on the empirical impact of parameter symmetries.
arXiv Detail & Related papers (2024-05-30T16:32:31Z) - Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs).
Our approach develops within a recently introduced framework aimed at learning neural network-based surrogate models of Lattice Boltzmann collision operators.
Our work opens the way towards practical utilization of machine learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z) - Parameter Symmetry and Noise Equilibrium of Stochastic Gradient Descent [8.347295051171525]
We show that gradient noise creates a systematic interplay of parameters $\theta$ along the degenerate direction toward a unique, initialization-independent fixed point $\theta^*$.
These points are referred to as noise equilibria because, at these points, noise contributions from different directions are balanced and aligned.
We show that the balance and alignment of gradient noise can serve as a novel alternative mechanism for explaining important phenomena such as progressive sharpening/flattening and representation formation within neural networks.
arXiv Detail & Related papers (2024-02-11T13:00:04Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been effectively demonstrated in solving forward and inverse differential equation problems.
However, PINNs can become trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Oracle-Preserving Latent Flows [58.720142291102135]
We develop a methodology for the simultaneous discovery of multiple nontrivial continuous symmetries across an entire labelled dataset.
The symmetry transformations and the corresponding generators are modeled with fully connected neural networks trained with a specially constructed loss function.
The two new elements in this work are the use of a reduced-dimensionality latent space and the generalization to transformations invariant with respect to high-dimensional oracles.
arXiv Detail & Related papers (2023-02-02T00:13:32Z) - Guaranteed Conservation of Momentum for Learning Particle-based Fluid
Dynamics [96.9177297872723]
We present a novel method for guaranteeing conservation of linear momentum in learned physics simulations.
We enforce conservation of momentum with a hard constraint, which we realize via antisymmetrical continuous convolutional layers.
In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially.
arXiv Detail & Related papers (2022-10-12T09:12:59Z) - Imitating Deep Learning Dynamics via Locally Elastic Stochastic
Differential Equations [20.066631203802302]
We study the evolution of features during deep learning training using a set of stochastic differential equations (SDEs), each of which corresponds to a training sample.
Our results shed light on the decisive role of local elasticity in the training dynamics of neural networks.
arXiv Detail & Related papers (2021-10-11T17:17:20Z) - GELATO: Geometrically Enriched Latent Model for Offline Reinforcement
Learning [54.291331971813364]
Offline reinforcement learning approaches can be divided into proximal and uncertainty-aware methods.
In this work, we demonstrate the benefit of combining the two in a latent variational model.
Our proposed metrics measure both the quality of out-of-distribution samples and the discrepancy of examples in the data.
arXiv Detail & Related papers (2021-02-22T19:42:40Z) - Incorporating Symmetry into Deep Dynamics Models for Improved
Generalization [24.363954435050264]
We propose to improve accuracy and generalization by incorporating symmetries into convolutional neural networks.
Our models are theoretically and experimentally robust to distributional shift by symmetry group transformations.
Compared with image or text applications, our work is a significant step towards applying equivariant neural networks to high-dimensional systems.
arXiv Detail & Related papers (2020-02-08T01:28:17Z) - 'Place-cell' emergence and learning of invariant data with restricted
Boltzmann machines: breaking and dynamical restoration of continuous
symmetries in the weight space [0.0]
We study the learning dynamics of Restricted Boltzmann Machines (RBM), a neural network paradigm for representation learning.
As learning proceeds from a random configuration of the network weights, we show the existence of a symmetry-breaking phenomenon.
This symmetry-breaking phenomenon takes place only if the amount of data available for training exceeds some critical value.
arXiv Detail & Related papers (2019-12-30T14:37:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.