Spherical Motion Dynamics: Learning Dynamics of Neural Network with
Normalization, Weight Decay, and SGD
- URL: http://arxiv.org/abs/2006.08419v4
- Date: Fri, 27 Nov 2020 06:10:50 GMT
- Title: Spherical Motion Dynamics: Learning Dynamics of Neural Network with
Normalization, Weight Decay, and SGD
- Authors: Ruosi Wan, Zhanxing Zhu, Xiangyu Zhang, Jian Sun
- Abstract summary: We characterize the learning dynamics of neural networks with normalization, weight decay (WD), and SGD (with momentum), named Spherical Motion Dynamics (SMD).
We verify our assumptions and theoretical results on various computer vision tasks including ImageNet and MSCOCO with standard settings.
- Score: 105.99301967452334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we comprehensively reveal the learning dynamics of neural
networks with normalization, weight decay (WD), and SGD (with momentum), named
Spherical Motion Dynamics (SMD). Most related works study SMD by focusing on the
"effective learning rate" under the "equilibrium" condition, where the weight norm
remains unchanged. However, their discussion of why the equilibrium condition can be
reached in SMD is either absent or unconvincing. Our work investigates SMD by
directly exploring the cause of the equilibrium condition. Specifically, 1) we
introduce the assumptions that lead to the equilibrium condition in SMD, and prove
that the weight norm converges at a linear rate under these assumptions; 2) we
propose the "angular update" as a substitute for the effective learning rate to
measure the evolution of the neural network in SMD, and prove that the angular
update also converges to its theoretical value at a linear rate; 3) we verify our
assumptions and theoretical results on various computer vision tasks, including
ImageNet and MSCOCO, with standard settings. Experimental results show that our
theoretical findings agree well with empirical observations.
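The "equilibrium" and "angular update" quantities from the abstract are easy to probe empirically. For a scale-invariant weight (e.g., a conv filter followed by BatchNorm), the gradient is orthogonal to the weight, so a back-of-envelope norm-equilibrium argument gives eta^2 * ||g||^2 ~ 2 * eta * lambda * ||w||^2, i.e., a per-step angular update of roughly sqrt(2 * eta * lambda) for momentum-free SGD; the paper's results cover the momentum case as well. The sketch below (not the authors' code) measures this angle and the effective learning rate eta / ||w||^2 during training; the model, synthetic data, and hyperparameters are illustrative assumptions only.

```python
# Minimal sketch (illustrative, not the authors' code): measure the per-step
# "angular update" of a scale-invariant conv weight trained with SGD + weight
# decay, and compare it with the equilibrium prediction sqrt(2 * lr * wd) for
# momentum-free SGD. Model, synthetic data, and hyperparameters are assumptions.
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

lr, wd = 0.1, 5e-4                      # eta (learning rate) and lambda (weight decay)
theory = math.sqrt(2 * lr * wd)         # predicted equilibrium angular update (vanilla SGD)

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1, bias=False),
    nn.BatchNorm2d(16),                 # BN makes the preceding conv weight scale-invariant
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)
opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.0, weight_decay=wd)
loss_fn = nn.CrossEntropyLoss()
w = model[0].weight                     # the scale-invariant weight we track

for step in range(2001):
    x = torch.randn(64, 3, 32, 32)      # synthetic batch as a stand-in for real data
    y = torch.randint(0, 10, (64,))
    w_prev = w.detach().flatten().clone()

    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

    w_new = w.detach().flatten()
    cos = torch.dot(w_prev, w_new) / (w_prev.norm() * w_new.norm())
    angle = torch.acos(cos.clamp(-1.0, 1.0)).item()    # angular update in radians
    eff_lr = lr / w_new.norm().item() ** 2             # "effective learning rate" eta / ||w||^2

    if step % 200 == 0:
        print(f"step {step:4d}  |w|={w_new.norm().item():.3f}  "
              f"angle={angle:.4f} (theory~{theory:.4f})  eff_lr={eff_lr:.4f}")
```

With these illustrative settings (eta = 0.1, lambda = 5e-4) the predicted equilibrium angular update is sqrt(2 * 0.1 * 5e-4) = 0.01 rad; whether and how fast the measured angle settles near that value depends on the data and training setup, which is exactly what the paper's assumptions and linear-rate convergence results address.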
Related papers
- Towards a theory of learning dynamics in deep state space models [12.262490032020832]
State space models (SSMs) have shown remarkable empirical performance on many long sequence modeling tasks.
This work is a step toward a theory of learning dynamics in deep state space models.
arXiv Detail & Related papers (2024-07-10T00:01:56Z) - Machine learning in and out of equilibrium [58.88325379746631]
Our study uses a Fokker-Planck approach, adapted from statistical physics, to explore these parallels.
We focus in particular on the stationary state of the system in the long-time limit, which in conventional SGD is out of equilibrium.
We propose a new variation of stochastic gradient Langevin dynamics (SGLD) that harnesses without-replacement minibatching.
arXiv Detail & Related papers (2023-06-06T09:12:49Z) - Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks [33.88586668321127]
This study investigates how weight decay affects the update behavior of individual neurons in deep neural networks.
We show that explicitly controlling the rotation provides the benefits of weight decay while substantially reducing the need for learning rate warmup.
arXiv Detail & Related papers (2023-05-26T19:14:01Z) - Learning Neural Constitutive Laws From Motion Observations for
Generalizable PDE Dynamics [97.38308257547186]
Many NN approaches learn an end-to-end model that implicitly models both the governing PDE and material models.
We argue that the governing PDEs are often well-known and should be explicitly enforced rather than learned.
We introduce a new framework termed "Neural Constitutive Laws" (NCLaw) which utilizes a network architecture that strictly guarantees standard priors.
arXiv Detail & Related papers (2023-04-27T17:42:24Z) - Learning Physical Dynamics with Subequivariant Graph Neural Networks [99.41677381754678]
Graph Neural Networks (GNNs) have become a prevailing tool for learning physical dynamics.
Physical laws abide by symmetry, which is a vital inductive bias accounting for model generalization.
Our model achieves, on average, over 3% improvement in contact prediction accuracy across 8 scenarios on Physion and 2X lower rollout MSE on RigidFall.
arXiv Detail & Related papers (2022-10-13T10:00:30Z) - To update or not to update? Neurons at equilibrium in deep models [8.72305226979945]
Recent advances in deep learning showed that, with some a posteriori information on fully trained models, it is possible to match the same performance by training only a subset of their parameters.
In this work we shift the focus from single parameters to the behavior of whole neurons, exploiting the concept of neuronal equilibrium (NEq).
The proposed approach has been tested on different state-of-the-art learning strategies and tasks, validating NEq and observing that the neuronal equilibrium depends on the specific learning setup.
arXiv Detail & Related papers (2022-07-19T08:07:53Z) - Equilibrium Propagation with Continual Weight Updates [69.87491240509485]
We propose a learning algorithm that bridges machine learning and neuroscience by computing gradients closely matching those of Backpropagation Through Time (BPTT).
We prove theoretically that, provided the learning rates are sufficiently small, at each time step of the second phase the dynamics of neurons and synapses follow the gradients of the loss given by BPTT.
These results bring EP a step closer to biology by better complying with hardware constraints while maintaining its intimate link with backpropagation.
arXiv Detail & Related papers (2020-04-29T14:54:30Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z) - Incorporating Symmetry into Deep Dynamics Models for Improved
Generalization [24.363954435050264]
We propose to improve accuracy and generalization by incorporating symmetries into convolutional neural networks.
Our models are theoretically and experimentally robust to distributional shift by symmetry group transformations.
Compared with image or text applications, our work is a significant step towards applying equivariant neural networks to high-dimensional systems.
arXiv Detail & Related papers (2020-02-08T01:28:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.