Spherical Motion Dynamics: Learning Dynamics of Neural Network with
Normalization, Weight Decay, and SGD
- URL: http://arxiv.org/abs/2006.08419v4
- Date: Fri, 27 Nov 2020 06:10:50 GMT
- Title: Spherical Motion Dynamics: Learning Dynamics of Neural Network with
Normalization, Weight Decay, and SGD
- Authors: Ruosi Wan, Zhanxing Zhu, Xiangyu Zhang, Jian Sun
- Abstract summary: We characterize the learning dynamics of neural networks with normalization, weight decay (WD), and SGD (with momentum), named Spherical Motion Dynamics (SMD).
We verify our assumptions and theoretical results on various computer vision tasks including ImageNet and MSCOCO with standard settings.
- Score: 105.99301967452334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we comprehensively reveal the learning dynamics of neural
networks with normalization, weight decay (WD), and SGD (with momentum), named
Spherical Motion Dynamics (SMD). Most related works study SMD by focusing on the
"effective learning rate" under the "equilibrium" condition, where the weight norm
remains unchanged. However, their discussion of why the equilibrium condition can be
reached in SMD is either absent or unconvincing. Our work investigates SMD by
directly exploring the cause of the equilibrium condition. Specifically, 1) we
introduce the assumptions that lead to the equilibrium condition in SMD, and prove
that the weight norm converges at a linear rate under these assumptions; 2) we
propose the "angular update" as a substitute for the effective learning rate to
measure the evolution of the neural network in SMD, and prove that the angular
update also converges to its theoretical value at a linear rate; 3) we verify our
assumptions and theoretical results on various computer vision tasks, including
ImageNet and MSCOCO, with standard settings. Experimental results show that our
theoretical findings agree well with empirical observations.
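The "equilibrium" and "angular update" quantities from the abstract are easy to probe empirically. For a scale-invariant weight (e.g., a conv filter followed by BatchNorm), the gradient is orthogonal to the weight, so a back-of-envelope norm-equilibrium argument gives eta^2 * ||g||^2 ~ 2 * eta * lambda * ||w||^2, i.e., a per-step angular update of roughly sqrt(2 * eta * lambda) for momentum-free SGD; the paper's results cover the momentum case as well. The sketch below (not the authors' code) measures this angle and the effective learning rate eta / ||w||^2 during training; the model, synthetic data, and hyperparameters are illustrative assumptions only.

```python
# Minimal sketch (illustrative, not the authors' code): measure the per-step
# "angular update" of a scale-invariant conv weight trained with SGD + weight
# decay, and compare it with the equilibrium prediction sqrt(2 * lr * wd) for
# momentum-free SGD. Model, synthetic data, and hyperparameters are assumptions.
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

lr, wd = 0.1, 5e-4                      # eta (learning rate) and lambda (weight decay)
theory = math.sqrt(2 * lr * wd)         # predicted equilibrium angular update (vanilla SGD)

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1, bias=False),
    nn.BatchNorm2d(16),                 # BN makes the preceding conv weight scale-invariant
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)
opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.0, weight_decay=wd)
loss_fn = nn.CrossEntropyLoss()
w = model[0].weight                     # the scale-invariant weight we track

for step in range(2001):
    x = torch.randn(64, 3, 32, 32)      # synthetic batch as a stand-in for real data
    y = torch.randint(0, 10, (64,))
    w_prev = w.detach().flatten().clone()

    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

    w_new = w.detach().flatten()
    cos = torch.dot(w_prev, w_new) / (w_prev.norm() * w_new.norm())
    angle = torch.acos(cos.clamp(-1.0, 1.0)).item()    # angular update in radians
    eff_lr = lr / w_new.norm().item() ** 2             # "effective learning rate" eta / ||w||^2

    if step % 200 == 0:
        print(f"step {step:4d}  |w|={w_new.norm().item():.3f}  "
              f"angle={angle:.4f} (theory~{theory:.4f})  eff_lr={eff_lr:.4f}")
```

With these illustrative settings (eta = 0.1, lambda = 5e-4) the predicted equilibrium angular update is sqrt(2 * 0.1 * 5e-4) = 0.01 rad; whether and how fast the measured angle settles near that value depends on the data and training setup, which is exactly what the paper's assumptions and linear-rate convergence results address.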
Related papers
- Towards a theory of learning dynamics in deep state space models [12.262490032020832]
State space models (SSMs) have shown remarkable empirical performance on many long sequence modeling tasks.
This work is a step toward a theory of learning dynamics in deep state space models.
arXiv Detail & Related papers (2024-07-10T00:01:56Z) - Machine learning in and out of equilibrium [58.88325379746631]
Our study uses a Fokker-Planck approach, adapted from statistical physics, to explore these parallels.
We focus in particular on the stationary state of the system in the long-time limit, which in conventional SGD is out of equilibrium.
We propose a new variation of stochastic gradient Langevin dynamics (SGLD) that harnesses without-replacement minibatching.
arXiv Detail & Related papers (2023-06-06T09:12:49Z) - Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks [33.88586668321127]
This study investigates how weight decay affects the update behavior of individual neurons in deep neural networks.
We show that explicitly controlling the rotation provides the benefits of weight decay while substantially reducing the need for learning rate warmup.
arXiv Detail & Related papers (2023-05-26T19:14:01Z) - Learning Neural Constitutive Laws From Motion Observations for
Generalizable PDE Dynamics [97.38308257547186]
Many NN approaches learn an end-to-end model that implicitly models both the governing PDE and material models.
We argue that the governing PDEs are often well-known and should be explicitly enforced rather than learned.
We introduce a new framework termed "Neural Constitutive Laws" (NCLaw) which utilizes a network architecture that strictly guarantees standard priors.
arXiv Detail & Related papers (2023-04-27T17:42:24Z) - Learning Physical Dynamics with Subequivariant Graph Neural Networks [99.41677381754678]
Graph Neural Networks (GNNs) have become a prevailing tool for learning physical dynamics.
Physical laws abide by symmetry, which is a vital inductive bias accounting for model generalization.
Our model achieves, on average, over 3% improvement in contact prediction accuracy across 8 scenarios on Physion and 2X lower rollout MSE on RigidFall.
arXiv Detail & Related papers (2022-10-13T10:00:30Z) - To update or not to update? Neurons at equilibrium in deep models [8.72305226979945]
Recent advances in deep learning showed that, with some a posteriori information on fully trained models, it is possible to match the same performance by training only a subset of their parameters.
In this work we shift the focus from single parameters to the behavior of whole neurons, exploiting the concept of neuronal equilibrium (NEq).
The proposed approach has been tested on different state-of-the-art learning strategies and tasks, validating NEq and observing that the neuronal equilibrium depends on the specific learning setup.
arXiv Detail & Related papers (2022-07-19T08:07:53Z) - Equilibrium Propagation with Continual Weight Updates [69.87491240509485]
We propose a learning algorithm that bridges machine learning and neuroscience by computing gradients closely matching those of Backpropagation Through Time (BPTT).
We prove theoretically that, provided the learning rates are sufficiently small, at each time step of the second phase the dynamics of neurons and synapses follow the gradients of the loss given by BPTT.
These results bring EP a step closer to biology by better complying with hardware constraints while maintaining its intimate link with backpropagation.
arXiv Detail & Related papers (2020-04-29T14:54:30Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z) - Incorporating Symmetry into Deep Dynamics Models for Improved
Generalization [24.363954435050264]
We propose to improve accuracy and generalization by incorporating symmetries into convolutional neural networks.
Our models are theoretically and experimentally robust to distributional shift by symmetry group transformations.
Compared with image or text applications, our work is a significant step towards applying equivariant neural networks to high-dimensional systems.
arXiv Detail & Related papers (2020-02-08T01:28:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.