Training Scale-Invariant Neural Networks on the Sphere Can Happen in
Three Regimes
- URL: http://arxiv.org/abs/2209.03695v1
- Date: Thu, 8 Sep 2022 10:30:05 GMT
- Title: Training Scale-Invariant Neural Networks on the Sphere Can Happen in
Three Regimes
- Authors: Maxim Kodryan, Ekaterina Lobacheva, Maksim Nakhodnov, Dmitry Vetrov
- Abstract summary: We study the properties of training scale-invariant neural networks directly on the sphere using a fixed ELR.
We discover three regimes of such training depending on the ELR value: convergence, chaotic equilibrium, and divergence.
- Score: 3.808063547958558
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A fundamental property of deep learning normalization techniques, such as
batch normalization, is making the pre-normalization parameters scale
invariant. The intrinsic domain of such parameters is the unit sphere, and
therefore their gradient optimization dynamics can be represented via spherical
optimization with varying effective learning rate (ELR), which was studied
previously. In this work, we investigate the properties of training
scale-invariant neural networks directly on the sphere using a fixed ELR. We
discover three regimes of such training depending on the ELR value:
convergence, chaotic equilibrium, and divergence. We study these regimes in
detail both on a theoretical examination of a toy example and on a thorough
empirical analysis of real scale-invariant deep learning models. Each regime
has unique features and reflects specific properties of the intrinsic loss
landscape, some of which have strong parallels with previous research on both
regular and scale-invariant neural networks training. Finally, we demonstrate
how the discovered regimes are reflected in conventional training of normalized
networks and how they can be leveraged to achieve better optima.
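Below is a minimal, self-contained sketch (my own illustration, not the authors' code; the toy quadratic loss, dimension, and ELR values are assumptions) of fixed-ELR training of a scale-invariant objective directly on the unit sphere: the gradient is projected onto the tangent space of the sphere, a fixed-ELR step is taken, and the iterate is retracted back onto the sphere. Sweeping the ELR exposes qualitatively different behaviors in the spirit of the three regimes above.

```python
# Projected (Riemannian) gradient descent on the unit sphere with a fixed ELR,
# applied to a scale-invariant toy loss L(w) = 0.5 * (w/||w||)^T A (w/||w||).
import numpy as np

dim = 16
A = np.diag(np.linspace(0.1, 10.0, dim))        # spectrum of the toy quadratic

def loss(u):
    # u is a unit vector; the loss depends only on the direction of the weights
    return 0.5 * u @ A @ u

def riemannian_grad(u):
    g = A @ u                                    # Euclidean gradient at unit norm
    return g - (g @ u) * u                       # project onto the sphere's tangent space

def train_on_sphere(elr, steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.normal(size=dim)
    u /= np.linalg.norm(u)                       # start on the unit sphere
    for _ in range(steps):
        u = u - elr * riemannian_grad(u)         # fixed-ELR step in the tangent space
        u /= np.linalg.norm(u)                   # retract back onto the sphere
    return loss(u)

# Illustrative ELR values; the regime boundaries are problem-dependent.
for elr in (0.01, 0.5, 5.0):
    print(f"ELR={elr:>4}: final loss {train_on_sphere(elr):.3f}  (optimum: 0.050)")
```

In this toy problem the optimum is 0.050; which regime a given ELR lands in depends on the curvature of the intrinsic loss, so the printed values should be read qualitatively rather than as thresholds.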
Related papers
- Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs).
Our approach develops within a recently introduced framework aimed at learning neural network-based surrogate models of Lattice Boltzmann collision operators.
Our work opens towards practical utilization of machine learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z)
- Hallmarks of Optimization Trajectories in Neural Networks: Directional Exploration and Redundancy [75.15685966213832]
We analyze the rich directional structure of optimization trajectories represented by their pointwise parameters.
We show that training only the scalar batch-norm parameters from some point in training onward matches the performance of training the entire network; a minimal sketch of this setup appears after this list.
arXiv Detail & Related papers (2024-03-12T07:32:47Z)
- Reparameterization through Spatial Gradient Scaling [69.27487006953852]
Reparameterization aims to improve the generalization of deep neural networks by transforming convolutional layers into equivalent multi-branched structures during training.
We present a novel spatial gradient scaling method to redistribute learning focus among weights in convolutional networks.
arXiv Detail & Related papers (2023-03-05T17:57:33Z)
- Equivariance and generalization in neural networks [0.0]
We focus on the consequences of incorporating translational equivariance among the network properties.
The benefits of equivariant networks are exemplified by studying a complex scalar field theory.
In most of the tasks our best equivariant architectures can perform and generalize significantly better than their non-equivariant counterparts.
arXiv Detail & Related papers (2021-12-23T12:38:32Z)
- Training Integrable Parameterizations of Deep Neural Networks in the Infinite-Width Limit [0.0]
Large-width dynamics has emerged as a fruitful viewpoint and led to practical insights on real-world deep networks.
For two-layer neural networks, it has been understood that the nature of the trained model radically changes depending on the scale of the initial random weights.
We propose various methods to avoid this trivial behavior and analyze in detail the resulting dynamics.
arXiv Detail & Related papers (2021-10-29T07:53:35Z)
- Equivariant vector field network for many-body system modeling [65.22203086172019]
The Equivariant Vector Field Network (EVFN) is built on a novel equivariant basis and the associated scalarization and vectorization layers.
We evaluate our method on predicting trajectories of simulated Newton mechanics systems with both full and partially observed data.
arXiv Detail & Related papers (2021-10-26T14:26:25Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
- Spherical Perspective on Learning with Normalization Layers [28.10737477667422]
Normalization Layers (NLs) are widely used in modern deep-learning architectures.
This paper introduces a spherical framework to study the optimization of neural networks with NLs from a geometric perspective.
arXiv Detail & Related papers (2020-06-23T23:29:51Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
- Incorporating Symmetry into Deep Dynamics Models for Improved Generalization [24.363954435050264]
We propose to improve accuracy and generalization by incorporating symmetries into convolutional neural networks.
Our models are theoretically and experimentally robust to distributional shift by symmetry group transformations.
Compared with image or text applications, our work is a significant step towards applying equivariant neural networks to high-dimensional systems.
arXiv Detail & Related papers (2020-02-08T01:28:17Z)
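As a concrete illustration of the batch-norm-only training mentioned in the "Hallmarks of Optimization Trajectories" entry above, the PyTorch sketch below (my own rendering of that setup, not code from the cited paper; the small model is a placeholder assumption) freezes every parameter except the scalar batch-norm affine parameters and builds an optimizer over only those.

```python
# Freeze all parameters except the BatchNorm scale/shift scalars, then train only those.
import torch
import torch.nn as nn

# Placeholder model; any architecture containing BatchNorm layers works the same way.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
)

def freeze_all_but_batchnorm(model: nn.Module):
    """Disable gradients everywhere except BatchNorm affine parameters; return the trainable ones."""
    for p in model.parameters():
        p.requires_grad_(False)
    bn_params = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            for p in (m.weight, m.bias):
                if p is not None:
                    p.requires_grad_(True)
                    bn_params.append(p)
    return bn_params

# ...train the full model for some epochs first, then switch to BN-only training:
bn_params = freeze_all_but_batchnorm(model)
optimizer = torch.optim.SGD(bn_params, lr=0.1, momentum=0.9)
```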