Training Scale-Invariant Neural Networks on the Sphere Can Happen in
Three Regimes
- URL: http://arxiv.org/abs/2209.03695v1
- Date: Thu, 8 Sep 2022 10:30:05 GMT
- Title: Training Scale-Invariant Neural Networks on the Sphere Can Happen in
Three Regimes
- Authors: Maxim Kodryan, Ekaterina Lobacheva, Maksim Nakhodnov, Dmitry Vetrov
- Abstract summary: We study the properties of training scale-invariant neural networks directly on the sphere using a fixed ELR.
We discover three regimes of such training depending on the ELR value: convergence, chaotic equilibrium, and divergence.
- Score: 3.808063547958558
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A fundamental property of deep learning normalization techniques, such as
batch normalization, is making the pre-normalization parameters scale
invariant. The intrinsic domain of such parameters is the unit sphere, and
therefore their gradient optimization dynamics can be represented via spherical
optimization with varying effective learning rate (ELR), which was studied
previously. In this work, we investigate the properties of training
scale-invariant neural networks directly on the sphere using a fixed ELR. We
discover three regimes of such training depending on the ELR value:
convergence, chaotic equilibrium, and divergence. We study these regimes in
detail both on a theoretical examination of a toy example and on a thorough
empirical analysis of real scale-invariant deep learning models. Each regime
has unique features and reflects specific properties of the intrinsic loss
landscape, some of which have strong parallels with previous research on both
regular and scale-invariant neural networks training. Finally, we demonstrate
how the discovered regimes are reflected in conventional training of normalized
networks and how they can be leveraged to achieve better optima.
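Below is a minimal, self-contained sketch (my own illustration, not the authors' code; the toy quadratic loss, dimension, and ELR values are assumptions) of fixed-ELR training of a scale-invariant objective directly on the unit sphere: the gradient is projected onto the tangent space of the sphere, a fixed-ELR step is taken, and the iterate is retracted back onto the sphere. Sweeping the ELR exposes qualitatively different behaviors in the spirit of the three regimes above.

```python
# Projected (Riemannian) gradient descent on the unit sphere with a fixed ELR,
# applied to a scale-invariant toy loss L(w) = 0.5 * (w/||w||)^T A (w/||w||).
import numpy as np

dim = 16
A = np.diag(np.linspace(0.1, 10.0, dim))        # spectrum of the toy quadratic

def loss(u):
    # u is a unit vector; the loss depends only on the direction of the weights
    return 0.5 * u @ A @ u

def riemannian_grad(u):
    g = A @ u                                    # Euclidean gradient at unit norm
    return g - (g @ u) * u                       # project onto the sphere's tangent space

def train_on_sphere(elr, steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.normal(size=dim)
    u /= np.linalg.norm(u)                       # start on the unit sphere
    for _ in range(steps):
        u = u - elr * riemannian_grad(u)         # fixed-ELR step in the tangent space
        u /= np.linalg.norm(u)                   # retract back onto the sphere
    return loss(u)

# Illustrative ELR values; the regime boundaries are problem-dependent.
for elr in (0.01, 0.5, 5.0):
    print(f"ELR={elr:>4}: final loss {train_on_sphere(elr):.3f}  (optimum: 0.050)")
```

In this toy problem the optimum is 0.050; which regime a given ELR lands in depends on the curvature of the intrinsic loss, so the printed values should be read qualitatively rather than as thresholds.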
Related papers
- Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs).
Our approach develops within a recently introduced framework aimed at learning neural network-based surrogate models of Lattice Boltzmann collision operators.
Our work opens towards practical utilization of machine learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z)
- Hallmarks of Optimization Trajectories in Neural Networks: Directional Exploration and Redundancy [75.15685966213832]
We analyze the rich directional structure of optimization trajectories represented by their pointwise parameters.
We show that training only the scalar batch-norm parameters from some point in training onward matches the performance of training the entire network; a minimal sketch of this setup appears after this list.
arXiv Detail & Related papers (2024-03-12T07:32:47Z)
- Reparameterization through Spatial Gradient Scaling [69.27487006953852]
Reparameterization aims to improve the generalization of deep neural networks by transforming convolutional layers into equivalent multi-branched structures during training.
We present a novel spatial gradient scaling method to redistribute learning focus among weights in convolutional networks.
arXiv Detail & Related papers (2023-03-05T17:57:33Z)
- Equivariance and generalization in neural networks [0.0]
We focus on the consequences of incorporating translational equivariance among the network properties.
The benefits of equivariant networks are exemplified by studying a complex scalar field theory.
In most of the tasks our best equivariant architectures can perform and generalize significantly better than their non-equivariant counterparts.
arXiv Detail & Related papers (2021-12-23T12:38:32Z)
- Training Integrable Parameterizations of Deep Neural Networks in the Infinite-Width Limit [0.0]
Large-width dynamics has emerged as a fruitful viewpoint and led to practical insights on real-world deep networks.
For two-layer neural networks, it has been understood that the nature of the trained model radically changes depending on the scale of the initial random weights.
We propose various methods to avoid this trivial behavior and analyze in detail the resulting dynamics.
arXiv Detail & Related papers (2021-10-29T07:53:35Z)
- Equivariant vector field network for many-body system modeling [65.22203086172019]
The Equivariant Vector Field Network (EVFN) is built on a novel equivariant basis and the associated scalarization and vectorization layers.
We evaluate our method on predicting trajectories of simulated Newton mechanics systems with both full and partially observed data.
arXiv Detail & Related papers (2021-10-26T14:26:25Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
- Spherical Perspective on Learning with Normalization Layers [28.10737477667422]
Normalization Layers (NLs) are widely used in modern deep-learning architectures.
This paper introduces a spherical framework to study the optimization of neural networks with NLs from a geometric perspective.
arXiv Detail & Related papers (2020-06-23T23:29:51Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
- Incorporating Symmetry into Deep Dynamics Models for Improved Generalization [24.363954435050264]
We propose to improve accuracy and generalization by incorporating symmetries into convolutional neural networks.
Our models are theoretically and experimentally robust to distributional shift by symmetry group transformations.
Compared with image or text applications, our work is a significant step towards applying equivariant neural networks to high-dimensional systems.
arXiv Detail & Related papers (2020-02-08T01:28:17Z)
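As a concrete illustration of the batch-norm-only training mentioned in the "Hallmarks of Optimization Trajectories" entry above, the PyTorch sketch below (my own rendering of that setup, not code from the cited paper; the small model is a placeholder assumption) freezes every parameter except the scalar batch-norm affine parameters and builds an optimizer over only those.

```python
# Freeze all parameters except the BatchNorm scale/shift scalars, then train only those.
import torch
import torch.nn as nn

# Placeholder model; any architecture containing BatchNorm layers works the same way.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
)

def freeze_all_but_batchnorm(model: nn.Module):
    """Disable gradients everywhere except BatchNorm affine parameters; return the trainable ones."""
    for p in model.parameters():
        p.requires_grad_(False)
    bn_params = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            for p in (m.weight, m.bias):
                if p is not None:
                    p.requires_grad_(True)
                    bn_params.append(p)
    return bn_params

# ...train the full model for some epochs first, then switch to BN-only training:
bn_params = freeze_all_but_batchnorm(model)
optimizer = torch.optim.SGD(bn_params, lr=0.1, momentum=0.9)
```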