Hidden symmetries of ReLU networks
- URL: http://arxiv.org/abs/2306.06179v1
- Date: Fri, 9 Jun 2023 18:07:06 GMT
- Title: Hidden symmetries of ReLU networks
- Authors: J. Elisenda Grigsby and Kathryn Lindsey and David Rolnick
- Abstract summary: In some networks, the only symmetries are permutation of neurons in a layer and positive scaling of parameters at a neuron, while other networks admit additional hidden symmetries.
In this work, we prove that, for any network architecture where no layer is narrower than the input, there exist parameter settings with no hidden symmetries.
- Score: 17.332539115959708
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The parameter space for any fixed architecture of feedforward ReLU neural
networks serves as a proxy during training for the associated class of
functions - but how faithful is this representation? It is known that many
different parameter settings can determine the same function. Moreover, the
degree of this redundancy is inhomogeneous: for some networks, the only
symmetries are permutation of neurons in a layer and positive scaling of
parameters at a neuron, while other networks admit additional hidden
symmetries. In this work, we prove that, for any network architecture where no
layer is narrower than the input, there exist parameter settings with no hidden
symmetries. We also describe a number of mechanisms through which hidden
symmetries can arise, and empirically approximate the functional dimension of
different network architectures at initialization. These experiments indicate
that the probability that a network has no hidden symmetries decreases towards
0 as depth increases, while increasing towards 1 as width and input dimension
increase.
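To make the two "visible" symmetries and the functional-dimension estimate mentioned in the abstract concrete, here is a minimal NumPy sketch (illustrative only, not the authors' code; the toy architecture, random seed, batch size, and finite-difference tolerance are arbitrary choices). It checks that permuting hidden neurons, or positively rescaling a neuron's incoming parameters while inversely rescaling its outgoing weights, leaves the computed function unchanged, and it estimates the batch functional dimension as the numerical rank of the Jacobian of the outputs with respect to the parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 4, 1   # toy architecture: hidden layer at least as wide as the input

def init_params():
    return {
        "W1": rng.standard_normal((n_hidden, n_in)),
        "b1": rng.standard_normal(n_hidden),
        "W2": rng.standard_normal((n_out, n_hidden)),
        "b2": rng.standard_normal(n_out),
    }

def forward(p, X):
    """One hidden ReLU layer followed by a linear output layer."""
    h = np.maximum(0.0, X @ p["W1"].T + p["b1"])
    return h @ p["W2"].T + p["b2"]

params = init_params()
X = rng.standard_normal((200, n_in))
y = forward(params, X)

# Symmetry 1: permute hidden neurons (rows of W1/b1 and columns of W2 together).
perm = rng.permutation(n_hidden)
permuted = {"W1": params["W1"][perm], "b1": params["b1"][perm],
            "W2": params["W2"][:, perm], "b2": params["b2"]}
assert np.allclose(y, forward(permuted, X))

# Symmetry 2: positive rescaling at each hidden neuron; ReLU(c*z) = c*ReLU(z) for c > 0,
# so scaling incoming weights/bias by c and outgoing weights by 1/c preserves the function.
c = np.exp(rng.standard_normal(n_hidden))  # strictly positive scale factors
scaled = {"W1": params["W1"] * c[:, None], "b1": params["b1"] * c,
          "W2": params["W2"] / c[None, :], "b2": params["b2"]}
assert np.allclose(y, forward(scaled, X))

# Functional dimension on this batch: numerical rank of the Jacobian of the stacked
# outputs with respect to the flattened parameter vector, approximated here by
# forward finite differences.
shapes = [("W1", (n_hidden, n_in)), ("b1", (n_hidden,)),
          ("W2", (n_out, n_hidden)), ("b2", (n_out,))]

def flatten(p):
    return np.concatenate([p[name].ravel() for name, _ in shapes])

def unflatten(v):
    p, i = {}, 0
    for name, shape in shapes:
        n = int(np.prod(shape))
        p[name] = v[i:i + n].reshape(shape)
        i += n
    return p

theta, eps = flatten(params), 1e-6
J = np.empty((X.shape[0] * n_out, theta.size))
for j in range(theta.size):
    t = theta.copy()
    t[j] += eps
    J[:, j] = (forward(unflatten(t), X) - y).ravel() / eps

print("parameter count:          ", theta.size)
print("estimated functional dim.:", np.linalg.matrix_rank(J, tol=1e-3))
```

The depth, width, and input-dimension trends reported in the abstract presumably come from running this kind of rank estimate (with exact derivatives rather than finite differences) over many random initializations per architecture.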
Related papers
- The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof [50.49582712378289]
We investigate the impact of neural parameter symmetries by introducing new neural network architectures.
We develop two methods, with some provable guarantees, for modifying standard neural networks to reduce parameter-space symmetries.
Our experiments reveal several interesting observations on the empirical impact of parameter symmetries.
arXiv Detail & Related papers (2024-05-30T16:32:31Z) - Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs).
Our approach develops within a recently introduced framework for learning neural-network-based surrogate models of Lattice Boltzmann collision operators.
Our work opens the way towards practical use of machine-learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z) - Geometry-induced Implicit Regularization in Deep ReLU Neural Networks [0.0]
Implicit regularization phenomena, which are still not well understood, occur during optimization.
We study the geometry of the output set as parameters vary.
We prove that the batch functional dimension is almost surely determined by the activation patterns in the hidden layers.
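For concreteness, the "activation patterns" referred to here are just the binary on/off states of the ReLUs over a batch. The sketch below (my own illustrative snippet with hypothetical helper names, not code from that paper) extracts them for a feedforward ReLU network.

```python
# Sketch: extract per-example activation patterns of a ReLU network over a batch.
# The boolean matrix for each hidden layer records which neurons fire; per the
# paper summarized above, the batch functional dimension is (almost surely)
# determined by these patterns.
import numpy as np

def activation_patterns(weights, biases, X):
    """Return one boolean (batch, width) matrix per hidden layer."""
    patterns, h = [], X
    for W, b in zip(weights, biases):
        z = h @ W.T + b
        patterns.append(z > 0)          # which ReLUs are active for each input
        h = np.maximum(0.0, z)
    return patterns

rng = np.random.default_rng(1)
widths = [3, 5, 4]                      # input dimension 3, two hidden layers
Ws = [rng.standard_normal((widths[i + 1], widths[i])) for i in range(2)]
bs = [rng.standard_normal(widths[i + 1]) for i in range(2)]
pats = activation_patterns(Ws, bs, rng.standard_normal((8, 3)))
for k, P in enumerate(pats, start=1):
    print(f"layer {k} patterns:\n{P.astype(int)}")
```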
arXiv Detail & Related papers (2024-02-13T07:49:57Z) - Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit [48.291961660957384]
We provide experiments demonstrating that residual architectures, including convolutional ResNets and Vision Transformers, exhibit transfer of optimal hyperparameters across width and depth.
Using recent developments in the dynamical mean field theory (DMFT) description of neural network learning dynamics, we show that this parameterization of ResNets admits a well-defined feature-learning limit as width and depth are jointly taken to infinity.
arXiv Detail & Related papers (2023-09-28T17:20:50Z) - Permutation Equivariant Neural Functionals [92.0667671999604]
This work studies the design of neural networks that can process the weights or gradients of other neural networks.
We focus on the permutation symmetries that arise in the weights of deep feedforward networks because hidden layer neurons have no inherent order.
In our experiments, we find that permutation equivariant neural functionals are effective on a diverse set of tasks.
arXiv Detail & Related papers (2023-02-27T18:52:38Z) - LieGG: Studying Learned Lie Group Generators [1.5293427903448025]
Symmetries built into a neural network have proven very beneficial for a wide range of tasks, since they remove the need to learn them from the data.
We present a method to extract symmetries learned by a neural network and to evaluate the degree to which a network is invariant to them.
arXiv Detail & Related papers (2022-10-09T20:42:37Z) - Encoding Involutory Invariance in Neural Networks [1.6371837018687636]
In certain situations, neural networks (NNs) are trained on data that obey underlying physical symmetries.
In this work, we explore a special kind of symmetry where functions are invariant with respect to involutory linear/affine transformations up to parity.
Numerical experiments indicate that the proposed models outperform baseline networks while respecting the imposed symmetry.
An adaptation of our technique to convolutional NN classification tasks for datasets with inherent horizontal/vertical reflection symmetry is also proposed.
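One generic way to hard-wire this kind of symmetry (not necessarily the construction used in that paper) is to symmetrize an arbitrary base network f with respect to the involution S, taking the even part (f(x) + f(Sx))/2 for invariance or the odd part (f(x) - f(Sx))/2 for parity-reversal. A minimal sketch, with toy names of my own choosing:

```python
# Sketch: enforce invariance (or parity-reversal) under an involution S (S @ S = I)
# by symmetrizing an arbitrary base model. This is a generic construction, not
# necessarily the architecture proposed in the paper summarized above.
import numpy as np

def symmetrize(f, S, parity=+1):
    """Return g with g(S x) = parity * g(x), for any base function f."""
    def g(X):
        return 0.5 * (f(X) + parity * f(X @ S.T))
    return g

# Toy base model: a random one-hidden-layer ReLU network on R^2.
rng = np.random.default_rng(2)
W1, b1 = rng.standard_normal((6, 2)), rng.standard_normal(6)
w2 = rng.standard_normal(6)
f = lambda X: np.maximum(0.0, X @ W1.T + b1) @ w2

S = np.diag([1.0, -1.0])                 # reflection about the x-axis: an involution
g_even = symmetrize(f, S, parity=+1)     # g(Sx) =  g(x)
g_odd  = symmetrize(f, S, parity=-1)     # g(Sx) = -g(x)

X = rng.standard_normal((5, 2))
assert np.allclose(g_even(X @ S.T), g_even(X))
assert np.allclose(g_odd(X @ S.T), -g_odd(X))
```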
arXiv Detail & Related papers (2021-06-07T16:07:15Z) - Symmetry-via-Duality: Invariant Neural Network Densities from Parameter-Space Correlators [0.0]
Symmetries of network densities may be determined via dual computations of network correlation functions.
We demonstrate that the amount of symmetry in the initial density affects the accuracy of networks trained on Fashion-MNIST.
arXiv Detail & Related papers (2021-06-01T18:00:06Z) - A Functional Perspective on Learning Symmetric Functions with Neural
Networks [48.80300074254758]
We study the learning and representation of neural networks defined on measures.
We establish approximation and generalization bounds under different choices of regularization.
The resulting models can be learned efficiently and enjoy generalization guarantees that extend across input sizes.
arXiv Detail & Related papers (2020-08-16T16:34:33Z) - Neural Parameter Allocation Search [57.190693718951316]
Training neural networks requires increasing amounts of memory.
Existing methods assume networks have many identical layers and utilize hand-crafted sharing strategies that fail to generalize.
We introduce Neural Parameter Allocation Search (NPAS), a novel task where the goal is to train a neural network given an arbitrary, fixed parameter budget.
NPAS covers both low-budget regimes, which produce compact networks, and a novel high-budget regime, where additional capacity can be added to boost performance without increasing inference FLOPs.
arXiv Detail & Related papers (2020-06-18T15:01:00Z)