The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof
- URL: http://arxiv.org/abs/2405.20231v3
- Date: Tue, 15 Oct 2024 12:53:48 GMT
- Title: The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof
- Authors: Derek Lim, Theo Moe Putterman, Robin Walters, Haggai Maron, Stefanie Jegelka
- Abstract summary: We investigate the impact of neural parameter symmetries by introducing new neural network architectures.
We develop two methods, with some provable guarantees, of modifying standard neural networks to reduce parameter space symmetries.
Our experiments reveal several interesting observations on the empirical impact of parameter symmetries.
- Abstract: Many algorithms and observed phenomena in deep learning appear to be affected by parameter symmetries -- transformations of neural network parameters that do not change the underlying neural network function. These include linear mode connectivity, model merging, Bayesian neural network inference, metanetworks, and several other characteristics of optimization or loss-landscapes. However, theoretical analysis of the relationship between parameter space symmetries and these phenomena is difficult. In this work, we empirically investigate the impact of neural parameter symmetries by introducing new neural network architectures that have reduced parameter space symmetries. We develop two methods, with some provable guarantees, of modifying standard neural networks to reduce parameter space symmetries. With these new methods, we conduct a comprehensive experimental study consisting of multiple tasks aimed at assessing the effect of removing parameter symmetries. Our experiments reveal several interesting observations on the empirical impact of parameter symmetries; for instance, we observe linear mode connectivity between our networks without alignment of weight spaces, and we find that our networks allow for faster and more effective Bayesian neural network training. Our code is available at https://github.com/cptq/asymmetric-networks
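As a concrete illustration of a parameter symmetry (a minimal numpy sketch, not code from the paper), permuting the hidden neurons of a small MLP — reordering the rows of the first weight matrix and bias together with the columns of the second — leaves the computed function unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)

# A two-layer MLP: f(x) = W2 @ relu(W1 @ x + b1) + b2
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

def mlp(x, W1, b1, W2, b2):
    return W2 @ np.maximum(W1 @ x + b1, 0) + b2

# Permute the hidden neurons: rows of (W1, b1), columns of W2.
perm = rng.permutation(4)
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]

x = rng.normal(size=3)
# The permuted parameters define the same function.
assert np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2))
```

Architectures that break such symmetries, as in this paper, remove exactly these function-preserving reparameterizations from the loss landscape.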
Related papers
- Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs)
Our approach develops within a recently introduced framework aimed at learning neural network-based surrogate models of Lattice Boltzmann collision operators.
Our work opens towards practical utilization of machine learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z) - Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - Unification of Symmetries Inside Neural Networks: Transformer, Feedforward and Neural ODE [2.002741592555996]
This study introduces a novel approach by applying the principles of gauge symmetries, a key concept in physics, to neural network architectures.
We mathematically formulate the parametric redundancies in neural ODEs, and find that their gauge symmetries are given by spacetime diffeomorphisms.
Viewing neural ODEs as a continuum version of feedforward neural networks, we show that the parametric redundancies in feedforward neural networks are indeed lifted to diffeomorphisms in neural ODEs.
arXiv Detail & Related papers (2024-02-04T06:11:54Z) - Hidden symmetries of ReLU networks [17.332539115959708]
In some networks, the only symmetries are permutation of neurons in a layer and positive scaling of parameters at a neuron, while other networks admit additional hidden symmetries.
In this work, we prove that, for any network architecture where no layer is narrower than the input, there exist parameter settings with no hidden symmetries.
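The permutation and positive-scaling symmetries this abstract refers to can be verified directly. In a quick numpy sketch (an illustration, not the paper's code), scaling a hidden ReLU neuron's incoming weights by c > 0 and its outgoing weights by 1/c exploits the positive homogeneity of ReLU, relu(c·z) = c·relu(z), so the network function is unchanged:

```python
import numpy as np

rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(5, 3)), rng.normal(size=(1, 5))

def net(x, W1, W2):
    return W2 @ np.maximum(W1 @ x, 0)

# Positive scaling at each hidden neuron: multiply incoming weights by c,
# divide outgoing weights by c; ReLU's positive homogeneity preserves f.
c = rng.uniform(0.5, 2.0, size=5)
W1s = c[:, None] * W1
W2s = W2 / c[None, :]

x = rng.normal(size=3)
assert np.allclose(net(x, W1, W2), net(x, W1s, W2s))
```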
arXiv Detail & Related papers (2023-06-09T18:07:06Z) - Permutation Equivariant Neural Functionals [92.0667671999604]
This work studies the design of neural networks that can process the weights or gradients of other neural networks.
We focus on the permutation symmetries that arise in the weights of deep feedforward networks because hidden layer neurons have no inherent order.
In our experiments, we find that permutation equivariant neural functionals are effective on a diverse set of tasks.
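A minimal sketch of the kind of layer such a neural functional builds on (a generic DeepSets-style permutation-equivariant linear map, not the paper's actual architecture): mixing each row through one matrix and the row-mean through another guarantees that permuting the input rows permutes the output rows identically.

```python
import numpy as np

rng = np.random.default_rng(3)
A, B = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))

# Permutation-equivariant linear layer on a set of n feature rows:
# L(X) = X A + mean(X) B, which satisfies L(P X) = P L(X) for any
# row-permutation matrix P, since the mean is permutation-invariant.
def layer(X):
    return X @ A + X.mean(axis=0, keepdims=True) @ B

X = rng.normal(size=(5, 4))
P = np.eye(5)[rng.permutation(5)]
assert np.allclose(layer(P @ X), P @ layer(X))
```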
arXiv Detail & Related papers (2023-02-27T18:52:38Z) - Annihilation of Spurious Minima in Two-Layer ReLU Networks [9.695960412426672]
We study the optimization problem associated with fitting two-layer ReLU neural networks with respect to the squared loss.
We show that adding neurons can turn symmetric spurious minima into saddles.
We also prove the existence of descent directions in certain subspaces arising from the symmetry structure of the loss function.
arXiv Detail & Related papers (2022-10-12T11:04:21Z) - Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z) - Encoding Involutory Invariance in Neural Networks [1.6371837018687636]
In certain situations, neural networks (NNs) are trained on data that obey underlying physical symmetries.
In this work, we explore a special kind of symmetry where functions are invariant with respect to involutory linear/affine transformations up to parity.
Numerical experiments indicate that the proposed models outperform baseline networks while respecting the imposed symmetry.
An adaptation of our technique to convolutional NN classification tasks for datasets with inherent horizontal/vertical reflection symmetry is also proposed.
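One standard way to enforce invariance under an involutory transformation A (i.e., A² = I) is to symmetrize an unconstrained base network, g(x) = (f(x) + f(Ax)) / 2. This numpy sketch uses a hypothetical base network f and a coordinate reflection as A; it illustrates the general construction, not this paper's specific method:

```python
import numpy as np

rng = np.random.default_rng(2)

# An involutory linear transformation: reflection of the first coordinate.
A = np.diag([-1.0, 1.0, 1.0])
assert np.allclose(A @ A, np.eye(3))  # A is its own inverse

# Hypothetical unconstrained base network f.
W = rng.normal(size=(1, 3))
def f(x):
    return np.tanh(W @ x)

# Symmetrized network: g(Ax) = (f(Ax) + f(A A x)) / 2 = g(x).
def g(x):
    return 0.5 * (f(x) + f(A @ x))

x = rng.normal(size=3)
assert np.allclose(g(x), g(A @ x))
```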
arXiv Detail & Related papers (2021-06-07T16:07:15Z) - Symmetry-via-Duality: Invariant Neural Network Densities from Parameter-Space Correlators [0.0]
Symmetries of network densities may be determined via dual computations of network correlation functions.
We demonstrate that the amount of symmetry in the initial density affects the accuracy of networks trained on Fashion-MNIST.
arXiv Detail & Related papers (2021-06-01T18:00:06Z) - Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.