Symmetry in Neural Network Parameter Spaces
- URL: http://arxiv.org/abs/2506.13018v1
- Date: Mon, 16 Jun 2025 00:59:12 GMT
- Title: Symmetry in Neural Network Parameter Spaces
- Authors: Bo Zhao, Robin Walters, Rose Yu
- Abstract summary: A significant portion of redundancy is explained by symmetries in the parameter space--transformations that leave the network function unchanged. These symmetries shape the loss landscape and constrain learning dynamics, offering a new lens for understanding optimization, generalization, and model complexity. We summarize existing literature, uncover connections between symmetry and learning theory, and identify gaps and opportunities in this emerging field.
- Score: 32.732734207891745
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern deep learning models are highly overparameterized, resulting in large sets of parameter configurations that yield the same outputs. A significant portion of this redundancy is explained by symmetries in the parameter space--transformations that leave the network function unchanged. These symmetries shape the loss landscape and constrain learning dynamics, offering a new lens for understanding optimization, generalization, and model complexity that complements existing theory of deep learning. This survey provides an overview of parameter space symmetry. We summarize existing literature, uncover connections between symmetry and learning theory, and identify gaps and opportunities in this emerging field.
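To make the central notion concrete, here is a minimal sketch (Python/NumPy, not taken from the survey) of the best-known parameter space symmetry: permuting the hidden units of a small ReLU network, together with the matching rows and columns of its weight matrices, changes the parameters but not the function. Rescaling symmetries of ReLU units and of normalization layers behave analogously.

```python
# Minimal sketch of a parameter space symmetry: hidden-unit permutation in a two-layer MLP.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 4, 8, 3

W1 = rng.normal(size=(d_hidden, d_in))
b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_out, d_hidden))

def mlp(x, W1, b1, W2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0)   # ReLU hidden layer

# Permute the hidden units: W1 -> P W1, b1 -> P b1, W2 -> W2 P^T.
perm = rng.permutation(d_hidden)
W1_p, b1_p, W2_p = W1[perm], b1[perm], W2[:, perm]

x = rng.normal(size=d_in)
assert np.allclose(mlp(x, W1, b1, W2), mlp(x, W1_p, b1_p, W2_p))   # same function, different parameters
```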
Related papers
- Generalized Linear Mode Connectivity for Transformers [87.32299363530996]
A striking phenomenon is linear mode connectivity (LMC), where independently trained models can be connected by low- or zero-loss paths. Prior work has predominantly focused on neuron re-ordering through permutations, but such approaches are limited in scope. We introduce a unified framework that captures four symmetry classes: permutations, semi-permutations, transformations, and general invertible maps. This generalization enables, for the first time, the discovery of low- and zero-barrier linear paths between independently trained Vision Transformers and GPT-2 models.
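A toy illustration of why such symmetries matter for LMC (a generic sketch, not the paper's framework): a minimum seen through a coordinate permutation produces a barrier on the straight path between the two parameter vectors, and the barrier vanishes once the endpoints are aligned.

```python
# Toy sketch: loss barrier along a straight path in parameter space, before and after
# permutation alignment (illustrative only; not the cited paper's method).
import numpy as np

def loss_barrier(loss_fn, theta_a, theta_b, n_points=25):
    """Max excess of the loss on the segment over the linear interpolation of endpoint losses."""
    ts = np.linspace(0.0, 1.0, n_points)
    endpoints = (1 - ts) * loss_fn(theta_a) + ts * loss_fn(theta_b)
    path = np.array([loss_fn((1 - t) * theta_a + t * theta_b) for t in ts])
    return float(np.max(path - endpoints))

# A permutation-invariant toy loss; in practice loss_fn would evaluate a trained network.
loss = lambda th: float(np.sum((np.sort(th) - np.array([1.0, 2.0])) ** 2))
theta_a = np.array([1.0, 2.0])
theta_b = np.array([2.0, 1.0])                        # theta_a with its coordinates swapped
print(loss_barrier(loss, theta_a, theta_b))           # 0.5: barrier on the naive straight path
print(loss_barrier(loss, theta_a, theta_b[[1, 0]]))   # 0.0: no barrier after permutation alignment
```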
arXiv Detail & Related papers (2025-06-28T01:46:36Z)
- Sensitivity Meets Sparsity: The Impact of Extremely Sparse Parameter Patterns on Theory-of-Mind of Large Language Models [55.46269953415811]
We identify ToM-sensitive parameters and show that perturbing as little as 0.001% of these parameters significantly degrades ToM performance. Our results have implications for enhancing model alignment, mitigating biases, and improving AI systems designed for human interaction.
arXiv Detail & Related papers (2025-04-05T17:45:42Z)
- Optimal Equivariant Architectures from the Symmetries of Matrix-Element Likelihoods [0.0]
The Matrix-Element Method (MEM) has long been a cornerstone of data analysis in high-energy physics.
Geometric deep learning has enabled neural network architectures that incorporate known symmetries directly into their design.
This paper presents a novel approach that combines MEM-inspired symmetry considerations with equivariant neural network design for particle physics analysis.
arXiv Detail & Related papers (2024-10-24T08:56:37Z)
- The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof [50.49582712378289]
We investigate the impact of neural parameter symmetries by introducing new neural network architectures.
We develop two methods, with some provable guarantees, of modifying standard neural networks to reduce parameter space symmetries.
Our experiments reveal several interesting observations on the empirical impact of parameter symmetries.
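One generic way to reduce a permutation symmetry, sketched below as an assumption rather than either of the paper's two methods, is to attach a fixed, non-trainable, per-unit component to each hidden unit; permuting the trainable weights then no longer yields an equivalent network.

```python
# Generic sketch of symmetry reduction (hypothetical construction, not the cited paper's):
# a fixed, non-trainable, per-unit scale makes hidden units distinguishable.
import numpy as np

rng = np.random.default_rng(1)
d_in, d_hidden, d_out = 4, 8, 3

c = np.linspace(0.5, 1.5, d_hidden)        # fixed, distinct, non-trainable per-unit scales
W1 = rng.normal(size=(d_hidden, d_in))
b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_out, d_hidden))

def asym_mlp(x, W1, b1, W2):
    return W2 @ np.maximum(c * (W1 @ x + b1), 0.0)   # c breaks the hidden-unit permutation symmetry

perm = np.roll(np.arange(d_hidden), 1)               # a non-identity permutation
x = rng.normal(size=d_in)
print(np.allclose(asym_mlp(x, W1, b1, W2),
                  asym_mlp(x, W1[perm], b1[perm], W2[:, perm])))   # False: no longer a symmetry
```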
arXiv Detail & Related papers (2024-05-30T16:32:31Z)
- Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs).
Our approach develops within a recently introduced framework aimed at learning neural network-based surrogate models of Lattice Boltzmann collision operators.
Our work opens the way towards practical utilization of machine learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z)
- Unification of Symmetries Inside Neural Networks: Transformer, Feedforward and Neural ODE [2.002741592555996]
This study introduces a novel approach by applying the principles of gauge symmetries, a key concept in physics, to neural network architectures.
We mathematically formulate the parametric redundancies in neural ODEs, and find that their gauge symmetries are given by spacetime diffeomorphisms.
Viewing neural ODEs as a continuum version of feedforward neural networks, we show that the parametric redundancies in feedforward neural networks are indeed lifted to diffeomorphisms in neural ODEs.
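A minimal special case of such a redundancy (a sketch, not the paper's full diffeomorphism analysis) is time reparameterization: rescaling the vector field by the derivative of a monotone change of time variable leaves the neural ODE's input-output map unchanged.

```latex
% Sketch: time reparameterization as a gauge redundancy of a neural ODE.
% For the flow of dz/dt = f_\theta(z, t) on [0, 1] and a smooth increasing
% \varphi: [0,1] \to [0,1] with \varphi(0) = 0, \varphi(1) = 1, define
\[
  \tilde f_\theta(z, s) \;=\; \varphi'(s)\, f_\theta\bigl(z, \varphi(s)\bigr).
\]
% Then \tilde z(s) := z(\varphi(s)) solves d\tilde z/ds = \tilde f_\theta(\tilde z, s)
% with \tilde z(0) = z(0) and \tilde z(1) = z(1), so the reparameterized field
% defines the same map from z(0) to z(1): a continuum analogue of layer-wise
% redundancies in feedforward networks.
```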
arXiv Detail & Related papers (2024-02-04T06:11:54Z)
- Reparameterization through Spatial Gradient Scaling [69.27487006953852]
Reparameterization aims to improve the generalization of deep neural networks by transforming convolutional layers into equivalent multi-branched structures during training.
We present a novel spatial gradient scaling method to redistribute learning focus among weights in convolutional networks.
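The first sentence refers to structural reparameterization; the sketch below shows the underlying branch-merging identity (a generic illustration with SciPy, not this paper's gradient-scaling method): a parallel 3x3 branch and 1x1 branch can be folded into a single 3x3 kernel with identical output.

```python
# Generic sketch of the branch-merging identity behind structural reparameterization
# (illustrates the equivalence only; not the cited paper's gradient-scaling method).
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 16))          # single-channel input, zero padding
k3 = rng.normal(size=(3, 3))           # 3x3 branch
k1 = rng.normal(size=(1, 1))           # 1x1 branch

# Multi-branch output: both branches applied in parallel, then summed.
y_branches = correlate2d(x, k3, mode="same") + correlate2d(x, k1, mode="same")

# Merged kernel: embed the 1x1 kernel at the center of the 3x3 kernel.
k_merged = k3.copy()
k_merged[1, 1] += k1[0, 0]
y_merged = correlate2d(x, k_merged, mode="same")

print(np.allclose(y_branches, y_merged))   # True: same function, different parameterization
```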
arXiv Detail & Related papers (2023-03-05T17:57:33Z)
- Oracle-Preserving Latent Flows [58.720142291102135]
We develop a methodology for the simultaneous discovery of multiple nontrivial continuous symmetries across an entire labelled dataset.
The symmetry transformations and the corresponding generators are modeled with fully connected neural networks trained with a specially constructed loss function.
The two new elements in this work are the use of a reduced-dimensionality latent space and the generalization to transformations invariant with respect to high-dimensional oracles.
arXiv Detail & Related papers (2023-02-02T00:13:32Z)
- A Geometric Modeling of Occam's Razor in Deep Learning [8.007631014276896]
Deep neural networks (DNNs) benefit from very high-dimensional parameter spaces. The contrast between their huge parameter complexity and their stunning performance in practice is all the more intriguing, and it remains unexplained. We propose a geometrically flavored information-theoretic approach to study this phenomenon.
arXiv Detail & Related papers (2019-05-27T07:57:26Z)