Understanding Mode Connectivity via Parameter Space Symmetry
- URL: http://arxiv.org/abs/2505.23681v1
- Date: Thu, 29 May 2025 17:20:54 GMT
- Title: Understanding Mode Connectivity via Parameter Space Symmetry
- Authors: Bo Zhao, Nima Dehmamy, Robin Walters, Rose Yu
- Abstract summary: Neural network minima are often connected by curves along which train and test loss remain nearly constant. We propose a new approach to exploring the connectedness of minima using parameter space symmetry.
- Score: 33.150665036826624
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural network minima are often connected by curves along which train and test loss remain nearly constant, a phenomenon known as mode connectivity. While this property has enabled applications such as model merging and fine-tuning, its theoretical explanation remains unclear. We propose a new approach to exploring the connectedness of minima using parameter space symmetry. By linking the topology of symmetry groups to that of the minima, we derive the number of connected components of the minima of linear networks and show that skip connections reduce this number. We then examine when mode connectivity and linear mode connectivity hold or fail, using parameter symmetries which account for a significant part of the minimum. Finally, we provide explicit expressions for connecting curves in the minima induced by symmetry. Using the curvature of these curves, we derive conditions under which linear mode connectivity approximately holds. Our findings highlight the role of continuous symmetries in understanding the neural network loss landscape.
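The symmetry-induced connecting curves in the abstract can be made concrete in the simplest setting. Below is a minimal numerical sketch (our illustration, not the paper's code) for a two-layer linear network: the general linear group acting between the layers moves a minimum along a constant-loss curve. All dimensions and the generator A are arbitrary choices.
```python
# A minimal sketch: for a two-layer linear network f(x) = W2 @ W1 @ x,
# any invertible g gives the reparameterization (W2 @ inv(g), g @ W1)
# with the same function, so a curve g(t) through GL(h) traces a
# constant-loss curve in parameter space.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
d_in, h, d_out, n = 5, 4, 3, 100
X = rng.normal(size=(n, d_in))
Y = rng.normal(size=(n, d_out))
W1 = rng.normal(size=(h, d_in))
W2 = rng.normal(size=(d_out, h))

def loss(W1, W2):
    return np.mean((X @ W1.T @ W2.T - Y) ** 2)

A = rng.normal(size=(h, h))          # arbitrary generator of the curve
for t in np.linspace(0.0, 1.0, 5):
    g = expm(t * A)                  # g(0) = I; g(t) invertible for all t
    print(t, loss(g @ W1, W2 @ np.linalg.inv(g)))
# Every printed loss coincides up to floating-point error.
```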
Related papers
- Generalized Linear Mode Connectivity for Transformers [87.32299363530996]
A striking phenomenon is linear mode connectivity (LMC), where independently trained models can be connected by low- or zero-loss paths. Prior work has predominantly focused on neuron re-ordering through permutations, but such approaches are limited in scope. We introduce a unified framework that captures four symmetry classes: permutations, semi-permutations, transformations, and general invertible maps. This generalization enables, for the first time, the discovery of low- and zero-barrier linear paths between independently trained Vision Transformers and GPT-2 models.
arXiv Detail & Related papers (2025-06-28T01:46:36Z)
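The simplest of the four symmetry classes above is the permutation class. As a hedged illustration (not the paper's implementation), the sketch below checks that permuting the hidden units of a two-layer ReLU network, together with the matching columns of the output layer, leaves the network function, and hence its loss, unchanged.
```python
# Permutation symmetry of a two-layer ReLU network (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
d_in, h, d_out = 6, 8, 2
W1, b1 = rng.normal(size=(h, d_in)), rng.normal(size=h)
W2 = rng.normal(size=(d_out, h))

def f(x, W1, b1, W2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0)

perm = rng.permutation(h)   # an element of the symmetry group S_h
x = rng.normal(size=d_in)
same = np.allclose(f(x, W1, b1, W2), f(x, W1[perm], b1[perm], W2[:, perm]))
print(same)                 # True: the barrier along this orbit is zero
```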
- Landscaping Linear Mode Connectivity [76.39694196535996]
Linear mode connectivity (LMC) has garnered interest from both theoretical and practical fronts.
We take a step towards understanding it by providing a model of how the loss landscape needs to behave topographically for LMC to hold.
arXiv Detail & Related papers (2024-06-24T03:53:30Z)
- The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof [50.49582712378289]
We investigate the impact of neural parameter symmetries by introducing new neural network architectures.
We develop two methods, with some provable guarantees, of modifying standard neural networks to reduce parameter space symmetries.
Our experiments reveal several interesting observations on the empirical impact of parameter symmetries.
arXiv Detail & Related papers (2024-05-30T16:32:31Z)
- Geodesic Mode Connectivity [4.096453902709292]
Mode connectivity is a phenomenon where trained models are connected by a path of low loss.
We propose an algorithm to approximate geodesics and demonstrate that they achieve mode connectivity.
arXiv Detail & Related papers (2023-08-24T09:18:43Z)
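Both this entry and the LMC entries above revolve around the loss barrier along a path between two minima. A minimal sketch, assuming only a generic loss function (not from either paper's code): the barrier along the straight segment is the largest excess of the path loss over the linear interpolation of the endpoint losses.
```python
# Loss barrier along the straight segment between two parameter vectors.
import numpy as np

def linear_barrier(loss_fn, theta_a, theta_b, num=25):
    ts = np.linspace(0.0, 1.0, num)
    path = [loss_fn((1 - t) * theta_a + t * theta_b) for t in ts]
    ends = [(1 - t) * path[0] + t * path[-1] for t in ts]
    return max(p - e for p, e in zip(path, ends))

# Toy usage: two zero-loss points of a non-convex loss with a bump
# between them, so the straight segment has a positive barrier.
loss_fn = lambda th: np.sin(3 * th[0]) ** 2 + th[1] ** 2
print(linear_barrier(loss_fn, np.array([0.0, 0.0]),
                     np.array([np.pi / 3, 0.0])))   # 1.0
```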
- Symmetries, flat minima, and the conserved quantities of gradient flow [20.12938444246729]
We present a framework for finding continuous symmetries in the parameter space, which carve out low-loss valleys.
To generalize this framework to nonlinear neural networks, we introduce a novel set of nonlinear, data-dependent symmetries.
arXiv Detail & Related papers (2022-10-31T10:55:30Z)
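One classical conserved quantity of the kind this entry studies can be checked numerically. The sketch below (our own toy setup, not the paper's code) tracks Q = W1 W1^T - W2^T W2 for a two-layer linear network: Q is exactly conserved under gradient flow, and gradient descent with a small step size preserves it approximately.
```python
# Conserved quantity of gradient flow on a two-layer linear network
# with MSE loss, associated with the GL symmetry that rescales the
# two layers against each other.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
Y = rng.normal(size=(50, 3))
W1 = 0.1 * rng.normal(size=(6, 4))
W2 = 0.1 * rng.normal(size=(3, 6))

def grads(W1, W2):
    R = X @ W1.T @ W2.T - Y          # residuals of the MSE loss
    G = R.T @ X / len(X)             # dL/d(W2 @ W1)
    return W2.T @ G, G @ W1.T        # dL/dW1, dL/dW2

Q0 = W1 @ W1.T - W2.T @ W2
lr = 1e-3                            # small steps approximate the flow
for _ in range(2000):
    g1, g2 = grads(W1, W2)
    W1, W2 = W1 - lr * g1, W2 - lr * g2

print(np.abs(W1 @ W1.T - W2.T @ W2 - Q0).max())
# Near zero; exactly zero in the continuous-time (gradient flow) limit.
```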
- Annihilation of Spurious Minima in Two-Layer ReLU Networks [9.695960412426672]
We study the optimization problem associated with fitting two-layer ReLU neural networks with respect to the squared loss.
We show that adding neurons can turn symmetric spurious minima into saddles.
We also prove the existence of descent directions in certain subspaces arising from the symmetry structure of the loss function.
arXiv Detail & Related papers (2022-10-12T11:04:21Z)
- Deep Networks on Toroids: Removing Symmetries Reveals the Structure of Flat Regions in the Landscape Geometry [3.712728573432119]
We develop a standardized parameterization in which all symmetries are removed, resulting in a toroidal topology.
We derive a meaningful notion of the flatness of minimizers and of the geodesic paths connecting them.
We also find that minimizers found by variants of gradient descent can be connected by zero-error paths with a single bend.
arXiv Detail & Related papers (2022-02-07T09:57:54Z)
- Boundary theories of critical matchgate tensor networks [59.433172590351234]
Key aspects of the AdS/CFT correspondence can be captured in terms of tensor network models on hyperbolic lattices.
For tensors fulfilling the matchgate constraint, these have previously been shown to produce disordered boundary states.
We show that these Hamiltonians exhibit multi-scale quasiperiodic symmetries captured by an analytical toy model.
arXiv Detail & Related papers (2021-10-06T18:00:03Z)
- Optimizing Mode Connectivity via Neuron Alignment [84.26606622400423]
Empirically, the local minima of loss functions can be connected by a learned curve in model space along which the loss remains nearly constant.
We propose a more general framework to investigate the effect of symmetry on landscape connectivity by accounting for the weight permutations of the networks being connected.
arXiv Detail & Related papers (2020-09-05T02:25:23Z)
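The permutation accounting in this last entry underlies the alignment step used by model-merging methods. A hedged sketch of the idea (a weight-matching proxy; the paper itself aligns neurons using their activations): pick the hidden-unit permutation of model B that best matches model A via an assignment problem, then interpolate between A and the permuted B.
```python
# Align two-layer networks by permuting B's hidden units to match A.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(3)
h, d_in, d_out = 8, 5, 2
W1a, W2a = rng.normal(size=(h, d_in)), rng.normal(size=(d_out, h))
W1b, W2b = rng.normal(size=(h, d_in)), rng.normal(size=(d_out, h))

# Unit-to-unit similarity, using each hidden unit's incoming and
# outgoing weights as its signature.
sim = W1a @ W1b.T + W2a.T @ W2b
_, col = linear_sum_assignment(-sim)        # maximize total similarity

W1b_al, W2b_al = W1b[col], W2b[:, col]      # permuted copy of model B
t = 0.5                                     # midpoint of the linear path
W1_mid = (1 - t) * W1a + t * W1b_al
W2_mid = (1 - t) * W2a + t * W2b_al
```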