Deep Networks on Toroids: Removing Symmetries Reveals the Structure of
Flat Regions in the Landscape Geometry
- URL: http://arxiv.org/abs/2202.03038v1
- Date: Mon, 7 Feb 2022 09:57:54 GMT
- Title: Deep Networks on Toroids: Removing Symmetries Reveals the Structure of
Flat Regions in the Landscape Geometry
- Authors: Fabrizio Pittorino, Antonio Ferraro, Gabriele Perugini, Christoph
Feinauer, Carlo Baldassi, Riccardo Zecchina
- Abstract summary: We develop a standardized parameterization in which all symmetries are removed, resulting in a toroidal topology.
We derive a meaningful notion of the flatness of minimizers and of the geodesic paths connecting them.
We also find that minimizers found by variants of gradient descent can be connected by zero-error paths with a single bend.
- Score: 3.712728573432119
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We systematize the approach to the investigation of deep neural network
landscapes by basing it on the geometry of the space of implemented functions
rather than the space of parameters. Grouping classifiers into equivalence
classes, we develop a standardized parameterization in which all symmetries are
removed, resulting in a toroidal topology. On this space, we explore the error
landscape rather than the loss. This lets us derive a meaningful notion of the
flatness of minimizers and of the geodesic paths connecting them. Using
different optimization algorithms that sample minimizers with different
flatness, we study the mode connectivity and other characteristics. Testing a
variety of state-of-the-art architectures and benchmark datasets, we confirm
the correlation between flatness and generalization performance; we further
show that in function space flatter minima are closer to each other and that
the barriers along the geodesics connecting them are small. We also find that
minimizers found by variants of gradient descent can be connected by zero-error
paths with a single bend. We observe similar qualitative results in neural
networks with binary weights and activations, providing one of the first
results concerning the connectivity in this setting. Our results hinge on
symmetry removal, and are in remarkable agreement with the rich phenomenology
described by some recent analytical studies performed on simple shallow models.
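To make the symmetry removal concrete: in a two-layer ReLU network, each hidden unit can be positively rescaled (pushing the scale into its outgoing weight) and hidden units can be permuted, all without changing the implemented function. Below is a minimal canonicalization sketch in this spirit (my own simplification, not the paper's exact construction): normalizing each unit's incoming weights places them on a unit sphere, giving the product-of-spheres structure loosely behind the "toroidal" picture, and sorting units fixes the permutation.

```python
import numpy as np

def standardize(W1, b1, w2):
    """Map a 2-layer ReLU net x -> w2 @ relu(W1 @ x + b1) to a canonical
    representative of its symmetry equivalence class (illustrative sketch).

    Removes two function-preserving symmetries:
      * positive rescaling of unit i: (a*W1[i], a*b1[i], w2[i]/a), a > 0;
      * permutation of hidden units.
    """
    W1, b1, w2 = W1.copy(), b1.copy(), w2.copy()
    # Scale symmetry: normalize each unit's incoming weights (ReLU is
    # positively homogeneous, so the norm moves into the outgoing weight).
    norms = np.linalg.norm(np.hstack([W1, b1[:, None]]), axis=1)
    W1 /= norms[:, None]
    b1 /= norms
    w2 *= norms
    # Permutation symmetry: sort units by outgoing weight (ties by bias).
    order = np.lexsort((b1, w2))
    return W1[order], b1[order], w2[order]

# Two parameterizations of the same function map to the same point.
rng = np.random.default_rng(0)
W1, b1, w2 = rng.normal(size=(4, 3)), rng.normal(size=4), rng.normal(size=4)
a = rng.uniform(0.5, 2.0, size=4)   # random positive rescalings
perm = rng.permutation(4)           # random relabeling of hidden units
W1b, b1b, w2b = (a[:, None] * W1)[perm], (a * b1)[perm], (w2 / a)[perm]
for u, v in zip(standardize(W1, b1, w2), standardize(W1b, b1b, w2b)):
    assert np.allclose(u, v)
```

Distances, flatness measures, and paths are then computed between such canonical representatives, so two parameter vectors implementing the same function are never counted as distinct minimizers.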
Related papers
- The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof [50.49582712378289]
We investigate the impact of neural parameter symmetries by introducing new neural network architectures.
We develop two methods, with some provable guarantees, of modifying standard neural networks to reduce parameter space symmetries.
Our experiments reveal several interesting observations on the empirical impact of parameter symmetries.
arXiv Detail & Related papers (2024-05-30T16:32:31Z)
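To illustrate what "reducing parameter space symmetries" means in the simplest case (a generic device, not necessarily one of the paper's two methods): fixing distinct, non-learnable per-unit constants makes hidden units non-interchangeable, so the permutation symmetry of a standard MLP layer disappears.

```python
import numpy as np

rng = np.random.default_rng(1)

def forward(W1, w2, x, unit_scales=None):
    """2-layer ReLU net; optional fixed per-unit scales break permutation symmetry."""
    h = np.maximum(W1 @ x, 0.0)
    if unit_scales is not None:
        h = unit_scales * h          # fixed, non-learnable, all distinct
    return w2 @ h

W1, w2, x = rng.normal(size=(4, 3)), rng.normal(size=4), rng.normal(size=3)
perm = np.array([2, 0, 3, 1])

# Standard layer: permuting hidden units leaves the function unchanged.
assert np.allclose(forward(W1, w2, x), forward(W1[perm], w2[perm], x))

# With distinct fixed scales, the same permutation changes the output,
# so permuted weight settings are no longer equivalent minima.
s = np.linspace(1.0, 2.0, 4)         # never trained
print(abs(forward(W1, w2, x, s) - forward(W1[perm], w2[perm], x, s)) > 1e-8)
```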
- A simple connection from loss flatness to compressed representations in neural networks [3.5502600490147196]
We show that in the final phase of learning in deep neural networks, the compression of the manifold of neural representations correlates with the flatness of the loss around the minima explored by SGD.
Our work builds upon the linear stability insight by Ma and Ying, deriving inequalities between various compression metrics and quantities involving sharpness.
arXiv Detail & Related papers (2023-10-03T03:36:29Z)
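As a toy illustration of the two quantities the paper relates (both metric choices below are common proxies and my own assumptions, not necessarily the paper's definitions): sharpness can be estimated as the average loss increase under small Gaussian weight perturbations, and the compression of a representation manifold as the participation ratio of its covariance spectrum.

```python
import numpy as np

rng = np.random.default_rng(2)
X, y = rng.normal(size=(200, 5)), rng.normal(size=200)

def mse(W):
    """Loss of a toy model: sum of ReLU features as the prediction."""
    return np.mean((np.maximum(X @ W.T, 0).sum(axis=1) - y) ** 2)

def sharpness(W, sigma=0.01, trials=100):
    """Average loss increase under Gaussian weight perturbations."""
    base = mse(W)
    return np.mean([mse(W + sigma * rng.normal(size=W.shape)) - base
                    for _ in range(trials)])

def participation_ratio(H):
    """Effective dimensionality of representations H (rows = samples);
    a lower value means a more compressed representation manifold."""
    lam = np.linalg.eigvalsh(np.cov(H.T))
    return lam.sum() ** 2 / (lam ** 2).sum()

W = rng.normal(size=(8, 5))
H = np.maximum(X @ W.T, 0)           # hidden representations
print(sharpness(W), participation_ratio(H))
```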
- Typical and atypical solutions in non-convex neural networks with discrete and continuous weights [2.7127628066830414]
We study the binary and continuous negative-margin perceptrons as simple non-convex neural network models learning random rules and associations.
Both models exhibit subdominant minimizers which are extremely flat and wide.
For both models, the generalization performance as a learning device is shown to be greatly improved by the existence of wide flat minimizers.
arXiv Detail & Related papers (2023-04-26T23:34:40Z)
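One way to probe how flat and wide such minimizers are in the binary case (a simplified "local energy" probe in the spirit of this literature; the details are my own): start from a zero-error configuration, flip a random fraction of the weights, and measure how the training error grows. Flat, wide minima stay near zero error for small flip fractions.

```python
import numpy as np

rng = np.random.default_rng(3)

def error(w, X, y):
    """Fraction of patterns misclassified by the binary perceptron sign(X @ w)."""
    return np.mean(np.sign(X @ w) != y)

def local_energy(w, X, y, flip_frac, trials=200):
    """Mean error after flipping a random fraction of the binary weights."""
    k = max(1, int(flip_frac * len(w)))
    errs = []
    for _ in range(trials):
        w2 = w.copy()
        w2[rng.choice(len(w), size=k, replace=False)] *= -1
        errs.append(error(w2, X, y))
    return np.mean(errs)

# Teacher-student toy problem: the teacher itself is a zero-error solution.
n, p = 101, 300                       # odd n, so sign() never hits zero
w_teacher = rng.choice([-1, 1], size=n)
X = rng.choice([-1, 1], size=(p, n))
y = np.sign(X @ w_teacher)
for f in (0.01, 0.05, 0.1):
    print(f, local_energy(w_teacher, X, y, f))
```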
- Symmetries, flat minima, and the conserved quantities of gradient flow [20.12938444246729]
We present a framework for finding continuous symmetries in the parameter space, which carve out low-loss valleys.
To generalize this framework to nonlinear neural networks, we introduce a novel set of nonlinear, data-dependent symmetries.
arXiv Detail & Related papers (2022-10-31T10:55:30Z)
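A classical concrete instance of the symmetry-to-conserved-quantity link (a standard fact about two-layer ReLU networks, shown here only as a numerical illustration, not as the paper's general framework): the per-unit rescaling symmetry implies that ||w_in||^2 - w_out^2 is exactly conserved under gradient flow, and approximately conserved under small-step gradient descent.

```python
import numpy as np

rng = np.random.default_rng(4)
X, y = rng.normal(size=(50, 3)), rng.normal(size=50)
W1 = 0.5 * rng.normal(size=(4, 3))    # incoming weights
w2 = 0.5 * rng.normal(size=4)         # outgoing weights

def grads(W1, w2):
    """Gradients of mean((relu(X W1^T) w2 - y)^2), constant factor 2 dropped
    (that only rescales time and does not affect the conservation law)."""
    H = np.maximum(X @ W1.T, 0.0)
    r = H @ w2 - y
    gw2 = H.T @ r / len(y)
    gW1 = ((r[:, None] * w2) * (H > 0)).T @ X / len(y)
    return gW1, gw2

def conserved(W1, w2):
    """One quantity per hidden unit, invariant under the rescaling symmetry."""
    return (W1 ** 2).sum(axis=1) - w2 ** 2

q0, lr = conserved(W1, w2), 1e-3
for _ in range(5000):
    gW1, gw2 = grads(W1, w2)
    W1 -= lr * gW1
    w2 -= lr * gw2
print(np.max(np.abs(conserved(W1, w2) - q0)))   # stays small for small lr
```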
- Annihilation of Spurious Minima in Two-Layer ReLU Networks [9.695960412426672]
We study the optimization problem associated with fitting two-layer ReLU neural networks with respect to the squared loss.
We show that adding neurons can turn symmetric spurious minima into saddles.
We also prove the existence of descent directions in certain subspaces arising from the symmetry structure of the loss function.
arXiv Detail & Related papers (2022-10-12T11:04:21Z)
- Minimal Neural Atlas: Parameterizing Complex Surfaces with Minimal Charts and Distortion [71.52576837870166]
We present Minimal Neural Atlas, a novel atlas-based explicit neural surface representation.
At its core is a fully learnable parametric domain, given by an implicit probabilistic occupancy field defined on an open square of the parametric space.
Our reconstructions are more accurate in terms of the overall geometry, due to the separation of concerns on topology and geometry.
arXiv Detail & Related papers (2022-07-29T16:55:06Z)
- Linear Connectivity Reveals Generalization Strategies [54.947772002394736]
Some pairs of finetuned models have large barriers of increasing loss on the linear paths between them.
We find distinct clusters of models which are linearly connected on the test loss surface, but are disconnected from models outside the cluster.
Our work demonstrates how the geometry of the loss surface can guide models towards different functions.
arXiv Detail & Related papers (2022-05-24T23:43:02Z)
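The barrier measurements behind such statements are usually computed with a simple recipe (sketched here on a toy network standing in for finetuned models; measuring the barrier relative to the interpolated endpoint losses is the common convention): evaluate the loss along the straight line between two parameter vectors and take the worst excess.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 5))
y = np.sin(X[:, 0])                   # toy regression targets

def loss(theta):
    """MSE of a small 2-layer ReLU net; theta = (W1, w2) tuple."""
    W1, w2 = theta
    return np.mean((np.maximum(X @ W1.T, 0) @ w2 - y) ** 2)

def interpolate(ta, tb, t):
    return tuple((1 - t) * a + t * b for a, b in zip(ta, tb))

def linear_barrier(ta, tb, steps=21):
    """Max excess loss on the segment between ta and tb, measured relative
    to the linear interpolation of the endpoint losses."""
    la, lb = loss(ta), loss(tb)
    return max(loss(interpolate(ta, tb, t)) - ((1 - t) * la + t * lb)
               for t in np.linspace(0, 1, steps))

theta_a = (rng.normal(size=(8, 5)), rng.normal(size=8))
theta_b = (rng.normal(size=(8, 5)), rng.normal(size=8))
print(linear_barrier(theta_a, theta_b))
```

Two models belong to the same linearly connected cluster when this quantity is near zero between them.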
- FuNNscope: Visual microscope for interactively exploring the loss landscape of fully connected neural networks [77.34726150561087]
We show how to explore high-dimensional landscape characteristics of neural networks.
We generalize observations on small neural networks to more complex systems.
An interactive dashboard opens up a number of possible applications.
arXiv Detail & Related papers (2022-04-09T16:41:53Z)
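The usual recipe behind such visualizations (a generic sketch, not FuNNscope's specific implementation): evaluate the loss on a 2D slice through a parameter vector, spanned by two normalized random directions, then contour-plot the resulting grid.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 5))
y = np.sin(X[:, 0])

def loss(w):
    """MSE of a tiny 2-layer ReLU net with all weights in one flat vector."""
    W1, w2 = w[:40].reshape(8, 5), w[40:]
    return np.mean((np.maximum(X @ W1.T, 0) @ w2 - y) ** 2)

w0 = rng.normal(size=48)              # slice center (in practice, a trained net)
d1, d2 = rng.normal(size=48), rng.normal(size=48)
d1 /= np.linalg.norm(d1)              # normalize so both axes have
d2 /= np.linalg.norm(d2)              # comparable scale

alphas = np.linspace(-1, 1, 25)
grid = np.array([[loss(w0 + a * d1 + b * d2) for b in alphas]
                 for a in alphas])    # 25x25 slice of the loss surface
print(grid.min(), grid.max())         # ready for a contour plot
```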
- GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning [54.291331971813364]
Offline reinforcement learning approaches can be divided into proximal and uncertainty-aware methods.
In this work, we demonstrate the benefit of combining the two in a latent variational model.
Our proposed metrics measure both the quality of out-of-distribution samples and the discrepancy of examples in the data.
arXiv Detail & Related papers (2021-02-22T19:42:40Z)
- Optimizing Mode Connectivity via Neuron Alignment [84.26606622400423]
Empirically, the local minima of loss functions can be connected by a learned curve in model space along which the loss remains nearly constant.
We propose a more general framework to investigate the effect of symmetry on landscape connectivity by accounting for the weight permutations of the networks being connected.
arXiv Detail & Related papers (2020-09-05T02:25:23Z)
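A common concrete form of the alignment step (a sketch; the paper's procedure is more elaborate): find the hidden-unit permutation that best matches two networks via the Hungarian algorithm, apply it to one of them, and only then connect the pair.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(7)

def align(W1a, W1b, w2b):
    """Permute net B's hidden units to best match net A's before connecting.
    Similarity = cosine similarity of incoming weight vectors; the Hungarian
    algorithm finds the permutation maximizing the total similarity."""
    A = W1a / np.linalg.norm(W1a, axis=1, keepdims=True)
    B = W1b / np.linalg.norm(W1b, axis=1, keepdims=True)
    _, cols = linear_sum_assignment(-(A @ B.T))   # minimize negative similarity
    return W1b[cols], w2b[cols]

# Sanity check: a permuted copy of net A aligns back to A exactly.
W1a, w2a = rng.normal(size=(6, 4)), rng.normal(size=6)
perm = rng.permutation(6)
W1b_aligned, w2b_aligned = align(W1a, W1a[perm], w2a[perm])
assert np.allclose(W1b_aligned, W1a) and np.allclose(w2b_aligned, w2a)
# After alignment, interpolate or fit a low-loss curve between the two nets.
```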