Annihilation of Spurious Minima in Two-Layer ReLU Networks
- URL: http://arxiv.org/abs/2210.06088v1
- Date: Wed, 12 Oct 2022 11:04:21 GMT
- Title: Annihilation of Spurious Minima in Two-Layer ReLU Networks
- Authors: Yossi Arjevani, Michael Field
- Abstract summary: We study the optimization problem associated with fitting two-layer ReLU neural networks with respect to the squared loss.
We show that adding neurons can turn symmetric spurious minima into saddles.
We also prove the existence of descent directions in certain subspaces arising from the symmetry structure of the loss function.
- Score: 9.695960412426672
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the optimization problem associated with fitting two-layer ReLU
neural networks with respect to the squared loss, where labels are generated by
a target network. Use is made of the rich symmetry structure to develop a novel
set of tools for studying the mechanism by which over-parameterization
annihilates spurious minima. Sharp analytic estimates are obtained for the loss
and the Hessian spectrum at different minima, and it is proved that adding
neurons can turn symmetric spurious minima into saddles; minima of lesser
symmetry require more neurons. Using Cauchy's interlacing theorem, we prove the
existence of descent directions in certain subspaces arising from the symmetry
structure of the loss function. This analytic approach uses techniques, new to
the field, from algebraic geometry, representation theory and symmetry
breaking, and confirms rigorously the effectiveness of over-parameterization in
making the associated loss landscape accessible to gradient-based methods. For
a fixed number of neurons and inputs, the spectral results remain true under
symmetry breaking perturbation of the target.
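The following is a minimal numerical sketch, not the paper's analytic computation, of two ingredients from the abstract: a student-teacher squared loss for two-layer ReLU networks, and a check that the eigenvalues of the Hessian restricted to a coordinate subspace interlace those of the full Hessian, as Cauchy's interlacing theorem guarantees for any principal submatrix of a symmetric matrix. The network sizes, the Gaussian sample, the evaluation point, and the finite-difference Hessian are illustrative assumptions.
```python
# Sketch: empirical student-teacher squared loss for two-layer ReLU networks
# and a numerical check of Cauchy's interlacing theorem on a principal
# submatrix of the (finite-difference) Hessian.  All sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, k, n, m = 3, 3, 4, 2000          # inputs, teacher neurons, student neurons, samples
V = np.eye(k, d)                    # teacher weights (rows are target neurons)
X = rng.standard_normal((m, d))     # Gaussian inputs
y = np.maximum(X @ V.T, 0.0).sum(axis=1)    # teacher outputs: sum_i relu(v_i . x)

def loss(theta):
    """Squared loss of a student with n ReLU neurons and unit output weights."""
    W = theta.reshape(n, d)
    pred = np.maximum(X @ W.T, 0.0).sum(axis=1)
    return 0.5 * np.mean((pred - y) ** 2)

def hessian(f, theta, eps=1e-4):
    """Symmetric central-difference Hessian of f at theta."""
    p = theta.size
    H = np.zeros((p, p))
    for i in range(p):
        for j in range(i, p):
            e_i, e_j = np.zeros(p), np.zeros(p)
            e_i[i], e_j[j] = eps, eps
            H[i, j] = H[j, i] = (
                f(theta + e_i + e_j) - f(theta + e_i - e_j)
                - f(theta - e_i + e_j) + f(theta - e_i - e_j)
            ) / (4 * eps ** 2)
    return H

theta = rng.standard_normal(n * d)          # an arbitrary evaluation point
H = hessian(loss, theta)
sub = np.arange(d)                          # restrict to the first neuron's coordinates
B = H[np.ix_(sub, sub)]

lam_full = np.sort(np.linalg.eigvalsh(H))   # ascending eigenvalues of the full Hessian
lam_sub = np.sort(np.linalg.eigvalsh(B))
N, r = lam_full.size, lam_sub.size

# Cauchy interlacing: lam_full[j] <= lam_sub[j] <= lam_full[j + N - r] for all j.
ok = all(lam_full[j] - 1e-8 <= lam_sub[j] <= lam_full[j + N - r] + 1e-8 for j in range(r))
print("loss:", loss(theta), " interlacing holds:", ok)
```
In the paper this argument is applied analytically at spurious minima to exhibit descent directions in symmetry-adapted subspaces; the sketch only illustrates the interlacing mechanism numerically.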
Related papers
- The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof [50.49582712378289]
We investigate the impact of neural parameter symmetries by introducing new neural network architectures.
We develop two methods, with some provable guarantees, of modifying standard neural networks to reduce parameter space symmetries.
Our experiments reveal several interesting observations on the empirical impact of parameter symmetries.
arXiv Detail & Related papers (2024-05-30T16:32:31Z)
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z)
- Lie Point Symmetry and Physics Informed Networks [59.56218517113066]
We propose a loss function that informs the network about Lie point symmetries in the same way that PINN models try to enforce the underlying PDE through a loss function.
Our symmetry loss ensures that the infinitesimal generators of the Lie group conserve the PDE solutions.
Empirical evaluations indicate that the inductive bias introduced by the Lie point symmetries of the PDEs greatly boosts the sample efficiency of PINNs.
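A heavily simplified sketch of this idea follows, under illustrative assumptions: the 1D heat equation u_t = u_xx as the PDE and only the time-translation generator ∂_t as the Lie point symmetry (the paper handles general generators via prolongations). The symmetry term penalizes the generator's action on the PDE residual, alongside the usual PINN residual loss.
```python
# Sketch: a PINN-style loss for u_t = u_xx augmented with a symmetry term.
# Here the Lie point symmetry is just time translation (generator d/dt), so
# the symmetry loss penalizes the t-derivative of the PDE residual.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))

def losses(model, n=256, lam=0.1):
    x = torch.rand(n, 1, requires_grad=True)
    t = torch.rand(n, 1, requires_grad=True)
    u = model(torch.cat([x, t], dim=1))

    def d(out, inp):
        # per-sample derivative of out w.r.t. inp (samples are independent)
        return torch.autograd.grad(out.sum(), inp, create_graph=True)[0]

    u_t, u_x = d(u, t), d(u, x)
    u_xx = d(u_x, x)

    r = u_t - u_xx                       # PDE residual of the heat equation
    r_t = d(r, t)                        # action of the time-translation generator on r

    pde_loss = (r ** 2).mean()
    sym_loss = (r_t ** 2).mean()
    return pde_loss + lam * sym_loss, pde_loss, sym_loss

total, pde, sym = losses(model)
total.backward()                          # gradients flow to the network parameters
print(float(pde), float(sym))
```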
arXiv Detail & Related papers (2023-11-07T19:07:16Z)
- Symmetry Induces Structure and Constraint of Learning [0.0]
We unveil the importance of the loss function symmetries in affecting, if not deciding, the learning behavior of machine learning models.
Common instances of mirror symmetries in deep learning include rescaling, rotation, and permutation symmetry.
We show that the theoretical framework can explain intriguing phenomena, such as the loss of plasticity and various collapse phenomena in neural networks.
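As a concrete illustration of the rescaling and permutation symmetries mentioned above, the sketch below checks numerically that a two-layer ReLU network's output is unchanged when one hidden unit's incoming weights are scaled by c > 0 while its outgoing weight is scaled by 1/c, and when hidden units are permuted. The architecture and sizes are illustrative assumptions, not taken from the paper.
```python
# Illustrative check of two loss-preserving symmetries of a two-layer ReLU net:
#   rescaling  (w_i, a_i) -> (c * w_i, a_i / c)  for c > 0, and
#   permutation of hidden units.
import numpy as np

rng = np.random.default_rng(1)
d, h, m = 5, 8, 100
W = rng.standard_normal((h, d))      # hidden-layer weights
a = rng.standard_normal(h)           # output weights
X = rng.standard_normal((m, d))

def forward(W, a, X):
    return np.maximum(X @ W.T, 0.0) @ a

base = forward(W, a, X)

c = 3.7                              # positive rescaling of one hidden unit
W2, a2 = W.copy(), a.copy()
W2[0] *= c
a2[0] /= c

perm = rng.permutation(h)            # permutation of hidden units

print(np.allclose(base, forward(W2, a2, X)))            # True: rescaling symmetry
print(np.allclose(base, forward(W[perm], a[perm], X)))  # True: permutation symmetry
```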
arXiv Detail & Related papers (2023-09-29T02:21:31Z)
- Deep Networks on Toroids: Removing Symmetries Reveals the Structure of Flat Regions in the Landscape Geometry [3.712728573432119]
We develop a standardized parameterization in which all symmetries are removed, resulting in a toroidal topology.
We derive a meaningful notion of the flatness of minimizers and of the geodesic paths connecting them.
We also find that minimizers found by variants of gradient descent can be connected by zero-error paths with a single bend.
arXiv Detail & Related papers (2022-02-07T09:57:54Z)
- Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution: at convergence, the ReLU network implements a piecewise linear map of the inputs.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
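As a small illustration of these knots: for a one-dimensional two-layer ReLU network, each hidden unit contributes a potential knot (a point where the slope of the piecewise linear map changes) at x = -b_i / w_i. The sketch below lists these knot locations and compares them with the training inputs; the network and data are illustrative assumptions, not the paper's setup.
```python
# Sketch: knots of the piecewise linear map implemented by a 1D two-layer ReLU net.
# Unit i with pre-activation w_i * x + b_i changes slope at x = -b_i / w_i.
import numpy as np

rng = np.random.default_rng(3)
h = 6
w = rng.standard_normal(h)           # input weights (1D inputs)
b = rng.standard_normal(h)           # biases
a = rng.standard_normal(h)           # output weights

x_train = np.sort(rng.uniform(-2, 2, size=8))   # training inputs

def f(x):
    return np.maximum(np.outer(x, w) + b, 0.0) @ a

knots = np.sort(-b[w != 0] / w[w != 0])         # slope-change locations
print("knots:", np.round(knots, 3))
print("training inputs:", np.round(x_train, 3))

# Between consecutive knots the map is affine: the second difference of f on a
# fine grid is (numerically) zero except near the knot locations.
xs = np.linspace(-3, 3, 2001)
curv = np.abs(np.diff(f(xs), 2))
print("grid points with nonzero curvature:", int((curv > 1e-9).sum()))
```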
arXiv Detail & Related papers (2021-11-03T15:14:20Z)
- Analytic Study of Families of Spurious Minima in Two-Layer ReLU Neural Networks [15.711517003382484]
We show that the Hessian spectrum concentrates near positives, with the exception of $\Theta(d)$ eigenvalues which grow linearly with $d$.
This makes possible the creation and the annihilation of minima using powerful tools from bifurcation theory.
arXiv Detail & Related papers (2021-07-21T22:05:48Z)
- Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
- Optimizing Mode Connectivity via Neuron Alignment [84.26606622400423]
Empirically, the local minima of loss functions can be connected by a learned curve in model space along which the loss remains nearly constant.
We propose a more general framework to investigate the effect of symmetry on landscape connectivity by accounting for the weight permutations of the networks being connected.
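A minimal sketch of the permutation-alignment idea follows; it is not the paper's full procedure. Hidden units of one two-layer network are matched to those of another by solving a linear assignment problem on a weight-similarity cost, using scipy.optimize.linear_sum_assignment. The cosine-similarity cost and network sizes are illustrative assumptions.
```python
# Sketch: align the hidden units of net B to those of net A by the permutation
# that maximizes weight similarity, then reorder B's parameters accordingly.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(2)
d, h = 10, 6
WA, aA = rng.standard_normal((h, d)), rng.standard_normal(h)

perm_true = rng.permutation(h)                  # B is a scrambled copy of A plus noise
WB = WA[perm_true] + 0.01 * rng.standard_normal((h, d))
aB = aA[perm_true]

# cost[i, j]: negative cosine similarity between unit i of A and unit j of B.
WA_n = WA / np.linalg.norm(WA, axis=1, keepdims=True)
WB_n = WB / np.linalg.norm(WB, axis=1, keepdims=True)
cost = -WA_n @ WB_n.T
row, col = linear_sum_assignment(cost)          # minimum-cost perfect matching

WB_aligned, aB_aligned = WB[col], aB[col]       # reorder B's units (and output weights)
print("permutation recovered:", np.array_equal(perm_true[col], np.arange(h)))
print("aligned weights close to A:", np.allclose(WB_aligned, WA, atol=0.05))
```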
arXiv Detail & Related papers (2020-09-05T02:25:23Z)
- On the Principle of Least Symmetry Breaking in Shallow ReLU Models [13.760721677322072]
We show that the principle of least symmetry breaking with respect to the target weights may apply to a broader range of settings.
Motivated by this, we conduct a series of experiments which corroborate this hypothesis for different classes of non-isotropic non-product distributions, smooth activation functions and networks with a few layers.
arXiv Detail & Related papers (2019-12-26T22:04:41Z)