Typical and atypical solutions in non-convex neural networks with
discrete and continuous weights
- URL: http://arxiv.org/abs/2304.13871v2
- Date: Mon, 24 Jul 2023 11:44:01 GMT
- Title: Typical and atypical solutions in non-convex neural networks with
discrete and continuous weights
- Authors: Carlo Baldassi, Enrico M. Malatesta, Gabriele Perugini, Riccardo
Zecchina
- Abstract summary: We study the binary and continuous negative-margin perceptrons as simple non-convex neural network models learning random rules and associations.
Both models exhibit subdominant minimizers which are extremely flat and wide.
For both models, the generalization performance as a learning device is shown to be greatly improved by the existence of wide flat minimizers.
- Score: 2.7127628066830414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the binary and continuous negative-margin perceptrons as simple
non-convex neural network models learning random rules and associations. We
analyze the geometry of the landscape of solutions in both models and find
important similarities and differences. Both models exhibit subdominant
minimizers which are extremely flat and wide. These minimizers coexist with a
background of dominant solutions which are composed of an exponential number of
algorithmically inaccessible small clusters for the binary case (the frozen
1-RSB phase) or a hierarchical structure of clusters of different sizes for the
spherical case (the full RSB phase). In both cases, when a certain threshold in
constraint density is crossed, the local entropy of the wide flat minima
becomes non-monotonic, indicating a break-up of the space of robust solutions
into disconnected components. This has a strong impact on the behavior of
algorithms in binary models, which cannot access the remaining isolated
clusters. For the spherical case the behavior is different, since even beyond
the disappearance of the wide flat minima the remaining solutions are shown to
always be surrounded by a large number of other solutions at any distance, up
to capacity. Indeed, we exhibit numerical evidence that algorithms seem to find
solutions up to the SAT/UNSAT transition, which we compute here using a 1RSB
approximation. For both models, the generalization performance as a learning
device is shown to be greatly improved by the existence of wide flat minimizers
even when trained in the highly underconstrained regime of very negative
margins.
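For readers less familiar with the setup, the following is a minimal sketch of the negative-margin perceptron constraints and of the local-entropy quantity discussed in the abstract, written in the usual statistical-physics notation; the symbols ($\xi^\mu$, $\sigma^\mu$, $\kappa$, $\alpha$, $W^\star$) are assumptions of this sketch rather than text quoted from the paper.

```latex
% Minimal sketch of the negative-margin perceptron setup (assumed notation).
% Each of the P = \alpha N random patterns (\xi^\mu, \sigma^\mu) must be
% classified with margin at least \kappa, where \kappa < 0 in the
% negative-margin regime studied here:
\[
  \Delta^\mu \;\equiv\; \frac{\sigma^\mu}{\sqrt{N}} \sum_{i=1}^{N} W_i\, \xi_i^\mu \;\ge\; \kappa ,
  \qquad \mu = 1, \dots, P = \alpha N .
\]
% Binary case:    W_i \in \{-1, +1\} for all i.
% Spherical case: \sum_{i=1}^{N} W_i^2 = N.
%
% Local entropy of a reference solution W^\star: the normalized log-count of
% solutions at distance d from W^\star,
\[
  \mathcal{S}_{W^\star}(d) \;=\; \frac{1}{N}\,
  \ln \#\bigl\{ W \,:\, \Delta^\mu(W) \ge \kappa \;\; \forall \mu , \;\; d(W, W^\star) = d \bigr\} .
\]
```

A non-monotonic local-entropy curve is the signature, mentioned in the abstract, of the space of robust solutions breaking up into disconnected components.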
Related papers
- Exact full-RSB SAT/UNSAT transition in infinitely wide two-layer neural networks [0.0]
We show that Gradient Descent is not able to reach the capacity irrespective of the presence of an overlap gap for typical states.
This finding suggests that gradient-based algorithms are biased towards highly atypical states.
arXiv Detail & Related papers (2024-10-09T09:41:28Z) - Message Passing Neural PDE Solvers [60.77761603258397]
We build a neural message passing solver, replacing all heuristically designed components in the computation graph with backprop-optimized neural function approximators.
We show that neural message passing solvers representationally contain some classical methods, such as finite differences, finite volumes, and WENO schemes (the finite-difference case is illustrated in a sketch after this list).
We validate our method on various fluid-like flow problems, demonstrating fast, stable, and accurate performance across different domain topologies, equation parameters, discretizations, etc., in 1D and 2D.
arXiv Detail & Related papers (2022-02-07T17:47:46Z) - Deep Networks on Toroids: Removing Symmetries Reveals the Structure of
Flat Regions in the Landscape Geometry [3.712728573432119]
We develop a standardized parameterization in which all symmetries are removed, resulting in a toroidal topology.
We derive a meaningful notion of the flatness of minimizers and of the geodesic paths connecting them.
We also find that minimizers found by variants of gradient descent can be connected by zero-error paths with a single bend.
arXiv Detail & Related papers (2022-02-07T09:57:54Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - Generalization of Neural Combinatorial Solvers Through the Lens of
Adversarial Robustness [68.97830259849086]
Most datasets only capture a simpler subproblem and likely suffer from spurious features.
We study adversarial robustness - a local generalization property - to reveal hard, model-specific instances and spurious features.
Unlike in other applications, where perturbation models are designed around subjective notions of imperceptibility, our perturbation models are efficient and sound.
Surprisingly, with such perturbations, a sufficiently expressive neural solver does not suffer from the limitations of the accuracy-robustness trade-off common in supervised learning.
arXiv Detail & Related papers (2021-10-21T07:28:11Z) - Learning through atypical ''phase transitions'' in overparameterized
neural networks [0.43496401697112685]
Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear.
Yet they can fit data almost perfectly through variants of gradient descent and achieve unexpected levels of prediction accuracy without overfitting.
These properties pose formidable challenges for the theory of generalization.
arXiv Detail & Related papers (2021-10-01T23:28:07Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability (see the toy sketch after this list).
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Unveiling the structure of wide flat minima in neural networks [0.46664938579243564]
The success of deep learning has revealed the application potential of neural networks across the sciences.
arXiv Detail & Related papers (2021-07-02T16:04:57Z) - Efficient Methods for Structured Nonconvex-Nonconcave Min-Max
Optimization [98.0595480384208]
We propose a generalization of the extragradient method which converges to a stationary point.
The algorithm applies not only to Euclidean spaces, but also to general $\ell_p$-normed finite-dimensional vector spaces.
arXiv Detail & Related papers (2020-10-31T21:35:42Z) - Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z) - Properties of the geometry of solutions and capacity of multi-layer neural networks with Rectified Linear Units activations [2.3018169548556977]
We study the effects of Rectified Linear Units on the capacity and on the geometrical landscape of the solution space in two-layer neural networks.
We find that, quite unexpectedly, the capacity of the network remains finite as the number of neurons in the hidden layer increases.
Possibly more important, a large deviation approach allows us to find that the geometrical landscape of the solution space has a peculiar structure.
arXiv Detail & Related papers (2019-07-17T15:23:17Z)
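To make the representational-containment claim in the Message Passing Neural PDE Solvers entry concrete, here is a hypothetical NumPy illustration (not taken from that paper's code): the second-order central finite-difference Laplacian on a periodic 1D grid written as one round of message passing with fixed edge weights; a learned solver would replace the fixed message and update functions with neural networks.

```python
# Hypothetical illustration, not from the cited paper's codebase: a central
# finite-difference Laplacian on a periodic 1D grid, phrased as one round of
# message passing in which each node aggregates fixed-weight messages from
# its two neighbours.
import numpy as np

def message_passing_laplacian(u: np.ndarray, dx: float) -> np.ndarray:
    """One aggregation step: each grid node combines messages from its two neighbours."""
    left, right = np.roll(u, 1), np.roll(u, -1)   # messages from left/right neighbours
    return (left - 2.0 * u + right) / dx**2       # fixed (non-learned) update function

# Sanity check on u(x) = sin(x), whose exact Laplacian is -sin(x).
x = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
u = np.sin(x)
approx = message_passing_laplacian(u, dx=x[1] - x[0])
print("max abs error:", np.max(np.abs(approx + np.sin(x))))  # small discretization error
```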
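Similarly, the Separation Capacity entry above states that a wide two-layer ReLU network with standard Gaussian weights and uniformly distributed biases separates suitable data with high probability. The toy NumPy check below is an illustration assumed for this note, not the paper's construction or proof: two concentric rings that are not linearly separable in input space become linearly separable after a random ReLU feature map followed by a plain perceptron readout.

```python
# Hypothetical toy check (not the cited paper's code): random first layer with
# Gaussian weights and uniform biases, followed by a linear readout trained
# only on the fixed random features.
import numpy as np

rng = np.random.default_rng(0)

# Two concentric rings in 2D: not linearly separable in the input space.
n = 400
angles = rng.uniform(0.0, 2.0 * np.pi, n)
radii = np.where(np.arange(n) < n // 2, 1.0, 2.0)
X = np.stack([radii * np.cos(angles), radii * np.sin(angles)], axis=1)
y = np.where(np.arange(n) < n // 2, -1.0, 1.0)

# Random two-layer ReLU feature map: standard Gaussian weights, uniform biases.
width = 500
W = rng.standard_normal((2, width))
b = rng.uniform(-2.0, 2.0, width)
features = np.maximum(X @ W + b, 0.0)

# Linear readout trained with a plain batch perceptron rule on the features.
w_out = np.zeros(width)
for _ in range(200):
    mistakes = y * (features @ w_out) <= 0.0
    if not mistakes.any():
        break
    w_out += (y[mistakes, None] * features[mistakes]).sum(axis=0)

print("training errors:", int((y * (features @ w_out) <= 0.0).sum()))
```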