Learning through atypical ''phase transitions'' in overparameterized
neural networks
- URL: http://arxiv.org/abs/2110.00683v1
- Date: Fri, 1 Oct 2021 23:28:07 GMT
- Title: Learning through atypical ''phase transitions'' in overparameterized
neural networks
- Authors: Carlo Baldassi, Clarissa Lauditi, Enrico M. Malatesta, Rosalba
Pacelli, Gabriele Perugini, Riccardo Zecchina
- Abstract summary: Current deep neural networks are highly observableized (up to billions of connection weights) and nonlinear.
Yet they can fit data almost perfectly through overdense descent algorithms and achieve unexpected accuracy prediction.
These are formidable challenges without generalization.
- Score: 0.43496401697112685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current deep neural networks are highly overparameterized (up to billions of
connection weights) and nonlinear. Yet they can fit data almost perfectly
through variants of gradient descent algorithms and achieve unexpected levels
of prediction accuracy without overfitting. These are formidable results that
escape the bias-variance predictions of statistical learning and pose
conceptual challenges for non-convex optimization. In this paper, we use
methods from statistical physics of disordered systems to analytically study
the computational fallout of overparameterization in nonconvex neural network
models. As the number of connection weights increases, we follow the changes of
the geometrical structure of different minima of the error loss function and
relate them to learning and generalisation performance. We find that there
exist a gap between the SAT/UNSAT interpolation transition where solutions
begin to exist and the point where algorithms start to find solutions, i.e.
where accessible solutions appear. This second phase transition coincides with
the discontinuous appearance of atypical solutions that are locally extremely
entropic, i.e., flat regions of the weight space that are particularly
solution-dense and have good generalization properties. Although exponentially
rare compared to typical solutions (which are narrower and extremely difficult
to sample), entropic solutions are accessible to the algorithms used in
learning. We can characterize the generalization error of different solutions
and optimize the Bayesian prediction, for data generated from a structurally
different network. Numerical tests on observables suggested by the theory
confirm that the scenario extends to realistic deep networks.
Related papers
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z) - The Unreasonable Effectiveness of Solving Inverse Problems with Neural Networks [24.766470360665647]
We show that neural networks trained to learn solutions to inverse problems can find better solutions than classicals even on their training set.
Our findings suggest an alternative use for neural networks: rather than generalizing to new data for fast inference, they can also be used to find better solutions on known data.
arXiv Detail & Related papers (2024-08-15T12:38:10Z) - Solutions to Elliptic and Parabolic Problems via Finite Difference Based Unsupervised Small Linear Convolutional Neural Networks [1.124958340749622]
We propose a fully unsupervised approach, requiring no training data, to estimate finite difference solutions for PDEs directly via small linear convolutional neural networks.
Our proposed approach uses substantially fewer parameters than similar finite difference-based approaches while also demonstrating comparable accuracy to the true solution for several selected elliptic and parabolic problems.
arXiv Detail & Related papers (2023-11-01T03:15:10Z) - Adaptive Self-supervision Algorithms for Physics-informed Neural
Networks [59.822151945132525]
Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function.
We study the impact of the location of the collocation points on the trainability of these models.
We propose a novel adaptive collocation scheme which progressively allocates more collocation points to areas where the model is making higher errors.
arXiv Detail & Related papers (2022-07-08T18:17:06Z) - Message Passing Neural PDE Solvers [60.77761603258397]
We build a neural message passing solver, replacing allally designed components in the graph with backprop-optimized neural function approximators.
We show that neural message passing solvers representationally contain some classical methods, such as finite differences, finite volumes, and WENO schemes.
We validate our method on various fluid-like flow problems, demonstrating fast, stable, and accurate performance across different domain topologies, equation parameters, discretizations, etc., in 1D and 2D.
arXiv Detail & Related papers (2022-02-07T17:47:46Z) - Generalization of Neural Combinatorial Solvers Through the Lens of
Adversarial Robustness [68.97830259849086]
Most datasets only capture a simpler subproblem and likely suffer from spurious features.
We study adversarial robustness - a local generalization property - to reveal hard, model-specific instances and spurious features.
Unlike in other applications, where perturbation models are designed around subjective notions of imperceptibility, our perturbation models are efficient and sound.
Surprisingly, with such perturbations, a sufficiently expressive neural solver does not suffer from the limitations of the accuracy-robustness trade-off common in supervised learning.
arXiv Detail & Related papers (2021-10-21T07:28:11Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded on the complexity' of the fractal structure that underlies its generalization measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one hidden/layered neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - Physics-Informed Neural Network Method for Solving One-Dimensional
Advection Equation Using PyTorch [0.0]
PINNs approach allows training neural networks while respecting the PDEs as a strong constraint in the optimization.
In standard small-scale circulation simulations, it is shown that the conventional approach incorporates a pseudo diffusive effect that is almost as large as the effect of the turbulent diffusion model.
Of all the schemes tested, only the PINNs approximation accurately predicted the outcome.
arXiv Detail & Related papers (2021-03-15T05:39:17Z) - Improving Uncertainty Calibration via Prior Augmented Data [56.88185136509654]
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators.
They are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions.
We propose a solution by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels.
arXiv Detail & Related papers (2021-02-22T07:02:37Z) - Efficient and Sparse Neural Networks by Pruning Weights in a
Multiobjective Learning Approach [0.0]
We propose a multiobjective perspective on the training of neural networks by treating its prediction accuracy and the network complexity as two individual objective functions.
Preliminary numerical results on exemplary convolutional neural networks confirm that large reductions in the complexity of neural networks with neglibile loss of accuracy are possible.
arXiv Detail & Related papers (2020-08-31T13:28:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.