Related papers: Learning through atypical ''phase transitions'' in overparameterized neural networks

URL: http://arxiv.org/abs/2110.00683v1
Date: Fri, 1 Oct 2021 23:28:07 GMT
Title: Learning through atypical ''phase transitions'' in overparameterized neural networks
Authors: Carlo Baldassi, Clarissa Lauditi, Enrico M. Malatesta, Rosalba Pacelli, Gabriele Perugini, Riccardo Zecchina
Abstract summary: Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through variants of gradient descent algorithms and achieve unexpected levels of prediction accuracy without overfitting. These are formidable results that pose conceptual challenges for non-convex optimization.
Score: 0.43496401697112685
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through variants of gradient descent algorithms and achieve unexpected levels of prediction accuracy without overfitting. These are formidable results that escape the bias-variance predictions of statistical learning and pose conceptual challenges for non-convex optimization. In this paper, we use methods from statistical physics of disordered systems to analytically study the computational fallout of overparameterization in nonconvex neural network models. As the number of connection weights increases, we follow the changes of the geometrical structure of different minima of the error loss function and relate them to learning and generalisation performance. We find that there exists a gap between the SAT/UNSAT interpolation transition where solutions begin to exist and the point where algorithms start to find solutions, i.e., where accessible solutions appear. This second phase transition coincides with the discontinuous appearance of atypical solutions that are locally extremely entropic, i.e., flat regions of the weight space that are particularly solution-dense and have good generalization properties. Although exponentially rare compared to typical solutions (which are narrower and extremely difficult to sample), entropic solutions are accessible to the algorithms used in learning. We can characterize the generalization error of different solutions and optimize the Bayesian prediction, for data generated from a structurally different network. Numerical tests on observables suggested by the theory confirm that the scenario extends to realistic deep networks.

Related papers

Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems. We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z)
The Unreasonable Effectiveness of Solving Inverse Problems with Neural Networks [24.766470360665647]
We show that neural networks trained to learn solutions to inverse problems can find better solutions than classical optimizers even on their training set. Our findings suggest an alternative use for neural networks: rather than generalizing to new data for fast inference, they can also be used to find better solutions on known data.
arXiv Detail & Related papers (2024-08-15T12:38:10Z)
Neural variational Data Assimilation with Uncertainty Quantification using SPDE priors [28.804041716140194]
Recent advances in the deep learning community make it possible to address the problem through a neural architecture embedding a variational data assimilation framework. In this work we use the theory of Stochastic Partial Differential Equations (SPDE) and Gaussian Processes (GP) to estimate both the space and time covariance of the state.
arXiv Detail & Related papers (2024-02-02T19:18:12Z)
Solutions to Elliptic and Parabolic Problems via Finite Difference Based Unsupervised Small Linear Convolutional Neural Networks [1.124958340749622]
We propose a fully unsupervised approach, requiring no training data, to estimate finite difference solutions for PDEs directly via small linear convolutional neural networks. Our proposed approach uses substantially fewer parameters than similar finite difference-based approaches while also demonstrating comparable accuracy to the true solution for several selected elliptic and parabolic problems.
arXiv Detail & Related papers (2023-11-01T03:15:10Z)
Adaptive Self-supervision Algorithms for Physics-informed Neural Networks [59.822151945132525]
Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function. We study the impact of the location of the collocation points on the trainability of these models. We propose a novel adaptive collocation scheme which progressively allocates more collocation points to areas where the model is making higher errors.
arXiv Detail & Related papers (2022-07-08T18:17:06Z)
Message Passing Neural PDE Solvers [60.77761603258397]
We build a neural message passing solver, replacing all heuristically designed components in the graph with backprop-optimized neural function approximators. We show that neural message passing solvers representationally contain some classical methods, such as finite differences, finite volumes, and WENO schemes. We validate our method on various fluid-like flow problems, demonstrating fast, stable, and accurate performance across different domain topologies, equation parameters, discretizations, etc., in 1D and 2D.
arXiv Detail & Related papers (2022-02-07T17:47:46Z)
Generalization of Neural Combinatorial Solvers Through the Lens of Adversarial Robustness [68.97830259849086]
Most datasets only capture a simpler subproblem and likely suffer from spurious features. We study adversarial robustness - a local generalization property - to reveal hard, model-specific instances and spurious features. Unlike in other applications, where perturbation models are designed around subjective notions of imperceptibility, our perturbation models are efficient and sound. Surprisingly, with such perturbations, a sufficiently expressive neural solver does not suffer from the limitations of the accuracy-robustness trade-off common in supervised learning.
arXiv Detail & Related papers (2021-10-21T07:28:11Z)
Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded based on the 'complexity' of the fractal structure that underlies its invariant measure. We further specialize our results to specific problems (e.g., linear/logistic regression, one hidden-layered neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
Physics-Informed Neural Network Method for Solving One-Dimensional Advection Equation Using PyTorch [0.0]
The PINNs approach allows training neural networks while respecting the PDEs as a strong constraint in the optimization. In standard small-scale circulation simulations, it is shown that the conventional approach incorporates a pseudo diffusive effect that is almost as large as the effect of the turbulent diffusion model. Of all the schemes tested, only the PINNs approximation accurately predicted the outcome.
arXiv Detail & Related papers (2021-03-15T05:39:17Z)
Improving Uncertainty Calibration via Prior Augmented Data [56.88185136509654]
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators. They are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions. We propose a solution by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels.
arXiv Detail & Related papers (2021-02-22T07:02:37Z)
Efficient and Sparse Neural Networks by Pruning Weights in a Multiobjective Learning Approach [0.0]
We propose a multiobjective perspective on the training of neural networks by treating its prediction accuracy and the network complexity as two individual objective functions. Preliminary numerical results on exemplary convolutional neural networks confirm that large reductions in the complexity of neural networks with negligible loss of accuracy are possible.
arXiv Detail & Related papers (2020-08-31T13:28:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.