Gradient Descent on Infinitely Wide Neural Networks: Global Convergence
and Generalization
- URL: http://arxiv.org/abs/2110.08084v1
- Date: Fri, 15 Oct 2021 13:25:32 GMT
- Title: Gradient Descent on Infinitely Wide Neural Networks: Global Convergence
and Generalization
- Authors: Francis Bach (SIERRA), Lena\"ic Chizat (EPFL)
- Abstract summary: Many supervised machine learning methods are cast as optimization problems.
For prediction models which are linear in their parameters, this often leads to problems for prediction guarantees.
In this paper, we consider two-layer neural networks with homogeneous activation functions.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many supervised machine learning methods are naturally cast as optimization
problems. For prediction models which are linear in their parameters, this
often leads to convex problems for which many mathematical guarantees exist.
Models which are non-linear in their parameters such as neural networks lead to
non-convex optimization problems for which guarantees are harder to obtain. In
this review paper, we consider two-layer neural networks with homogeneous
activation functions where the number of hidden neurons tends to infinity, and
show how qualitative convergence guarantees may be derived.
Related papers
- On the Convergence Analysis of Over-Parameterized Variational Autoencoders: A Neural Tangent Kernel Perspective [7.580900499231056]
Variational Auto-Encoders (VAEs) have emerged as powerful probabilistic models for generative tasks.
This paper provides a mathematical proof of VAE under mild assumptions.
We also establish a novel connection between the optimization problem faced by over-Eized SNNs and the Kernel Ridge (KRR) problem.
arXiv Detail & Related papers (2024-09-09T06:10:31Z) - LinSATNet: The Positive Linear Satisfiability Neural Networks [116.65291739666303]
This paper studies how to introduce the popular positive linear satisfiability to neural networks.
We propose the first differentiable satisfiability layer based on an extension of the classic Sinkhorn algorithm for jointly encoding multiple sets of marginal distributions.
arXiv Detail & Related papers (2024-07-18T22:05:21Z) - Calibrating Neural Networks' parameters through Optimal Contraction in a Prediction Problem [0.0]
The paper details how a recurrent neural networks (RNN) can be transformed into a contraction in a domain where its parameters are linear.
It then demonstrates that a prediction problem modeled through an RNN, with a specific regularization term in the loss function, can have its first-order conditions expressed analytically.
We establish that, if certain conditions are met, optimal parameters exist, and can be found through a straightforward algorithm to any desired precision.
arXiv Detail & Related papers (2024-06-15T18:08:04Z) - Automated Design of Linear Bounding Functions for Sigmoidal Nonlinearities in Neural Networks [23.01933325606068]
Existing complete verification techniques offer provable guarantees for all robustness queries but struggle to scale beyond small neural networks.
We propose a novel parameter search method to improve the quality of these linear approximations.
Specifically, we show that using a simple search method, carefully adapted to the given verification problem through state-of-the-art algorithm configuration techniques, improves the average global lower bound by 25% on average over the current state of the art.
arXiv Detail & Related papers (2024-06-14T16:16:26Z) - The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network Network (DNN) models are used for programming purposes.
In this paper we examine the use of convex neural recovery models.
We show that all the stationary non-dimensional objective objective can be characterized as the standard a global subsampled convex solvers program.
We also show that all the stationary non-dimensional objective objective can be characterized as the standard a global subsampled convex solvers program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent
Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - NeuralEF: Deconstructing Kernels by Deep Neural Networks [47.54733625351363]
Traditional nonparametric solutions based on the Nystr"om formula suffer from scalability issues.
Recent work has resorted to a parametric approach, i.e., training neural networks to approximate the eigenfunctions.
We show that these problems can be fixed by using a new series of objective functions that generalizes to space of supervised and unsupervised learning problems.
arXiv Detail & Related papers (2022-04-30T05:31:07Z) - Provably Efficient Neural Estimation of Structural Equation Model: An
Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized Structural equation models (SEMs)
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using a gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z) - Multipole Graph Neural Operator for Parametric Partial Differential
Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z) - Loss landscapes and optimization in over-parameterized non-linear
systems and neural networks [20.44438519046223]
We show that wide neural networks satisfy the PL$*$ condition, which explains the (S)GD convergence to a global minimum.
We show that wide neural networks satisfy the PL$*$ condition, which explains the (S)GD convergence to a global minimum.
arXiv Detail & Related papers (2020-02-29T17:18:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.