Over-parameterised Shallow Neural Networks with Asymmetrical Node
Scaling: Global Convergence Guarantees and Feature Learning
- URL: http://arxiv.org/abs/2302.01002v1
- Date: Thu, 2 Feb 2023 10:40:06 GMT
- Title: Over-parameterised Shallow Neural Networks with Asymmetrical Node
Scaling: Global Convergence Guarantees and Feature Learning
- Authors: Francois Caron, Fadhel Ayed, Paul Jung, Hoil Lee, Juho Lee, Hongseok
Yang
- Abstract summary: We consider optimisation of large and shallow neural networks via gradient flow, where the output of each hidden node is scaled by some positive parameter.
We prove that, for large neural networks, with high probability, gradient flow converges to a global minimum AND can learn features, unlike in the NTK regime.
- Score: 23.47570704524471
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the optimisation of large and shallow neural networks via
gradient flow, where the output of each hidden node is scaled by some positive
parameter. We focus on the case where the node scalings are non-identical,
differing from the classical Neural Tangent Kernel (NTK) parameterisation. We
prove that, for large neural networks, with high probability, gradient flow
converges to a global minimum AND can learn features, unlike in the NTK regime.
We also provide experiments on synthetic and real-world datasets illustrating
our theoretical results and showing the benefit of such scaling in terms of
pruning and transfer learning.
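To make the parameterisation concrete, here is a minimal sketch (not the authors' code) of a shallow ReLU network in which each hidden node's output is multiplied by its own fixed positive scaling lambda_j, trained by gradient descent as a small-step discretisation of gradient flow. The decaying choice of scalings and the synthetic data below are illustrative assumptions, not the specific setting analysed in the paper; setting lambda_j = 1/sqrt(m) for every j recovers the usual NTK parameterisation for comparison.

    import torch

    torch.manual_seed(0)
    n, d, m = 32, 5, 1000                         # samples, input dim, hidden width
    X = torch.randn(n, d)
    y = torch.sin(X[:, 0:1])                      # synthetic regression target

    # Fixed, non-identical positive node scalings (illustrative choice),
    # normalised so that sum_j lambda_j^2 = 1; the NTK parameterisation is
    # the special case lambda_j = 1/sqrt(m) for all j.
    lam = 1.0 / torch.arange(1, m + 1).float()
    lam = lam / lam.norm()

    W = torch.randn(m, d, requires_grad=True)     # input weights
    a = torch.randn(m, 1, requires_grad=True)     # output weights

    def forward(X):
        # each hidden node's output is scaled by its own lambda_j
        return (torch.relu(X @ W.T) * lam) @ a

    opt = torch.optim.SGD([W, a], lr=0.05)        # small-step proxy for gradient flow
    for _ in range(2000):
        loss = ((forward(X) - y) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"final training loss: {loss.item():.2e}")

Per the abstract, the interest of non-identical scalings is that the hidden weights can move away from their initialisation (feature learning) while gradient flow still reaches a global minimum, unlike in the NTK regime.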
Related papers
- Stochastic Gradient Descent for Two-layer Neural Networks [2.0349026069285423]
This paper presents a study on the convergence rates of the stochastic gradient descent (SGD) algorithm when applied to overparameterized two-layer neural networks.
Our approach combines the Neural Tangent Kernel (NTK) approximation with convergence analysis in the Reproducing Kernel Hilbert Space (RKHS) generated by the NTK.
Our research framework enables us to explore the intricate interplay between kernel methods and optimization processes, shedding light on the dynamics and convergence properties of neural networks.
arXiv Detail & Related papers (2024-07-10T13:58:57Z) - Variational Inference for Infinitely Deep Neural Networks [0.4061135251278187]
We introduce the unbounded depth neural network (UDN), an infinitely deep probabilistic model that adapts its complexity to the training data.
We study the UDN on real and synthetic data.
arXiv Detail & Related papers (2022-09-21T03:54:34Z) - Parameter Convex Neural Networks [13.42851919291587]
We propose the exponential multilayer neural network (EMLP), which is convex with respect to the parameters of the neural network under some conditions.
In further experiments, we use the same architecture to build the exponential graph convolutional network (EGCN) and evaluate it on a graph classification dataset.
arXiv Detail & Related papers (2022-06-11T16:44:59Z) - Deep Architecture Connectivity Matters for Its Convergence: A
Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z) - On Feature Learning in Neural Networks with Global Convergence
Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions, while achieving comparable error bounds both in theory and in practice (a generic sketch of an empirical NTK feature map appears after this list).
arXiv Detail & Related papers (2021-04-03T09:08:12Z) - Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the gradient flow of the loss function.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z) - DebiNet: Debiasing Linear Models with Nonlinear Overparameterized Neural
Networks [11.04121146441257]
We incorporate over-parameterized neural networks into semi-parametric models to bridge the gap between inference and prediction.
We show the theoretical foundations that make this possible and demonstrate with numerical experiments.
We propose a framework, DebiNet, in which we plug in arbitrary feature selection methods to our semi-parametric neural network.
arXiv Detail & Related papers (2020-11-01T04:12:53Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - Optimal Rates for Averaged Stochastic Gradient Descent under Neural
Tangent Kernel Regime [50.510421854168065]
We show that averaged stochastic gradient descent can achieve the minimax optimal convergence rate.
We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z)
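As a companion to the "Random Features for the Neural Tangent Kernel" entry above, here is a generic Jacobian-feature sketch of an empirical NTK feature map for a one-hidden-layer ReLU network. This illustrates the general idea of finite NTK feature maps and is an assumption, not the construction proposed in that paper: the gradient of the network output with respect to its parameters at initialisation defines features whose inner products give the empirical NTK.

    import numpy as np

    rng = np.random.default_rng(0)
    d, m = 5, 512                                  # input dim, hidden width
    W = rng.standard_normal((m, d))                # hidden weights at initialisation
    a = rng.standard_normal(m)                     # output weights at initialisation

    def ntk_features(x):
        # f(x) = (1/sqrt(m)) * sum_j a_j * relu(w_j . x); phi(x) = grad_theta f(x)
        pre = W @ x                                # pre-activations, shape (m,)
        act = np.maximum(pre, 0.0)                 # relu(w_j . x)
        ind = (pre > 0).astype(float)              # relu'(w_j . x)
        grad_a = act / np.sqrt(m)                  # df/da_j
        grad_W = (a * ind)[:, None] * x / np.sqrt(m)   # df/dw_j, shape (m, d)
        return np.concatenate([grad_a, grad_W.ravel()])

    x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
    # the inner product of the feature maps is the empirical NTK value K(x1, x2)
    print(ntk_features(x1) @ ntk_features(x2))

This naive map has dimension m + m*d; the cited paper's contribution is a much lower-dimensional construction with comparable error bounds.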
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.