Generalization bound of globally optimal non-convex neural network
training: Transportation map estimation by infinite dimensional Langevin
dynamics
- URL: http://arxiv.org/abs/2007.05824v2
- Date: Mon, 26 Oct 2020 18:10:22 GMT
- Title: Generalization bound of globally optimal non-convex neural network
training: Transportation map estimation by infinite dimensional Langevin
dynamics
- Authors: Taiji Suzuki
- Abstract summary: We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error.
Existing frameworks such as mean field theory and neural tangent kernel theory for neural network optimization analysis typically require taking the limit of infinite width of the network to show its global convergence.
- Score: 50.83356836818667
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a new theoretical framework to analyze deep learning
optimization with connection to its generalization error. Existing frameworks
such as mean field theory and neural tangent kernel theory for neural network
optimization analysis typically require taking the limit of infinite width of
the network to show its global convergence. This potentially makes it difficult
to directly deal with finite width networks; in particular, in the neural tangent
kernel regime, we cannot reveal favorable properties of neural networks beyond
kernel methods. To realize a more natural analysis, we consider a completely
different approach in which we formulate the parameter training as a
transportation map estimation and show its global convergence via the theory of
infinite dimensional Langevin dynamics. This enables us to analyze narrow
and wide networks in a unifying manner. Moreover, we give generalization gap
and excess risk bounds for the solution obtained by the dynamics. The excess
risk bound achieves the so-called fast learning rate. In particular, we show an
exponential convergence for a classification problem and a minimax optimal rate
for a regression problem.
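For intuition only, below is a minimal finite-dimensional sketch of the kind of update the abstract alludes to: an Euler-Maruyama discretization of Langevin dynamics (noisy gradient descent with Gaussian perturbations) applied to the hidden-layer weights of a small two-layer ReLU network, with those weights standing in for a coarsely discretized transportation map. This is not the paper's construction, which operates on a transportation map in an infinite dimensional space; the data, network sizes, and hyperparameters (`eta`, `beta`, `lam`) are assumptions chosen purely for illustration.
```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (purely illustrative).
n, d, width = 64, 5, 32
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0])

# Two-layer ReLU network; the hidden weights W stand in for a
# (coarsely discretized) transportation map.
W = rng.standard_normal((width, d)) / np.sqrt(d)
a = rng.standard_normal(width) / np.sqrt(width)  # output layer, kept fixed

def forward(W):
    return np.maximum(X @ W.T, 0.0) @ a

def loss_grad(W):
    r = forward(W) - y                 # residuals, shape (n,)
    act = (X @ W.T > 0).astype(float)  # ReLU derivative, shape (n, width)
    # Gradient of the averaged squared loss with respect to W
    return (act * np.outer(r, a)).T @ X / n

# Step size, inverse temperature, ridge weight (all illustrative).
eta, beta, lam = 1e-1, 1e3, 1e-3
for _ in range(2000):
    g = loss_grad(W) + lam * W  # regularized gradient
    noise = rng.standard_normal(W.shape)
    # Euler-Maruyama step of Langevin dynamics: dW = -grad dt + sqrt(2/beta) dB_t
    W = W - eta * g + np.sqrt(2.0 * eta / beta) * noise

print("final training loss:", 0.5 * np.mean((forward(W) - y) ** 2))
```
The noise scale sqrt(2*eta/beta) is the standard Euler-Maruyama choice for Langevin dynamics at inverse temperature beta; the paper's analysis concerns the infinite dimensional counterpart of such dynamics and the generalization properties of the solution it converges to.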
Related papers
- Generalization of Scaled Deep ResNets in the Mean-Field Regime [55.77054255101667]
We investigate scaled ResNet in the limit of infinitely deep and wide neural networks.
Our results offer new insights into the generalization ability of deep ResNet beyond the lazy training regime.
arXiv Detail & Related papers (2024-03-14T21:48:00Z) - An Infinite-Width Analysis on the Jacobian-Regularised Training of a Neural Network [10.384951432591492]
Recent theoretical analysis of deep neural networks in their infinite-width limits has deepened our understanding of initialisation, feature learning, and training of those networks.
We show that this infinite-width analysis can be extended to the Jacobian of a deep neural network.
We experimentally show the relevance of our theoretical claims to wide finite networks, and empirically analyse the properties of the kernel regression solution to obtain insight into Jacobian regularisation.
arXiv Detail & Related papers (2023-12-06T09:52:18Z) - Universal Scaling Laws of Absorbing Phase Transitions in Artificial Deep Neural Networks [0.8932296777085644]
Conventional artificial deep neural networks operating near the phase boundary of the signal propagation dynamics, also known as the edge of chaos, exhibit universal scaling laws of absorbing phase transitions.
Our numerical results indicate that the multilayer perceptrons and the convolutional neural networks belong to the mean-field and the directed percolation classes, respectively.
arXiv Detail & Related papers (2023-07-05T13:39:02Z) - Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - Approximation Power of Deep Neural Networks: an explanatory mathematical
survey [0.0]
The goal of this survey is to present an explanatory review of the approximation properties of deep neural networks.
We aim at understanding how and why deep neural networks outperform other classical linear and nonlinear approximation methods.
arXiv Detail & Related papers (2022-07-19T18:47:44Z) - Mean-Field Analysis of Two-Layer Neural Networks: Global Optimality with
Linear Convergence Rates [7.094295642076582]
The mean-field regime is a theoretically attractive alternative to the NTK (lazy training) regime.
We establish a new linear convergence result for two-layer neural networks trained by continuous-time noisy descent in the mean-field regime.
arXiv Detail & Related papers (2022-05-19T21:05:40Z) - On Feature Learning in Neural Networks with Global Convergence
Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z) - A Convergence Theory Towards Practical Over-parameterized Deep Neural
Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in the depth, in time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z) - Generative Adversarial Imitation Learning with Neural Networks: Global
Optimality and Convergence Rate [122.73276299136568]
Generative adversarial imitation learning (GAIL) demonstrates tremendous success in practice, especially when combined with neural networks.
Despite its empirical success, it remains unclear whether GAIL with neural networks converges to the globally optimal solution.
arXiv Detail & Related papers (2020-03-08T03:39:36Z)