Approximation Results for Gradient Descent trained Neural Networks
- URL: http://arxiv.org/abs/2309.04860v1
- Date: Sat, 9 Sep 2023 18:47:55 GMT
- Title: Approximation Results for Gradient Descent trained Neural Networks
- Authors: G. Welper
- Abstract summary: The networks are fully connected constant depth increasing width.
The continuous kernel error norm implies an approximation under the natural smoothness assumption required for smooth functions.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The paper contains approximation guarantees for neural networks that are
trained with gradient flow, with error measured in the continuous
$L_2(\mathbb{S}^{d-1})$-norm on the $d$-dimensional unit sphere and targets
that are Sobolev smooth. The networks are fully connected of constant depth and
increasing width. Although all layers are trained, the gradient flow
convergence is based on a neural tangent kernel (NTK) argument for the
non-convex second but last layer. Unlike standard NTK analysis, the continuous
error norm implies an under-parametrized regime, possible by the natural
smoothness assumption required for approximation. The typical
over-parametrization re-enters the results in form of a loss in approximation
rate relative to established approximation methods for Sobolev smooth
functions.
Related papers
- Approximation and Gradient Descent Training with Neural Networks [0.0]
Recent work extends a neural tangent kernel (NTK) optimization argument to an under-parametrized regime.
This paper establishes analogous results for networks trained by gradient descent.
arXiv Detail & Related papers (2024-05-19T23:04:09Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over- parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametricized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(alpha-1)$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with ReLU activations.
For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that leakyally, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the random variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
arXiv Detail & Related papers (2022-10-13T15:09:54Z) - Generalization Error Bounds for Deep Neural Networks Trained by SGD [3.148524502470734]
Generalization error bounds for deep trained by gradient descent (SGD) are derived.
The bounds explicitly depend on the loss along the training trajectory.
Results show that our bounds are non-vacuous and robust with the change of neural networks and network hypers.
arXiv Detail & Related papers (2022-06-07T13:46:10Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over- parameterization, where the width is $tildemathcalO(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Improved Overparametrization Bounds for Global Convergence of Stochastic
Gradient Descent for Shallow Neural Networks [1.14219428942199]
We study the overparametrization bounds required for the global convergence of gradient descent algorithm for a class of one hidden layer feed-forward neural networks.
arXiv Detail & Related papers (2022-01-28T11:30:06Z) - A Unifying View on Implicit Bias in Training Linear Neural Networks [31.65006970108761]
We study the implicit bias of gradient flow (i.e., gradient descent with infinitesimal step size) on linear neural network training.
We propose a tensor formulation of neural networks that includes fully-connected, diagonal, and convolutional networks as special cases.
arXiv Detail & Related papers (2020-10-06T06:08:35Z) - Optimal Rates for Averaged Stochastic Gradient Descent under Neural
Tangent Kernel Regime [50.510421854168065]
We show that the averaged gradient descent can achieve the minimax optimal convergence rate.
We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and or binary weights the training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.