Related papers: Generalization Error Bounds for Deep Neural Networks Trained by SGD

Generalization Error Bounds for Deep Neural Networks Trained by SGD

URL: http://arxiv.org/abs/2206.03299v2
Date: Mon, 29 May 2023 06:05:47 GMT
Title: Generalization Error Bounds for Deep Neural Networks Trained by SGD
Authors: Mingze Wang, Chao Ma
Abstract summary: Generalization error bounds for deep trained by gradient descent (SGD) are derived. The bounds explicitly depend on the loss along the training trajectory. Results show that our bounds are non-vacuous and robust with the change of neural networks and network hypers.
Score: 3.148524502470734
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generalization error bounds for deep neural networks trained by stochastic gradient descent (SGD) are derived by combining a dynamical control of an appropriate parameter norm and the Rademacher complexity estimate based on parameter norms. The bounds explicitly depend on the loss along the training trajectory, and work for a wide range of network architectures including multilayer perceptron (MLP) and convolutional neural networks (CNN). Compared with other algorithm-depending generalization estimates such as uniform stability-based bounds, our bounds do not require $L$-smoothness of the nonconvex loss function, and apply directly to SGD instead of Stochastic Langevin gradient descent (SGLD). Numerical results show that our bounds are non-vacuous and robust with the change of optimizer and network hyperparameters.

Related papers

Gradient Descent Finds Over-Parameterized Neural Networks with Sharp Generalization for Nonparametric Regression [19.988762532185884]
We show that, if the neural network is trained by GD with early stopping, then the trained network renders a sharp rate of nonparametric regression risk. $cO(eps_n2)$ is the same rate as that for the classical kernel regression trained by GD with early stopping.
arXiv Detail & Related papers (2024-11-05T08:43:54Z)
Adaptive Multilevel Neural Networks for Parametric PDEs with Error Estimation [0.0]
A neural network architecture is presented to solve high-dimensional parameter-dependent partial differential equations (pPDEs) It is constructed to map parameters of the model data to corresponding finite element solutions. It outputs a coarse grid solution and a series of corrections as produced in an adaptive finite element method (AFEM)
arXiv Detail & Related papers (2024-03-19T11:34:40Z)
Generalization of Scaled Deep ResNets in the Mean-Field Regime [55.77054255101667]
We investigate emphscaled ResNet in the limit of infinitely deep and wide neural networks. Our results offer new insights into the generalization ability of deep ResNet beyond the lazy training regime.
arXiv Detail & Related papers (2024-03-14T21:48:00Z)
Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss with the increase in the number of learning epochs. We show that the threshold on the number of training samples increases with the increase in the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z)
Approximation Results for Gradient Descent trained Neural Networks [0.0]
The networks are fully connected constant depth increasing width. The continuous kernel error norm implies an approximation under the natural smoothness assumption required for smooth functions.
arXiv Detail & Related papers (2023-09-09T18:47:55Z)
Generalization Guarantees of Gradient Descent for Multi-Layer Neural Networks [55.86300309474023]
We conduct a comprehensive stability and generalization analysis of gradient descent (GD) for multi-layer NNs. We derive the excess risk rate of $O(1/sqrtn)$ for GD algorithms in both two-layer and three-layer NNs.
arXiv Detail & Related papers (2023-05-26T12:51:38Z)
Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have effectively been demonstrated in solving forward and inverse differential equation problems. PINNs are trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features. In this paper, we propose to employ implicit gradient descent (ISGD) method to train PINNs for improving the stability of training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
Learning k-Level Structured Sparse Neural Networks Using Group Envelope Regularization [4.0554893636822]
We introduce a novel approach to deploy large-scale Deep Neural Networks on constrained resources. The method speeds up inference time and aims to reduce memory demand and power consumption.
arXiv Detail & Related papers (2022-12-25T15:40:05Z)
Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability. We consider gradient descent (GD) and gradient descent (SGD) to train SNNs, for both of which we develop consistent excess bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z)
On generalization bounds for deep networks based on loss surface implicit regularization [5.68558935178946]
Modern deep neural networks generalize well despite a large number of parameters. That modern deep neural networks generalize well despite a large number of parameters contradicts the classical statistical learning theory.
arXiv Detail & Related papers (2022-01-12T16:41:34Z)
Improve Generalization and Robustness of Neural Networks via Weight Scale Shifting Invariant Regularizations [52.493315075385325]
We show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with homogeneous activation functions. We propose an improved regularizer that is invariant to weight scale shifting and thus effectively constrains the intrinsic norm of a neural network.
arXiv Detail & Related papers (2020-08-07T02:55:28Z)
The Heavy-Tail Phenomenon in SGD [7.366405857677226]
We show that depending on the structure of the Hessian of the loss at the minimum, the SGD iterates will converge to a emphheavy-tailed stationary distribution. We translate our results into insights about the behavior of SGD in deep learning.
arXiv Detail & Related papers (2020-06-08T16:43:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.