Generalization Error Analysis of Neural Networks with Gradient-Based Regularization
- URL: http://arxiv.org/abs/2107.02797v1
- Date: Tue, 6 Jul 2021 07:54:36 GMT
- Title: Generalization Error Analysis of Neural Networks with Gradient-Based Regularization
- Authors: Lingfeng Li and Xue-Cheng Tai and Jiang Yang
- Abstract summary: We study gradient-based regularization methods for neural networks.
We introduce a general framework to analyze the generalization error of regularized networks.
We conduct experiments on image classification tasks to show that gradient-based methods can significantly improve the generalization ability.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study gradient-based regularization methods for neural networks. We
mainly focus on two regularization methods: total variation and Tikhonov
regularization. Applying these methods is equivalent to using neural networks to
solve certain partial differential equations, which in practical applications are
mostly high-dimensional. In this work, we introduce a general framework to analyze
the generalization error of regularized networks. The error estimate relies on two
assumptions: one on the approximation error and one on the quadrature error.
Moreover, we conduct experiments on image classification tasks to show that
gradient-based methods can significantly improve the generalization ability and
adversarial robustness of neural networks. A graphical extension of the
gradient-based methods is also considered in the experiments.
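To make the two penalties concrete, here is a minimal training-loss sketch in PyTorch (an assumed framework; this is not the authors' code). The paper regularizes the gradient of the network function with respect to its input; the sketch uses the common double-backpropagation simplification of penalizing the input gradient of the loss, and the function name and penalty weights (`regularized_loss`, `lam_tik`, `lam_tv`) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def regularized_loss(model, x, y, lam_tik=0.1, lam_tv=0.1):
    """Cross-entropy plus gradient-based penalties (illustrative weights).

    Tikhonov-style term:        mean of ||grad_x loss||_2^2
    Total-variation-style term: mean of ||grad_x loss||_2
    """
    x = x.clone().requires_grad_(True)
    ce = F.cross_entropy(model(x), y)

    # Differentiate the data-fit term w.r.t. the *input*, keeping the
    # graph so the penalty itself can be backpropagated (double backprop).
    (grad_x,) = torch.autograd.grad(ce, x, create_graph=True)

    grad_norm = grad_x.flatten(1).norm(dim=1)  # per-sample gradient norm
    tikhonov = grad_norm.pow(2).mean()         # squared-norm (Tikhonov) penalty
    tv = grad_norm.mean()                      # norm (TV-like) penalty
    return ce + lam_tik * tikhonov + lam_tv * tv
```

Calling `regularized_loss(model, x, y).backward()` before an optimizer step trains with both penalties; setting either weight to zero isolates the other.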
Related papers
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
- Modify Training Directions in Function Space to Reduce Generalization Error [9.821059922409091]
We propose a modified natural gradient descent method in the neural network function space, based on the eigendecompositions of the neural tangent kernel and the Fisher information matrix.
We explicitly derive the generalization error of the learned network function using eigendecomposition and results from statistical theory.
arXiv Detail & Related papers (2023-07-25T07:11:30Z)
- GIT: Detecting Uncertainty, Out-Of-Distribution and Adversarial Samples using Gradients and Invariance Transformations [77.34726150561087]
We propose a holistic approach for the detection of generalization errors in deep neural networks.
GIT combines gradient information with invariance transformations.
Our experiments demonstrate the superior performance of GIT compared to the state-of-the-art on a variety of network architectures.
arXiv Detail & Related papers (2023-07-05T22:04:38Z)
- A Derivation of Feedforward Neural Network Gradients Using Fréchet Calculus [0.0]
We present a derivation of the gradients of feedforward neural networks using Fréchet calculus.
We show how our analysis extends to more general neural network architectures, including, but not limited to, convolutional networks.
arXiv Detail & Related papers (2022-09-27T08:14:00Z)
- Generalization Error Bounds for Iterative Recovery Algorithms Unfolded as Neural Networks [6.173968909465726]
We introduce a general class of neural networks suitable for sparse reconstruction from few linear measurements.
By allowing a wide range of degrees of weight-sharing between the layers, we enable a unified analysis for very different neural network types (a hedged sketch of such an unfolded network appears after this list).
arXiv Detail & Related papers (2021-12-08T16:17:33Z)
- Galerkin Neural Networks: A Framework for Approximating Variational Equations with Error Control [0.0]
We present a new approach to using neural networks to approximate the solutions of variational equations.
We use a sequence of finite-dimensional subspaces whose basis functions are realizations of a sequence of neural networks.
arXiv Detail & Related papers (2021-05-28T20:25:40Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
- Optimization Theory for ReLU Neural Networks Trained with Normalization Layers [82.61117235807606]
The success of deep neural networks is in part due to the use of normalization layers.
Our analysis shows how the introduction of normalization changes the optimization landscape and can enable faster convergence.
arXiv Detail & Related papers (2020-06-11T23:55:54Z)
- Regularizing Meta-Learning via Gradient Dropout [102.29924160341572]
Meta-learning models are prone to overfitting when there are not enough training tasks for the meta-learners to generalize.
We introduce a simple yet effective method to alleviate this risk of overfitting for gradient-based meta-learning (a sketch of the gradient-dropout idea appears after this list).
arXiv Detail & Related papers (2020-04-13T10:47:02Z)
- Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)
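As a concrete illustration of the unfolding idea in the entry "Generalization Error Bounds for Iterative Recovery Algorithms Unfolded as Neural Networks" above, here is a minimal LISTA-style network in PyTorch. It is a sketch under assumptions, not that paper's construction: the class name `UnfoldedISTA`, the layer count, the initialization scale, and the `tied` flag that toggles weight-sharing are all illustrative.

```python
import torch
import torch.nn as nn

class UnfoldedISTA(nn.Module):
    """ISTA unfolded into a fixed number of network layers (LISTA-style).

    Recovers a sparse x from measurements y ~ A x. Each layer computes
        x <- soft_threshold(y @ W1.T + x @ W2.T, theta_t),
    and `tied` controls whether all layers share one (W1, W2) pair or
    each layer owns its own weights, i.e. the degree of weight-sharing.
    """

    def __init__(self, m, n, num_layers=8, tied=True):
        super().__init__()
        k = 1 if tied else num_layers
        self.W1 = nn.ParameterList([nn.Parameter(0.01 * torch.randn(n, m)) for _ in range(k)])
        self.W2 = nn.ParameterList([nn.Parameter(0.01 * torch.randn(n, n)) for _ in range(k)])
        self.theta = nn.Parameter(0.1 * torch.ones(num_layers))  # per-layer thresholds
        self.num_layers, self.tied = num_layers, tied

    def forward(self, y):  # y: (batch, m) measurements
        x = y.new_zeros(y.shape[0], self.W1[0].shape[0])
        for t in range(self.num_layers):
            i = 0 if self.tied else t
            z = y @ self.W1[i].T + x @ self.W2[i].T
            x = torch.sign(z) * torch.relu(z.abs() - self.theta[t])  # soft threshold
        return x
```

With `tied=True` every layer reuses one pair of weight matrices, like a learned fixed-point iteration; `tied=False` unties all layers, the other end of the weight-sharing spectrum the entry refers to.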
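Similarly, the regularizer from "Regularizing Meta-Learning via Gradient Dropout" can be read as randomly masking gradient coordinates before each inner-loop update. The sketch below is a hedged reading of the abstract summary, not the paper's exact scheme; the helper name `dropout_gradients`, the drop rate, and the inverted-dropout rescaling are assumptions.

```python
import torch

def dropout_gradients(params, drop_rate=0.2):
    """Zero out a random subset of gradient entries in place.

    Each gradient coordinate is kept with probability 1 - drop_rate (and
    rescaled, dropout-style), which injects noise into the inner-loop
    updates of gradient-based meta-learning and discourages overfitting
    to the meta-training tasks.
    """
    for p in params:
        if p.grad is not None:
            mask = (torch.rand_like(p.grad) > drop_rate).to(p.grad.dtype)
            p.grad.mul_(mask / (1.0 - drop_rate))

# Usage inside an inner-loop step (illustrative):
#   loss.backward()
#   dropout_gradients(model.parameters(), drop_rate=0.2)
#   optimizer.step()
```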