Modify Training Directions in Function Space to Reduce Generalization
Error
- URL: http://arxiv.org/abs/2307.13290v1
- Date: Tue, 25 Jul 2023 07:11:30 GMT
- Title: Modify Training Directions in Function Space to Reduce Generalization
Error
- Authors: Yi Yu, Wenlian Lu, Boyu Chen
- Abstract summary: We propose a modified natural gradient descent method in the neural network function space based on the eigendecompositions of neural tangent kernel and Fisher information matrix.
We explicitly derive the generalization error of the learned neural network function using theoretical methods from eigendecomposition and statistics theory.
- Score: 9.821059922409091
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose theoretical analyses of a modified natural gradient descent method
in the neural network function space based on the eigendecompositions of neural
tangent kernel and Fisher information matrix. We firstly present analytical
expression for the function learned by this modified natural gradient under the
assumptions of Gaussian distribution and infinite width limit. Thus, we
explicitly derive the generalization error of the learned neural network
function using theoretical methods from eigendecomposition and statistics
theory. By decomposing of the total generalization error attributed to
different eigenspace of the kernel in function space, we propose a criterion
for balancing the errors stemming from training set and the distribution
discrepancy between the training set and the true data. Through this approach,
we establish that modifying the training direction of the neural network in
function space leads to a reduction in the total generalization error.
Furthermore, We demonstrate that this theoretical framework is capable to
explain many existing results of generalization enhancing methods. These
theoretical results are also illustrated by numerical examples on synthetic
data.
Related papers
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - Generalization Error Curves for Analytic Spectral Algorithms under Power-law Decay [13.803850290216257]
We rigorously provide a full characterization of the generalization error curves of the kernel gradient descent method.
Thanks to the neural tangent kernel theory, these results greatly improve our understanding of the generalization behavior of training the wide neural networks.
arXiv Detail & Related papers (2024-01-03T08:00:50Z) - A theory of data variability in Neural Network Bayesian inference [0.70224924046445]
We provide a field-theoretic formalism which covers the generalization properties of infinitely wide networks.
We derive the generalization properties from the statistical properties of the input.
We show that data variability leads to a non-Gaussian action reminiscent of a ($varphi3+varphi4$)-theory.
arXiv Detail & Related papers (2023-07-31T14:11:32Z) - Fluctuations, Bias, Variance & Ensemble of Learners: Exact Asymptotics
for Convex Losses in High-Dimension [25.711297863946193]
We develop a theory for the study of fluctuations in an ensemble of generalised linear models trained on different, but correlated, features.
We provide a complete description of the joint distribution of the empirical risk minimiser for generic convex loss and regularisation in the high-dimensional limit.
arXiv Detail & Related papers (2022-01-31T17:44:58Z) - Generalization Error Analysis of Neural networks with Gradient Based
Regularization [2.7286395031146062]
We study gradient-based regularization methods for neural networks.
We introduce a general framework to analyze the generalization error of regularized networks.
We conduct some experiments on the image classification tasks to show that gradient-based methods can significantly improve the generalization ability.
arXiv Detail & Related papers (2021-07-06T07:54:36Z) - Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
arXiv Detail & Related papers (2021-06-15T18:34:41Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded on the complexity' of the fractal structure that underlies its generalization measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one hidden/layered neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - Error Bounds of the Invariant Statistics in Machine Learning of Ergodic
It\^o Diffusions [8.627408356707525]
We study the theoretical underpinnings of machine learning of ergodic Ito diffusions.
We deduce a linear dependence of the errors of one-point and two-point invariant statistics on the error in the learning of the drift and diffusion coefficients.
arXiv Detail & Related papers (2021-05-21T02:55:59Z) - Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z) - Relaxing the Constraints on Predictive Coding Models [62.997667081978825]
Predictive coding is an influential theory of cortical function which posits that the principal computation the brain performs is the minimization of prediction errors.
Standard implementations of the algorithm still involve potentially neurally implausible features such as identical forward and backward weights, backward nonlinear derivatives, and 1-1 error unit connectivity.
In this paper, we show that these features are not integral to the algorithm and can be removed either directly or through learning additional sets of parameters with Hebbian update rules without noticeable harm to learning performance.
arXiv Detail & Related papers (2020-10-02T15:21:37Z) - Provably Efficient Neural Estimation of Structural Equation Model: An
Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized Structural equation models (SEMs)
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using a gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.