A Quadrature Perspective on Frequency Bias in Neural Network Training
with Nonuniform Data
- URL: http://arxiv.org/abs/2205.14300v1
- Date: Sat, 28 May 2022 02:31:19 GMT
- Title: A Quadrature Perspective on Frequency Bias in Neural Network Training
with Nonuniform Data
- Authors: Annan Yu, Yunan Yang, Alex Townsend
- Abstract summary: Gradient-based algorithms minimize the low-frequency misfit before reducing the high-frequency residuals.
We use the Neural Tangent Kernel (NTK) to provide a theoretically rigorous analysis for training where data are drawn from constant or piecewise-constant probability densities.
- Score: 1.7188280334580197
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Small generalization errors of over-parameterized neural networks (NNs) can
be partially explained by the frequency biasing phenomenon, where
gradient-based algorithms minimize the low-frequency misfit before reducing the
high-frequency residuals. Using the Neural Tangent Kernel (NTK), one can
provide a theoretically rigorous analysis for training where data are drawn
from constant or piecewise-constant probability densities. Since most training
data sets are not drawn from such distributions, we use the NTK model and a
data-dependent quadrature rule to theoretically quantify the frequency biasing
of NN training given fully nonuniform data. By replacing the loss function with
a carefully selected Sobolev norm, we can further amplify, dampen,
counterbalance, or reverse the intrinsic frequency biasing in NN training.
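The abstract names two concrete ingredients: a data-dependent quadrature rule that accounts for nonuniform samples, and a Sobolev-norm loss that reweights frequencies. The sketch below is a minimal NumPy illustration of both ideas, not code from the paper: trapezoid-rule weights for nonuniform 1-D samples, a Nystrom-style weighted eigendecomposition of a kernel matrix (a generic Gaussian kernel stands in for the NTK), and an H^s-type reweighting of a residual's Fourier coefficients. The function names, the kernel, and all parameter values are illustrative assumptions.

```python
import numpy as np

def trapezoid_weights(x):
    """Quadrature weights for sorted, nonuniform 1-D samples x, so that
    sum_i w[i] * f(x[i]) approximates the integral of f over [x[0], x[-1]]."""
    w = np.empty_like(x)
    w[1:-1] = 0.5 * (x[2:] - x[:-2])
    w[0] = 0.5 * (x[1] - x[0])
    w[-1] = 0.5 * (x[-1] - x[-2])
    return w

def weighted_kernel_eigenvalues(kernel, x, w):
    """Nystrom-style estimate of the eigenvalues of the integral operator
    that maps f to the integral of kernel(., y) f(y) dy over [x[0], x[-1]],
    using quadrature nodes x and weights w via the symmetric discretization
    D^{1/2} K D^{1/2} with D = diag(w)."""
    K = kernel(x[:, None], x[None, :])
    sw = np.sqrt(w)
    return np.linalg.eigvalsh(sw[:, None] * K * sw[None, :])[::-1]

def sobolev_loss(residual, s=1.0):
    """Discrete H^s-type norm of a residual sampled on a uniform grid:
    the k-th Fourier coefficient is weighted by (1 + k^2)^s, so s > 0
    up-weights high frequencies and s < 0 down-weights them."""
    coeffs = np.fft.rfft(residual) / residual.size
    k = np.arange(coeffs.size)
    return np.sum((1.0 + k**2) ** s * np.abs(coeffs) ** 2)

# Example: nonuniform samples concentrated near 0, Gaussian kernel standing in for the NTK.
x = np.sort(np.random.default_rng(0).beta(2.0, 5.0, size=200))
w = trapezoid_weights(x)
gauss = lambda s, t: np.exp(-(s - t) ** 2 / 0.02)
eigs = weighted_kernel_eigenvalues(gauss, x, w)
print(eigs[:5])  # the decay of these eigenvalues governs how quickly each mode is learned
```

In this reading, choosing s < 0 in the Sobolev weighting emphasizes the low-frequency misfit (reinforcing the intrinsic bias), while s > 0 up-weights the high-frequency residual, which is the sense in which the abstract's modified loss can amplify, dampen, counterbalance, or reverse the frequency biasing.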
Related papers
- Understanding the dynamics of the frequency bias in neural networks [0.0]
Recent works have shown that traditional Neural Network (NN) architectures display a marked frequency bias in the learning process.
We develop a partial differential equation (PDE) that unravels the frequency dynamics of the error for a 2-layer NN.
We empirically show that the same principle extends to multi-layer NNs.
arXiv Detail & Related papers (2024-05-23T18:09:16Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- A Scalable Walsh-Hadamard Regularizer to Overcome the Low-degree Spectral Bias of Neural Networks [79.28094304325116]
Despite the capacity of neural nets to learn arbitrary functions, models trained through gradient descent often exhibit a bias towards "simpler" functions.
We show how this spectral bias towards low-degree frequencies can in fact hurt the neural network's generalization on real-world datasets.
We propose a new scalable functional regularization scheme that aids the neural network to learn higher degree frequencies.
arXiv Detail & Related papers (2023-05-16T20:06:01Z)
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- Modeling Nonlinear Dynamics in Continuous Time with Inductive Biases on Decay Rates and/or Frequencies [37.795752939016225]
We propose a neural network-based model for nonlinear dynamics in continuous time that can impose inductive biases on decay rates and frequencies.
We use neural networks, trained by minimizing multi-step forecasting and backcasting errors on irregularly sampled time-series data, to find an appropriate Koopman space.
arXiv Detail & Related papers (2022-12-26T08:08:43Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- On the exact computation of linear frequency principle dynamics and its generalization [6.380166265263755]
Recent works show an intriguing phenomenon, the Frequency Principle (F-Principle), whereby a neural network fits the target function from low to high frequency during training.
In this paper, we derive the exact differential equation, namely the Linear Frequency-Principle (LFP) model, governing the evolution of the NN output function in the frequency domain. (A minimal numerical sketch of this low-to-high-frequency fitting appears after this list.)
arXiv Detail & Related papers (2020-10-15T15:17:21Z)
- Frequentist Uncertainty in Recurrent Neural Networks via Blockwise Influence Functions [121.10450359856242]
Recurrent neural networks (RNNs) are instrumental in modelling sequential and time-series data.
Existing approaches for uncertainty quantification in RNNs are based predominantly on Bayesian methods.
We develop a frequentist alternative that: (a) does not interfere with model training or compromise its accuracy, (b) applies to any RNN architecture, and (c) provides theoretical coverage guarantees on the estimated uncertainty intervals.
arXiv Detail & Related papers (2020-06-20T22:45:32Z)
- A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks [87.23360438947114]
We show that noisy gradient descent with weight decay can still exhibit "kernel-like" behavior.
This implies that the training loss converges linearly up to a certain accuracy.
We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay.
arXiv Detail & Related papers (2020-02-10T18:56:15Z)
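Several of the entries above, notably the frequency-bias dynamics paper and the Linear Frequency-Principle paper, describe the same phenomenon the main abstract starts from: low frequencies are fit before high ones. The following is a minimal, self-contained NumPy sketch of that behavior, not code from any of the papers listed. A one-hidden-layer ReLU network is trained by full-batch gradient descent on a two-frequency target, and the magnitudes of the residual's Fourier coefficients at the two target frequencies are printed as training proceeds; under this kind of initialization the k=1 component of the error typically decays well before the k=10 component. The width, learning rate, frequencies, and step counts are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, lr = 256, 128, 1e-3                       # samples, hidden width, step size (illustrative)
x = np.linspace(0.0, 1.0, n, endpoint=False)    # uniform grid so the FFT separates frequencies
y = np.sin(2 * np.pi * 1 * x) + np.sin(2 * np.pi * 10 * x)   # low- plus high-frequency target

W = rng.normal(0.0, 1.0, size=(m, 1))           # first-layer weights
b = rng.normal(0.0, 1.0, size=m)                # first-layer biases
a = rng.normal(0.0, 1.0 / np.sqrt(m), size=m)   # output weights

for step in range(20001):
    z = x[:, None] @ W.T + b                    # (n, m) pre-activations
    h = np.maximum(z, 0.0)                      # ReLU features
    r = h @ a - y                               # residual of the network output
    # gradients of the mean-squared loss (1/n) * sum_i r_i^2
    grad_a = (2.0 / n) * (h.T @ r)
    grad_z = (2.0 / n) * np.outer(r, a) * (z > 0.0)
    grad_W = grad_z.T @ x[:, None]
    grad_b = grad_z.sum(axis=0)
    a -= lr * grad_a
    W -= lr * grad_W
    b -= lr * grad_b
    if step % 5000 == 0:
        c = np.abs(np.fft.rfft(r)) / n          # magnitudes of the residual's Fourier modes
        print(f"step {step:6d}   error at k=1: {c[1]:.4f}   error at k=10: {c[10]:.4f}")
```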