A Quadrature Perspective on Frequency Bias in Neural Network Training
with Nonuniform Data
- URL: http://arxiv.org/abs/2205.14300v1
- Date: Sat, 28 May 2022 02:31:19 GMT
- Title: A Quadrature Perspective on Frequency Bias in Neural Network Training
with Nonuniform Data
- Authors: Annan Yu, Yunan Yang, Alex Townsend
- Abstract summary: Gradient-based algorithms minimize the low-frequency misfit before reducing the high-frequency residuals.
We use the Neural Tangent Kernel (NTK) to provide a theoretically rigorous analysis for training where data are drawn from constant or piecewise-constant probability densities.
- Score: 1.7188280334580197
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Small generalization errors of over-parameterized neural networks (NNs) can
be partially explained by the frequency biasing phenomenon, where
gradient-based algorithms minimize the low-frequency misfit before reducing the
high-frequency residuals. Using the Neural Tangent Kernel (NTK), one can
provide a theoretically rigorous analysis for training where data are drawn
from constant or piecewise-constant probability densities. Since most training
data sets are not drawn from such distributions, we use the NTK model and a
data-dependent quadrature rule to theoretically quantify the frequency biasing
of NN training given fully nonuniform data. By replacing the loss function with
a carefully selected Sobolev norm, we can further amplify, dampen,
counterbalance, or reverse the intrinsic frequency biasing in NN training.
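The abstract names two concrete ingredients: a data-dependent quadrature rule that accounts for nonuniform samples, and a Sobolev-norm loss that reweights frequencies. The sketch below is a minimal NumPy illustration of both ideas, not code from the paper: trapezoid-rule weights for nonuniform 1-D samples, a Nystrom-style weighted eigendecomposition of a kernel matrix (a generic Gaussian kernel stands in for the NTK), and an H^s-type reweighting of a residual's Fourier coefficients. The function names, the kernel, and all parameter values are illustrative assumptions.

```python
import numpy as np

def trapezoid_weights(x):
    """Quadrature weights for sorted, nonuniform 1-D samples x, so that
    sum_i w[i] * f(x[i]) approximates the integral of f over [x[0], x[-1]]."""
    w = np.empty_like(x)
    w[1:-1] = 0.5 * (x[2:] - x[:-2])
    w[0] = 0.5 * (x[1] - x[0])
    w[-1] = 0.5 * (x[-1] - x[-2])
    return w

def weighted_kernel_eigenvalues(kernel, x, w):
    """Nystrom-style estimate of the eigenvalues of the integral operator
    that maps f to the integral of kernel(., y) f(y) dy over [x[0], x[-1]],
    using quadrature nodes x and weights w via the symmetric discretization
    D^{1/2} K D^{1/2} with D = diag(w)."""
    K = kernel(x[:, None], x[None, :])
    sw = np.sqrt(w)
    return np.linalg.eigvalsh(sw[:, None] * K * sw[None, :])[::-1]

def sobolev_loss(residual, s=1.0):
    """Discrete H^s-type norm of a residual sampled on a uniform grid:
    the k-th Fourier coefficient is weighted by (1 + k^2)^s, so s > 0
    up-weights high frequencies and s < 0 down-weights them."""
    coeffs = np.fft.rfft(residual) / residual.size
    k = np.arange(coeffs.size)
    return np.sum((1.0 + k**2) ** s * np.abs(coeffs) ** 2)

# Example: nonuniform samples concentrated near 0, Gaussian kernel standing in for the NTK.
x = np.sort(np.random.default_rng(0).beta(2.0, 5.0, size=200))
w = trapezoid_weights(x)
gauss = lambda s, t: np.exp(-(s - t) ** 2 / 0.02)
eigs = weighted_kernel_eigenvalues(gauss, x, w)
print(eigs[:5])  # the decay of these eigenvalues governs how quickly each mode is learned
```

In this reading, choosing s < 0 in the Sobolev weighting emphasizes the low-frequency misfit (reinforcing the intrinsic bias), while s > 0 up-weights the high-frequency residual, which is the sense in which the abstract's modified loss can amplify, dampen, counterbalance, or reverse the frequency biasing.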
Related papers
- Understanding the dynamics of the frequency bias in neural networks [0.0]
Recent works have shown that traditional Neural Network (NN) architectures display a marked frequency bias in the learning process.
We develop a partial differential equation (PDE) that unravels the frequency dynamics of the error for a 2-layer NN.
We empirically show that the same principle extends to multi-layer NNs.
arXiv Detail & Related papers (2024-05-23T18:09:16Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- A Scalable Walsh-Hadamard Regularizer to Overcome the Low-degree Spectral Bias of Neural Networks [79.28094304325116]
Despite the capacity of neural nets to learn arbitrary functions, models trained through gradient descent often exhibit a bias towards "simpler" functions.
We show how this spectral bias towards low-degree frequencies can in fact hurt the neural network's generalization on real-world datasets.
We propose a new scalable functional regularization scheme that aids the neural network to learn higher degree frequencies.
arXiv Detail & Related papers (2023-05-16T20:06:01Z)
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- Modeling Nonlinear Dynamics in Continuous Time with Inductive Biases on Decay Rates and/or Frequencies [37.795752939016225]
We propose a neural network-based model for nonlinear dynamics in continuous time that can impose inductive biases on decay rates and frequencies.
We use neural networks, trained by minimizing multi-step forecasting and backcasting errors on irregularly sampled time-series data, to find an appropriate Koopman space.
arXiv Detail & Related papers (2022-12-26T08:08:43Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- On the exact computation of linear frequency principle dynamics and its generalization [6.380166265263755]
Recent works show an intriguing phenomenon, the Frequency Principle (F-Principle), whereby a neural network fits the target function from low to high frequency during training.
In this paper, we derive the exact differential equation, namely the Linear Frequency-Principle (LFP) model, governing the evolution of the NN output function in the frequency domain. (A minimal numerical sketch of this low-to-high-frequency fitting appears after this list.)
arXiv Detail & Related papers (2020-10-15T15:17:21Z)
- Frequentist Uncertainty in Recurrent Neural Networks via Blockwise Influence Functions [121.10450359856242]
Recurrent neural networks (RNNs) are instrumental in modelling sequential and time-series data.
Existing approaches for uncertainty quantification in RNNs are based predominantly on Bayesian methods.
We develop a frequentist alternative that: (a) does not interfere with model training or compromise its accuracy, (b) applies to any RNN architecture, and (c) provides theoretical coverage guarantees on the estimated uncertainty intervals.
arXiv Detail & Related papers (2020-06-20T22:45:32Z)
- A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks [87.23360438947114]
We show that noisy gradient descent with weight decay can still exhibit "kernel-like" behavior.
This implies that the training loss converges linearly up to a certain accuracy.
We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay.
arXiv Detail & Related papers (2020-02-10T18:56:15Z)
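Several of the entries above, notably the frequency-bias dynamics paper and the Linear Frequency-Principle paper, describe the same phenomenon the main abstract starts from: low frequencies are fit before high ones. The following is a minimal, self-contained NumPy sketch of that behavior, not code from any of the papers listed. A one-hidden-layer ReLU network is trained by full-batch gradient descent on a two-frequency target, and the magnitudes of the residual's Fourier coefficients at the two target frequencies are printed as training proceeds; under this kind of initialization the k=1 component of the error typically decays well before the k=10 component. The width, learning rate, frequencies, and step counts are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, lr = 256, 128, 1e-3                       # samples, hidden width, step size (illustrative)
x = np.linspace(0.0, 1.0, n, endpoint=False)    # uniform grid so the FFT separates frequencies
y = np.sin(2 * np.pi * 1 * x) + np.sin(2 * np.pi * 10 * x)   # low- plus high-frequency target

W = rng.normal(0.0, 1.0, size=(m, 1))           # first-layer weights
b = rng.normal(0.0, 1.0, size=m)                # first-layer biases
a = rng.normal(0.0, 1.0 / np.sqrt(m), size=m)   # output weights

for step in range(20001):
    z = x[:, None] @ W.T + b                    # (n, m) pre-activations
    h = np.maximum(z, 0.0)                      # ReLU features
    r = h @ a - y                               # residual of the network output
    # gradients of the mean-squared loss (1/n) * sum_i r_i^2
    grad_a = (2.0 / n) * (h.T @ r)
    grad_z = (2.0 / n) * np.outer(r, a) * (z > 0.0)
    grad_W = grad_z.T @ x[:, None]
    grad_b = grad_z.sum(axis=0)
    a -= lr * grad_a
    W -= lr * grad_W
    b -= lr * grad_b
    if step % 5000 == 0:
        c = np.abs(np.fft.rfft(r)) / n          # magnitudes of the residual's Fourier modes
        print(f"step {step:6d}   error at k=1: {c[1]:.4f}   error at k=10: {c[10]:.4f}")
```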