Frequency Bias in Neural Networks for Input of Non-Uniform Density
- URL: http://arxiv.org/abs/2003.04560v1
- Date: Tue, 10 Mar 2020 07:20:14 GMT
- Title: Frequency Bias in Neural Networks for Input of Non-Uniform Density
- Authors: Ronen Basri, Meirav Galun, Amnon Geifman, David Jacobs, Yoni Kasten,
Shira Kritchman
- Abstract summary: We use the Neural Tangent Kernel (NTK) model to explore the effect of variable density on training dynamics.
Our results show that convergence at a point $x \in \mathbb{S}^{d-1}$ occurs in time $O(\kappa^d/p(x))$, where $p(x)$ denotes the local density at $x$.
- Score: 27.75835200173761
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works have partly attributed the generalization ability of
over-parameterized neural networks to frequency bias -- networks trained with
gradient descent on data drawn from a uniform distribution find a low frequency
fit before high frequency ones. As realistic training sets are not drawn from a
uniform distribution, we here use the Neural Tangent Kernel (NTK) model to
explore the effect of variable density on training dynamics. Our results, which
combine analytic and empirical observations, show that when learning a pure
harmonic function of frequency $\kappa$, convergence at a point $x \in
\mathbb{S}^{d-1}$ occurs in time $O(\kappa^d/p(x))$ where $p(x)$ denotes the
local density at $x$. Specifically, for data in $\mathbb{S}^1$ we analytically
derive the eigenfunctions of the kernel associated with the NTK for two-layer
networks. We further prove convergence results for deep, fully connected
networks with respect to the spectral decomposition of the NTK. Our empirical
study highlights similarities and differences between deep and shallow networks
in this model.
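The scaling above lends itself to a quick numerical check. Below is a minimal NumPy sketch (not the paper's code) of kernel gradient flow with a commonly used closed form for the two-layer ReLU NTK on $\mathbb{S}^1$: it draws angles from a non-uniform density, fits a pure harmonic target through the spectral decomposition of the NTK Gram matrix, and records the per-point time at which the residual falls below a tolerance. The kernel normalization, the choice of density, the learning rate, and the tolerance are illustrative assumptions.

```python
# Illustrative sketch (not the paper's code): kernel gradient flow with a
# closed-form two-layer ReLU NTK on S^1, for inputs drawn from a non-uniform
# density. Kernel constants, density, learning rate, and tolerance are assumptions.
import numpy as np

def relu_ntk(u):
    """Two-layer ReLU NTK as a function of u = <x, z>, for unit-norm inputs.
    This closed form (up to constants) is a common parametrization, assumed here."""
    u = np.clip(u, -1.0, 1.0)
    phi = np.arccos(u)
    k0 = (np.pi - phi) / np.pi                              # from first-layer weights
    k1 = (np.sqrt(1.0 - u**2) + u * (np.pi - phi)) / np.pi  # degree-1 arc-cosine kernel
    return u * k0 + k1

rng = np.random.default_rng(0)
n, freq, lr, t_max, tol = 512, 4, 0.5, 20000, 0.05

# Non-uniform density on S^1: angles concentrated in one part of the circle.
theta = np.sort(rng.beta(2.0, 5.0, size=n)) * 2.0 * np.pi
x = np.stack([np.cos(theta), np.sin(theta)], axis=1)        # points on the unit circle

K = relu_ntk(x @ x.T) / n                                   # normalized NTK Gram matrix
y = np.cos(freq * theta)                                    # pure harmonic target

# Kernel gradient flow in the eigenbasis of K: the residual along eigenvector v_i
# decays like exp(-lr * lambda_i * t), so convergence is governed by the spectrum.
evals, evecs = np.linalg.eigh(K)
coef = evecs.T @ y
t = np.arange(0, t_max, 10)[:, None]                        # coarse time grid
residual = (np.exp(-lr * evals * t) * coef) @ evecs.T       # residual per time and point

hit = np.abs(residual) < tol
hit_time = np.where(hit.any(axis=0), t.ravel()[hit.argmax(axis=0)], t_max)

# Expectation from the paper's O(kappa^d / p(x)) bound: for a fixed frequency,
# points in denser regions converge earlier, roughly like 1 / p(x).
dense, sparse = hit_time[: n // 2], hit_time[n // 2:]       # sorted angles: first half is denser
print("median convergence time (dense region): ", int(np.median(dense)))
print("median convergence time (sparse region):", int(np.median(sparse)))
```

If the $O(\kappa^d/p(x))$ scaling holds, the dense half of the circle should report a noticeably smaller median convergence time than the sparse half.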
Related papers
- Generative Kaleidoscopic Networks [2.321684718906739]
We utilize this property of neural networks to design a dataset kaleidoscope, termed 'Generative Kaleidoscopic Networks'.
We observed this phenomenon to various degrees for other deep learning architectures such as CNNs, Transformers, and U-Nets.
arXiv Detail & Related papers (2024-02-19T02:48:40Z)
- A Scalable Walsh-Hadamard Regularizer to Overcome the Low-degree Spectral Bias of Neural Networks [79.28094304325116]
Despite the capacity of neural nets to learn arbitrary functions, models trained through gradient descent often exhibit a bias towards "simpler" functions.
We show how this spectral bias towards low-degree frequencies can in fact hurt the neural network's generalization on real-world datasets.
We propose a new scalable functional regularization scheme that aids the neural network to learn higher degree frequencies.
arXiv Detail & Related papers (2023-05-16T20:06:01Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Understanding the Spectral Bias of Coordinate Based MLPs Via Training Dynamics [2.9443230571766854]
We study the connection between the computations of ReLU networks, and the speed of gradient descent convergence.
We then use this formulation to study the severity of spectral bias in low dimensional settings, and how positional encoding overcomes this.
arXiv Detail & Related papers (2023-01-14T04:21:25Z)
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- The Spectral Bias of Polynomial Neural Networks [63.27903166253743]
Polynomial neural networks (PNNs) have been shown to be particularly effective at image generation and face recognition, where high-frequency information is critical.
Previous studies have revealed that neural networks demonstrate a spectral bias towards low-frequency functions, which yields faster learning of low-frequency components during training.
Inspired by such studies, we conduct a spectral analysis of the Neural Tangent Kernel (NTK) of PNNs.
We find that the $\Pi$-Net family, i.e., a recently proposed parametrization of PNNs, speeds up the ...
arXiv Detail & Related papers (2022-02-27T23:12:43Z)
- The Rate of Convergence of Variation-Constrained Deep Neural Networks [35.393855471751756]
We show that a class of variation-constrained neural networks can achieve the near-parametric rate $n^{-1/2+\delta}$ for an arbitrarily small constant $\delta$.
The result indicates that the neural function space needed for approximating smooth functions may not be as large as what is often perceived.
arXiv Detail & Related papers (2021-06-22T21:28:00Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- A Revision of Neural Tangent Kernel-based Approaches for Neural Networks [34.75076385561115]
We use the neural tangent kernel to show that networks can fit any finite training sample perfectly.
A simple and analytic kernel function was derived and shown to be equivalent to a fully-trained network.
Our tighter analysis resolves the scaling problem and enables the validation of the original NTK-based results.
arXiv Detail & Related papers (2020-07-02T05:07:55Z)
- A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks [87.23360438947114]
We show that noisy gradient descent with weight decay can still exhibit a "kernel-like" behavior.
This implies that the training loss converges linearly up to a certain accuracy.
We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay.
arXiv Detail & Related papers (2020-02-10T18:56:15Z)
- Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks [17.188280334580195]
We derive analytical expressions for the generalization performance of kernel regression as a function of the number of training samples.
Our expressions apply to wide neural networks due to an equivalence between training them and kernel regression with the Neural Tangent Kernel (NTK).
We verify our theory with simulations on synthetic data and the MNIST dataset (a toy learning-curve sketch in this spirit appears after this list).
arXiv Detail & Related papers (2020-02-07T00:03:40Z)
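As a companion to the last entry above, the following is a small, self-contained sketch (again assuming a standard closed form for the two-layer ReLU NTK, not code from that paper) of a toy learning curve: the test error of kernel ridge regression against the number of training samples for a pure harmonic target on $\mathbb{S}^1$. The ridge parameter, the target frequency, and the test grid are illustrative choices.

```python
# Toy learning-curve sketch (not the paper's code): test error of kernel ridge
# regression vs. the number of training samples, using an assumed closed-form
# two-layer ReLU NTK on S^1 and a pure harmonic target.
import numpy as np

def relu_ntk(u):
    # Same assumed two-layer ReLU NTK closed form as in the sketch above.
    u = np.clip(u, -1.0, 1.0)
    phi = np.arccos(u)
    return u * (np.pi - phi) / np.pi + (np.sqrt(1.0 - u**2) + u * (np.pi - phi)) / np.pi

def sphere_points(theta):
    return np.stack([np.cos(theta), np.sin(theta)], axis=1)

rng = np.random.default_rng(1)
freq, ridge = 3, 1e-3                                   # target frequency, ridge regularizer
theta_test = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
x_test, y_test = sphere_points(theta_test), np.cos(freq * theta_test)

for n in (8, 16, 32, 64, 128, 256):
    theta_tr = rng.uniform(0.0, 2.0 * np.pi, size=n)
    x_tr, y_tr = sphere_points(theta_tr), np.cos(freq * theta_tr)
    K = relu_ntk(x_tr @ x_tr.T) + ridge * np.eye(n)     # regularized train Gram matrix
    alpha = np.linalg.solve(K, y_tr)                    # kernel ridge regression coefficients
    pred = relu_ntk(x_test @ x_tr.T) @ alpha            # predictions on the test grid
    print(f"n={n:4d}  test MSE = {np.mean((pred - y_test) ** 2):.4f}")
```

The test MSE should drop sharply once $n$ is large enough for the kernel's spectrum to resolve the target frequency, which is the qualitative behavior such learning-curve analyses aim to capture.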
This list is automatically generated from the titles and abstracts of the papers on this site.