Understanding the dynamics of the frequency bias in neural networks
- URL: http://arxiv.org/abs/2405.14957v1
- Date: Thu, 23 May 2024 18:09:16 GMT
- Title: Understanding the dynamics of the frequency bias in neural networks
- Authors: Juan Molina, Mircea Petrache, Francisco Sahli Costabal, Matías Courdurier
- Abstract summary: Recent works have shown that traditional Neural Network (NN) architectures display a marked frequency bias in the learning process.
We develop a partial differential equation (PDE) that unravels the frequency dynamics of the error for a 2-layer NN.
We empirically show that the same principle extends to multi-layer NNs.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent works have shown that traditional Neural Network (NN) architectures display a marked frequency bias in the learning process. Namely, the NN first learns the low-frequency features before learning the high-frequency ones. In this study, we rigorously develop a partial differential equation (PDE) that unravels the frequency dynamics of the error for a 2-layer NN in the Neural Tangent Kernel regime. Furthermore, using this insight, we explicitly demonstrate how an appropriate choice of distributions for the initialization weights can eliminate or control the frequency bias. We focus our study on the Fourier Features model, an NN where the first layer has sine and cosine activation functions, with frequencies sampled from a prescribed distribution. In this setup, we experimentally validate our theoretical results and compare the NN dynamics to the solution of the PDE using the finite element method. Finally, we empirically show that the same principle extends to multi-layer NNs.
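To make the Fourier Features setup concrete, here is a minimal NumPy sketch (not the authors' code) of a 2-layer network whose first layer applies fixed sine and cosine activations at frequencies drawn from a prescribed distribution, with only the linear readout trained. The Gaussian frequency distribution, its scale of 50.0, the toy target, and the learning rate are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def fourier_features(x, freqs):
    """First layer: fixed sine/cosine activations at sampled frequencies."""
    proj = x @ freqs                                  # (n_samples, n_features)
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=1)

# The frequency distribution is the object of study: its shape and scale
# determine which frequencies the model represents and how fast each
# frequency band of the error decays. freq_scale = 50.0 is an arbitrary choice.
d, n_features, freq_scale = 1, 256, 50.0
freqs = rng.normal(0.0, freq_scale, size=(d, n_features))

# Toy 1-D target mixing a low and a high frequency.
x = np.linspace(0.0, 1.0, 512).reshape(-1, 1)
y = np.sin(2 * np.pi * x[:, 0]) + 0.5 * np.sin(2 * np.pi * 20 * x[:, 0])

# Second layer: linear readout trained by plain gradient descent
# (first layer frozen, matching the 2-layer NTK-style setup).
Phi = fourier_features(x, freqs)                      # (512, 2 * n_features)
w = np.zeros(Phi.shape[1])
lr = 1e-3
for step in range(5000):
    residual = Phi @ w - y
    w -= lr * Phi.T @ residual / len(y)
# Inspecting np.fft.rfft(residual) at intermediate steps shows which
# frequency bands of the error shrink first.
```

Changing the distribution from which `freqs` is drawn is the lever the abstract refers to: per the paper, an appropriate choice can remove or control the bias toward fitting the low-frequency part of the target first.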
Related papers
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of large-kernel convolutional neural network (LKCNN) models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - A Scalable Walsh-Hadamard Regularizer to Overcome the Low-degree
Spectral Bias of Neural Networks [79.28094304325116]
Despite the capacity of neural nets to learn arbitrary functions, models trained through gradient descent often exhibit a bias towards "simpler" functions.
We show how this spectral bias towards low-degree frequencies can in fact hurt the neural network's generalization on real-world datasets.
We propose a new scalable functional regularization scheme that helps the neural network learn higher-degree frequencies.
arXiv Detail & Related papers (2023-05-16T20:06:01Z) - Investigations on convergence behaviour of Physics Informed Neural
Networks across spectral ranges and derivative orders [0.0]
An important inference from Neural Tangent Kernel (NTK) theory is the existence of spectral bias (SB).
SB means that the low-frequency components of the target function of a fully connected Artificial Neural Network (ANN) are learnt significantly faster than the higher frequencies during training.
This is established for Mean Square Error (MSE) loss functions with very low learning rate parameters.
It is firmly established that under normalized conditions, PINNs do exhibit strong spectral bias, and this increases with the order of the differential equation.
arXiv Detail & Related papers (2023-01-07T06:31:28Z) - Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a
Polynomial Net Study [55.12108376616355]
The study of the NTK has been devoted to typical neural network architectures, but it is incomplete for neural networks with Hadamard products (NNs-Hp).
In this work, we derive the finite-width NTK formulation for a special class of NNs-Hp, i.e., polynomial neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z) - Momentum Diminishes the Effect of Spectral Bias in Physics-Informed
Neural Networks [72.09574528342732]
Physics-informed neural network (PINN) algorithms have shown promising results in solving a wide range of problems involving partial differential equations (PDEs).
They often fail to converge to desirable solutions when the target function contains high-frequency features, due to a phenomenon known as spectral bias.
In the present work, we exploit neural tangent kernels (NTKs) to investigate the training dynamics of PINNs evolving under stochastic gradient descent with momentum (SGDM).
arXiv Detail & Related papers (2022-06-29T19:03:10Z) - A Quadrature Perspective on Frequency Bias in Neural Network Training
with Nonuniform Data [1.7188280334580197]
Gradient-based algorithms minimize the low-frequency misfit before reducing the high-frequency residuals.
We use the Neural Tangent Kernel (NTK) to provide a theoretically rigorous analysis for training where data are drawn from constant or piecewise-constant probability densities.
arXiv Detail & Related papers (2022-05-28T02:31:19Z) - The Spectral Bias of Polynomial Neural Networks [63.27903166253743]
Polynomial neural networks (PNNs) have been shown to be particularly effective at image generation and face recognition, where high-frequency information is critical.
Previous studies have revealed that neural networks demonstrate a "spectral bias" towards low-frequency functions, which yields faster learning of low-frequency components during training.
Inspired by such studies, we conduct a spectral analysis of the Neural Tangent Kernel (NTK) of PNNs.
We find that the Π-Net family, i.e., a recently proposed parametrization of PNNs, speeds up the learning of the higher frequencies.
arXiv Detail & Related papers (2022-02-27T23:12:43Z) - Linear Frequency Principle Model to Understand the Absence of
Overfitting in Neural Networks [4.86119220344659]
We show that low-frequency dominance of target functions is the key condition for the non-overfitting of NNs.
Through an ideal two-layer NN, we unravel how the detailed microscopic NN training dynamics statistically give rise to an LFP model with quantitative prediction power.
arXiv Detail & Related papers (2021-01-30T10:11:37Z) - On the exact computation of linear frequency principle dynamics and its
generalization [6.380166265263755]
Recent works show an intriguing phenomenon, the Frequency Principle (F-Principle): NNs fit the target function from low to high frequency during training.
In this paper, we derive the exact differential equation, namely the Linear Frequency-Principle (LFP) model, governing the evolution of the NN output function in the frequency domain.
arXiv Detail & Related papers (2020-10-15T15:17:21Z) - Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks [9.23835409289015]
We study the training process of Deep Neural Networks (DNNs) from the Fourier analysis perspective.
We demonstrate a very universal Frequency Principle (F-Principle): DNNs often fit target functions from low to high frequencies (a minimal sketch of how this is tracked empirically follows this list).
arXiv Detail & Related papers (2019-01-19T13:37:39Z)
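Several of the entries above (the F-Principle papers, the quadrature-perspective analysis, the LFP model) rest on the same empirical measurement: decompose the training residual into frequency bands and watch which band shrinks first. The sketch below, referenced from the F-Principle entry, is a minimal NumPy illustration of that measurement, not code from any of the papers; the network width, the first-layer initialization scale, the learning rate, and the 10-cycle band cutoff are all assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# 1-D toy target with one low- and one high-frequency component.
x = np.linspace(0.0, 1.0, 256).reshape(-1, 1)
y = np.sin(2 * np.pi * 2 * x[:, 0]) + np.sin(2 * np.pi * 24 * x[:, 0])

# Small tanh network trained by plain gradient descent; width, first-layer
# scale, bias range, learning rate, and band cutoff are illustrative choices.
n_hidden = 200
W1 = rng.normal(0.0, 20.0, size=(1, n_hidden))
b1 = rng.uniform(-10.0, 10.0, size=n_hidden)
w2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=n_hidden)

freqs = np.fft.rfftfreq(len(y), d=x[1, 0] - x[0, 0])   # cycles per unit length
low_band = freqs <= 10.0
lr = 2e-3
for step in range(30001):
    h = np.tanh(x @ W1 + b1)            # (256, n_hidden)
    pred = h @ w2                        # (256,)
    residual = pred - y
    if step % 10000 == 0:
        # Band-wise error: magnitude of the residual's Fourier spectrum
        # below and above the cutoff frequency.
        spec = np.abs(np.fft.rfft(residual))
        print(f"step {step:6d}"
              f"  low-freq err {np.linalg.norm(spec[low_band]):8.3f}"
              f"  high-freq err {np.linalg.norm(spec[~low_band]):8.3f}")
    # Manual backprop for the two-layer network (loss = mean squared error / 2).
    g_pred = residual / len(y)
    g_w2 = h.T @ g_pred
    g_h = np.outer(g_pred, w2) * (1.0 - h ** 2)
    W1 -= lr * x.T @ g_h
    b1 -= lr * g_h.sum(axis=0)
    w2 -= lr * g_w2
```

With a standard initialization like this one, the low-frequency error is expected to fall well before the high-frequency error moves, which is the frequency bias these papers quantify and, in the main paper above, control through the choice of frequency distribution.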