On the exact computation of linear frequency principle dynamics and its generalization
- URL: http://arxiv.org/abs/2010.08153v1
- Date: Thu, 15 Oct 2020 15:17:21 GMT
- Title: On the exact computation of linear frequency principle dynamics and its generalization
- Authors: Tao Luo, Zheng Ma, Zhi-Qin John Xu, Yaoyu Zhang
- Abstract summary: Recent works show an intriguing Frequency Principle (F-Principle) phenomenon: DNNs fit the target function from low to high frequency during training.
In this paper, we derive the exact differential equation, namely the Linear Frequency-Principle (LFP) model, governing the evolution of the NN output function in the frequency domain.
- Score: 6.380166265263755
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works show an intriguing phenomenon, the Frequency Principle
(F-Principle): deep neural networks (DNNs) fit the target function from low
to high frequency during training, which provides insight into the training
and generalization behavior of DNNs in complex tasks. In this paper, through
analysis of an infinite-width two-layer NN in the neural tangent kernel (NTK)
regime, we derive the exact differential equation, namely the Linear
Frequency-Principle (LFP) model, governing the evolution of the NN output
function in the frequency domain during training. Our exact computation applies
to general activation functions with no assumptions on the size or distribution
of the training data. This LFP model reveals that higher frequencies evolve
polynomially or exponentially slower than lower frequencies depending on the
smoothness/regularity of the activation function. We further bridge the gap
between training dynamics and generalization by proving that LFP model
implicitly minimizes a Frequency-Principle norm (FP-norm) of the learned
function, by which higher frequencies are more severely penalized depending on
the inverse of their evolution rate. Finally, we derive an *a priori*
generalization error bound controlled by the FP-norm of the target function,
which provides a theoretical justification for the empirical results that DNNs
often generalize well for low frequency functions.
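To make the mechanism described above concrete, the following is a schematic sketch, not the paper's exact derivation: the notation and the precise form of the rate function gamma(xi) are assumptions for illustration. The Fourier component of the network output at frequency xi relaxes toward the corresponding component of the target at a frequency-dependent rate, and the implicit bias can then be phrased as minimizing a norm that weights each frequency by the inverse of that rate.

```latex
% Schematic only (assumed notation): \hat{h}(\xi,t) is the Fourier transform of the
% network output at training time t, \hat{f}(\xi) that of the target, and \gamma(\xi)
% a frequency-dependent rate whose decay in |\xi| (polynomial or exponential)
% reflects the smoothness of the activation, as stated in the abstract.
\[
  \partial_t \hat{h}(\xi, t) = -\gamma(\xi)\,\bigl(\hat{h}(\xi, t) - \hat{f}(\xi)\bigr),
  \qquad
  \gamma(\xi) \sim |\xi|^{-p} \ \mbox{or}\ e^{-c|\xi|} \quad (|\xi| \to \infty).
\]
% FP-norm in this schematic reading: frequencies that evolve slowly carry the
% largest weight \gamma(\xi)^{-1}, so they are penalized most heavily.
\[
  \|h\|_{\mathrm{FP}}^{2} = \int \gamma(\xi)^{-1}\,\bigl|\hat{h}(\xi)\bigr|^{2}\,\mathrm{d}\xi .
\]
```

In this reading, the *a priori* bound says that targets with small FP-norm, i.e. mostly low-frequency content, are the ones on which the trained network generalizes well.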
Related papers
- Understanding the dynamics of the frequency bias in neural networks [0.0]
Recent works have shown that traditional Neural Network (NN) architectures display a marked frequency bias in the learning process.
We develop a partial differential equation (PDE) that unravels the frequency dynamics of the error for a 2-layer NN.
We empirically show that the same principle extends to multi-layer NNs.
arXiv Detail & Related papers (2024-05-23T18:09:16Z)
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- A Scalable Walsh-Hadamard Regularizer to Overcome the Low-degree Spectral Bias of Neural Networks [79.28094304325116]
Despite the capacity of neural nets to learn arbitrary functions, models trained through gradient descent often exhibit a bias towards "simpler" functions.
We show how this spectral bias towards low-degree frequencies can in fact hurt the neural network's generalization on real-world datasets.
We propose a new scalable functional regularization scheme that aids the neural network to learn higher degree frequencies.
arXiv Detail & Related papers (2023-05-16T20:06:01Z)
- Properties and Potential Applications of Random Functional-Linked Types of Neural Networks [81.56822938033119]
Random functional-linked neural networks (RFLNNs) offer an alternative way of learning in deep structures.
This paper gives some insights into the properties of RFLNNs from the viewpoint of the frequency domain.
We propose a method to generate a BLS network with better performance, and design an efficient algorithm for solving Poisson's equation.
arXiv Detail & Related papers (2023-04-03T13:25:22Z)
- Incremental Spatial and Spectral Learning of Neural Operators for Solving Large-Scale PDEs [86.35471039808023]
We introduce the Incremental Fourier Neural Operator (iFNO), which progressively increases the number of frequency modes used by the model.
We show that iFNO reduces total training time while maintaining or improving generalization performance across various datasets.
Our method achieves 10% lower testing error using 20% fewer frequency modes than the existing Fourier Neural Operator, while also training 30% faster.
arXiv Detail & Related papers (2022-11-28T09:57:15Z)
- Transform Once: Efficient Operator Learning in Frequency Domain [69.74509540521397]
We study deep neural networks designed to harness the structure in frequency domain for efficient learning of long-range correlations in space or time.
This work introduces a blueprint for frequency domain learning through a single transform: transform once (T1).
arXiv Detail & Related papers (2022-11-26T01:56:05Z)
- A Quadrature Perspective on Frequency Bias in Neural Network Training with Nonuniform Data [1.7188280334580197]
Gradient-based algorithms minimize the low-frequency misfit before reducing the high-frequency residuals.
We use the Neural Tangent Kernel (NTK) to provide a theoretically rigorous analysis for training where data are drawn from constant or piecewise-constant probability densities.
arXiv Detail & Related papers (2022-05-28T02:31:19Z)
- Linear Frequency Principle Model to Understand the Absence of Overfitting in Neural Networks [4.86119220344659]
We show that low frequency dominance of target functions is the key condition for the non-overfitting of NNs.
Through an ideal two-layer NN, we unravel how detailed microscopic NN training dynamics statistically give rise to an LFP model with quantitative prediction power.
arXiv Detail & Related papers (2021-01-30T10:11:37Z)
- Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks [9.23835409289015]
We study the training process of Deep Neural Networks (DNNs) from the Fourier analysis perspective.
We demonstrate a universal Frequency Principle (F-Principle): DNNs often fit target functions from low to high frequencies (a minimal numerical sketch of this phenomenon follows this list).
arXiv Detail & Related papers (2019-01-19T13:37:39Z)
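As a concrete illustration of the F-Principle that the paper above and several of these related works study, below is a minimal numpy sketch. It is not taken from any of the listed papers, and the target function, network width, learning rate, and step count are arbitrary illustrative choices. It trains a two-layer tanh network with NTK-style 1/sqrt(width) scaling on a 1D target containing one low and one high frequency, and prints how fast the residual's projection onto each mode decays; with these settings the low-frequency component typically shrinks much earlier than the high-frequency one.

```python
# Minimal F-Principle sketch (assumptions: 1D target with two sine modes, a
# width-512 two-layer tanh network with 1/sqrt(width) output scaling, plain
# full-batch gradient descent). Not code from any of the papers above.
import numpy as np

rng = np.random.default_rng(0)

n, width, lr, steps = 128, 512, 0.2, 5000
x = np.linspace(-1.0, 1.0, n).reshape(-1, 1)
y = np.sin(np.pi * x) + 0.5 * np.sin(5 * np.pi * x)   # low + high frequency target

# Parameters of f(x) = (1/sqrt(width)) * a^T tanh(W x + b)
W = rng.normal(size=(1, width))
b = rng.normal(size=(width,))
a = rng.normal(size=(width, 1))

def forward(x):
    h = np.tanh(x @ W + b)              # hidden features, shape (n, width)
    return h, h @ a / np.sqrt(width)    # features and network output

modes = [1, 5]                          # frequencies (in units of pi) to monitor
for step in range(steps + 1):
    h, out = forward(x)
    err = out - y                       # residual on the training set
    if step % 1000 == 0:
        # Project the residual onto each target mode to see which decays first.
        proj = [abs(np.mean(err[:, 0] * np.sin(k * np.pi * x[:, 0]))) for k in modes]
        print(f"step {step:4d}   |residual @ k=1| = {proj[0]:.4f}   |residual @ k=5| = {proj[1]:.4f}")
    # Full-batch gradient descent on the mean squared error.
    grad_out = 2.0 * err / n                      # dL/d(out), shape (n, 1)
    grad_a = h.T @ grad_out / np.sqrt(width)      # (width, 1)
    grad_h = grad_out @ a.T / np.sqrt(width)      # (n, width)
    grad_pre = grad_h * (1.0 - h ** 2)            # tanh'(z) = 1 - tanh(z)^2
    grad_W = x.T @ grad_pre                       # (1, width)
    grad_b = grad_pre.sum(axis=0)                 # (width,)
    a -= lr * grad_a
    W -= lr * grad_W
    b -= lr * grad_b
```

The printed projections are a rough finite-sample stand-in for the frequency-domain residual; the qualitative behavior, with the k=1 component decaying at a much larger rate than the k=5 component, is what the schematic LFP dynamics above would predict through a rate gamma that decreases with frequency.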