A rationale from frequency perspective for grokking in training neural network
- URL: http://arxiv.org/abs/2405.17479v1
- Date: Fri, 24 May 2024 06:57:23 GMT
- Title: A rationale from frequency perspective for grokking in training neural network
- Authors: Zhangchen Zhou, Yaoyu Zhang, Zhi-Qin John Xu,
- Abstract summary: Grokking is the phenomenon where neural networks NNs initially fit the training data and later generalize to the test data during training.
In this paper, we empirically provide a frequency perspective to explain the emergence of this phenomenon in NNs.
- Score: 7.264378254137811
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Grokking is the phenomenon where neural networks NNs initially fit the training data and later generalize to the test data during training. In this paper, we empirically provide a frequency perspective to explain the emergence of this phenomenon in NNs. The core insight is that the networks initially learn the less salient frequency components present in the test data. We observe this phenomenon across both synthetic and real datasets, offering a novel viewpoint for elucidating the grokking phenomenon by characterizing it through the lens of frequency dynamics during the training process. Our empirical frequency-based analysis sheds new light on understanding the grokking phenomenon and its underlying mechanisms.
Related papers
- Collective variables of neural networks: empirical time evolution and scaling laws [0.535514140374842]
We show that certain measures on the spectrum of the empirical neural tangent kernel, specifically entropy and trace, yield insight into the representations learned by a neural network.
Results are demonstrated first on test cases before being shown on more complex networks, including transformers, auto-encoders, graph neural networks, and reinforcement learning studies.
arXiv Detail & Related papers (2024-10-09T21:37:14Z) - Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks [59.38765771221084]
We present a physiologically inspired speech recognition architecture compatible and scalable with deep learning frameworks.
We show end-to-end gradient descent training leads to the emergence of neural oscillations in the central spiking neural network.
Our findings highlight the crucial inhibitory role of feedback mechanisms, such as spike frequency adaptation and recurrent connections, in regulating and synchronising neural activity to improve recognition performance.
arXiv Detail & Related papers (2024-04-22T09:40:07Z) - How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z) - A Scalable Walsh-Hadamard Regularizer to Overcome the Low-degree
Spectral Bias of Neural Networks [79.28094304325116]
Despite the capacity of neural nets to learn arbitrary functions, models trained through gradient descent often exhibit a bias towards simpler'' functions.
We show how this spectral bias towards low-degree frequencies can in fact hurt the neural network's generalization on real-world datasets.
We propose a new scalable functional regularization scheme that aids the neural network to learn higher degree frequencies.
arXiv Detail & Related papers (2023-05-16T20:06:01Z) - Neural networks trained with SGD learn distributions of increasing
complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
We then exploit higher-order statistics only later during training.
We discuss the relation of DSB to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z) - Testing predictions of representation cost theory with CNNs [5.816527700115096]
We show that trained convolutional neural networks (CNNs) have different levels of sensitivity to signals of different frequency.
This is a consequence of the frequency distribution of natural images, which is known to have most of its power concentrated in low-to-mid frequencies.
arXiv Detail & Related papers (2022-10-03T22:35:43Z) - The Spectral Bias of Polynomial Neural Networks [63.27903166253743]
Polynomial neural networks (PNNs) have been shown to be particularly effective at image generation and face recognition, where high-frequency information is critical.
Previous studies have revealed that neural networks demonstrate a $textitspectral bias$ towards low-frequency functions, which yields faster learning of low-frequency components during training.
Inspired by such studies, we conduct a spectral analysis of the Tangent Kernel (NTK) of PNNs.
We find that the $Pi$-Net family, i.e., a recently proposed parametrization of PNNs, speeds up the
arXiv Detail & Related papers (2022-02-27T23:12:43Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Overview frequency principle/spectral bias in deep learning [3.957124094805574]
We show a Frequency Principle (F-Principle) of the training behavior of deep neural networks (DNNs)
The F-Principle is first demonstrated by one-dimensional synthetic data followed by the verification in high-dimensional real datasets.
This low-frequency implicit bias reveals the strength of neural network in learning low-frequency functions as well as its deficiency in learning high-frequency functions.
arXiv Detail & Related papers (2022-01-19T03:08:33Z) - The Surprising Simplicity of the Early-Time Learning Dynamics of Neural
Networks [43.860358308049044]
In work, we show that these common perceptions can be completely false in the early phase of learning.
We argue that this surprising simplicity can persist in networks with more layers with convolutional architecture.
arXiv Detail & Related papers (2020-06-25T17:42:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.