Exploring Grokking: Experimental and Mechanistic Investigations
- URL: http://arxiv.org/abs/2412.10898v1
- Date: Sat, 14 Dec 2024 17:11:38 GMT
- Title: Exploring Grokking: Experimental and Mechanistic Investigations
- Authors: Hu Qiye, Zhou Hao, Yu RuoXi
- Abstract summary: Grokking involves a neural network memorizing the training set, with zero training error and near-random test error.
Our study comprises extensive experiments and an exploration of research on the mechanism of grokking.
- Abstract: The phenomenon of grokking in over-parameterized neural networks has garnered significant interest. The network first memorizes the training set, reaching zero training error while test error remains near random; prolonged further training then produces a sharp transition from no generalization to perfect generalization. Our study comprises extensive experiments and an exploration of research on the mechanism of grokking. Through these experiments, we gained insight into how grokking depends on the training data fraction, the model architecture, and the optimization. Researchers have proposed various viewpoints on the mechanism of grokking, and we introduce some of these perspectives.
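To make the setup concrete, the sketch below shows the kind of experiment in which grokking is typically reported: a small network trained on modular addition with weight decay, logging train and test accuracy over long training. This is an illustrative reconstruction, not the authors' code; the modulus, data fraction, architecture, and hyperparameters are all assumptions.

```python
# Illustrative grokking-style run: learn a + b (mod p) from a fraction of
# all pairs. Hyperparameters are assumptions; grokking typically requires
# weight decay and far more steps than reaching zero training error takes.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
p = 97
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
X = torch.cat([F.one_hot(pairs[:, 0], p), F.one_hot(pairs[:, 1], p)], dim=1).float()

perm = torch.randperm(len(pairs))
n_train = int(0.4 * len(pairs))              # training data fraction (assumed)
train_idx, test_idx = perm[:n_train], perm[n_train:]

model = nn.Sequential(                        # small MLP over one-hot operands
    nn.Linear(2 * p, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, p),
)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

def accuracy(idx):
    with torch.no_grad():
        return (model(X[idx]).argmax(dim=1) == labels[idx]).float().mean().item()

for step in range(50_001):                    # full-batch training
    opt.zero_grad()
    loss = F.cross_entropy(model(X[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 2_000 == 0:
        print(f"step {step:6d}  train acc {accuracy(train_idx):.3f}  "
              f"test acc {accuracy(test_idx):.3f}")
```

With settings like these, training accuracy usually saturates within a few thousand steps while test accuracy stays near chance for much longer before rising sharply; both the training data fraction and the weight-decay strength shift when, and whether, that transition occurs.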
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of deep learning's surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
- Spectral Introspection Identifies Group Training Dynamics in Deep Neural Networks for Neuroimaging [16.002859238417223]
We present a novel introspection framework for Deep Learning on Neuroimaging data.
Unlike post-hoc introspection techniques, which require fully-trained models for evaluation, our method allows for the study of training dynamics on the fly.
arXiv Detail & Related papers (2024-06-17T17:58:15Z)
- A rationale from frequency perspective for grokking in training neural network [7.264378254137811]
Grokking is the phenomenon where neural networks (NNs) initially fit the training data and only later in training generalize to the test data.
In this paper, we empirically provide a frequency perspective to explain the emergence of this phenomenon in NNs.
arXiv Detail & Related papers (2024-05-24T06:57:23Z)
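As a toy illustration of this frequency viewpoint (under assumptions of ours, not the paper's setup), one can track the Fourier spectrum of a small network's residual error on a 1-D regression task: low-frequency error tends to vanish first, while high-frequency structure is fit only later.

```python
# Toy frequency analysis: fit a two-frequency 1-D target with a small MLP
# and watch the Fourier components of the residual error during training.
# Target, architecture, and training budget are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(0.0, 1.0, 256).unsqueeze(1)
y = torch.sin(2 * torch.pi * x) + 0.3 * torch.sin(2 * torch.pi * 10 * x)

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(5_001):
    opt.zero_grad()
    loss = ((net(x) - y) ** 2).mean()
    loss.backward()
    opt.step()
    if step % 1_000 == 0:
        with torch.no_grad():
            err = (net(x) - y).squeeze().numpy()
        spec = np.abs(np.fft.rfft(err))       # residual spectrum over [0, 1]
        # The k=1 component typically decays first; the k=10 component
        # lingers, i.e. low frequencies are captured before high ones.
        print(f"step {step:5d}  |err@k=1| {spec[1]:.3f}  |err@k=10| {spec[10]:.3f}")
```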
- Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective [91.14291142262262]
This work presents a straightforward and fundamental explanation from the data perspective.
Our preliminary investigation reveals a strong correlation between the degeneration issue and the presence of repetitions in training data.
Our experiments reveal that penalizing the repetitions in training data remains critical even when considering larger model sizes and instruction tuning.
arXiv Detail & Related papers (2023-10-16T09:35:42Z)
- Regularization, early-stopping and dreaming: a Hopfield-like setup to address generalization and overfitting [0.0]
We look for optimal network parameters by applying gradient descent over a regularized loss function.
Within this framework, the optimal neuron-interaction matrices correspond to Hebbian kernels revised by a reiterated unlearning protocol.
arXiv Detail & Related papers (2023-08-01T15:04:30Z)
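A minimal sketch of the Hebbian-plus-unlearning idea in this entry follows, assuming the classic Hopfield unlearning ("dreaming") protocol in which attractors reached from random initial states are subtracted from the coupling matrix; the paper's exact formulation may differ.

```python
# Sketch of a Hebbian Hopfield network with reiterated unlearning
# ("dreaming"): attractors reached from random states are subtracted
# from the coupling matrix. Classic Hopfield unlearning protocol;
# the paper's exact formulation may differ.
import numpy as np

rng = np.random.default_rng(0)
N, P = 100, 10                                # neurons, stored patterns
patterns = rng.choice([-1.0, 1.0], size=(P, N))
J = patterns.T @ patterns / N                 # Hebbian kernel
np.fill_diagonal(J, 0.0)

def relax(s, J, steps=50):
    """Iterate sign dynamics toward a fixed point (synchronous updates)."""
    for _ in range(steps):
        s = np.sign(J @ s + 1e-12)            # tiny offset breaks sign(0) ties
    return s

eps = 0.01
for _ in range(200):                          # dreaming: unlearn attractors
    s = relax(rng.choice([-1.0, 1.0], size=N), J)
    J -= eps * np.outer(s, s) / N
    np.fill_diagonal(J, 0.0)

# Retrieval check: a cue with ~15% flipped bits should relax back to
# pattern 0, giving an overlap close to 1.
cue = patterns[0] * np.where(rng.random(N) < 0.15, -1.0, 1.0)
print("overlap with pattern 0:", relax(cue, J) @ patterns[0] / N)
```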
- Membership Inference Attacks and Defenses in Neural Network Pruning [5.856147967309101]
We conduct the first analysis of privacy risks in neural network pruning.
Specifically, we investigate the impacts of neural network pruning on training data privacy.
We propose a new defense mechanism to protect the pruning process by mitigating the prediction divergence.
arXiv Detail & Related papers (2022-02-07T16:31:53Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z)
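The double-descent curve itself is easy to reproduce in a standard toy setting; below is a sketch using min-norm least squares on random ReLU features (our assumption, not the paper's model), where test error peaks near the interpolation threshold and descends again as the feature count grows.

```python
# Toy double descent: min-norm least squares on fixed random ReLU features.
# Test error peaks near the interpolation threshold (n_features ≈ n_train)
# and descends past it. The setting is a standard toy, not the paper's model.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 100, 2000, 20
w_true = rng.normal(size=d)

def sample(n):
    X = rng.normal(size=(n, d))
    return X, X @ w_true + 0.5 * rng.normal(size=n)    # noisy linear target

Xtr, ytr = sample(n_train)
Xte, yte = sample(n_test)

for n_feat in [10, 50, 90, 100, 110, 200, 500, 2000]:
    W = rng.normal(size=(d, n_feat)) / np.sqrt(d)      # fixed random projection
    Ftr, Fte = np.maximum(Xtr @ W, 0), np.maximum(Xte @ W, 0)
    beta, *_ = np.linalg.lstsq(Ftr, ytr, rcond=None)   # min-norm when wide
    mse = np.mean((Fte @ beta - yte) ** 2)
    print(f"features {n_feat:5d}  test MSE {mse:8.3f}")
```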
- Behavioral Experiments for Understanding Catastrophic Forgetting [9.679643351149215]
We apply the techniques of experimental psychology to the investigation of catastrophic forgetting in neural networks.
We present a series of controlled experiments with two-layer ReLU networks, along with exploratory results that yield a new understanding of the behavior of catastrophic forgetting.
arXiv Detail & Related papers (2021-10-20T14:00:02Z)
- Learning Neural Causal Models with Active Interventions [83.44636110899742]
We introduce an active intervention-targeting mechanism which enables a quick identification of the underlying causal structure of the data-generating process.
Our method significantly reduces the required number of interactions compared with random intervention targeting.
We demonstrate superior performance on multiple benchmarks from simulated to real-world data.
arXiv Detail & Related papers (2021-09-06T13:10:37Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.