Grokking as a First Order Phase Transition in Two Layer Networks
- URL: http://arxiv.org/abs/2310.03789v3
- Date: Sun, 5 May 2024 12:21:36 GMT
- Title: Grokking as a First Order Phase Transition in Two Layer Networks
- Authors: Noa Rubin, Inbar Seroussi, Zohar Ringel
- Abstract summary: A key property of deep neural networks (DNNs) is their ability to learn new features during training.
Grokking is also believed to be a feature-learning phenomenon that goes beyond the lazy-learning/Gaussian Process regime.
We show that after Grokking, the state of the DNN is analogous to the mixed phase following a first-order phase transition.
- Score: 4.096453902709292
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A key property of deep neural networks (DNNs) is their ability to learn new features during training. This intriguing aspect of deep learning stands out most clearly in recently reported Grokking phenomena. While mainly reflected as a sudden increase in test accuracy, Grokking is also believed to be a beyond lazy-learning/Gaussian Process (GP) phenomenon involving feature learning. Here we apply a recent development in the theory of feature learning, the adaptive kernel approach, to two teacher-student models with cubic-polynomial and modular addition teachers. We provide analytical predictions on feature learning and Grokking properties of these models and demonstrate a mapping between Grokking and the theory of phase transitions. We show that after Grokking, the state of the DNN is analogous to the mixed phase following a first-order phase transition. In this mixed phase, the DNN generates useful internal representations of the teacher that are sharply distinct from those before the transition.
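As a rough illustration of the two-layer teacher-student setup described in the abstract, the sketch below trains a one-hidden-layer MLP student on a modular-addition teacher and logs train/test accuracy, which typically shows the delayed test-set jump associated with Grokking. The modulus, width, optimizer, weight decay, and train fraction are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch (assumed hyperparameters, not the paper's exact setup):
# a two-layer (one hidden layer) student trained on the teacher (a + b) mod p.
import torch
import torch.nn as nn

torch.manual_seed(0)
p = 59                                            # assumed modulus of the teacher
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p

# One-hot encode the two operands and concatenate them.
x = torch.cat([nn.functional.one_hot(pairs[:, 0], p),
               nn.functional.one_hot(pairs[:, 1], p)], dim=1).float()

# Random train/test split; a modest training fraction makes the delayed
# generalization easier to observe.
perm = torch.randperm(p * p)
n_train = int(0.4 * p * p)
train_idx, test_idx = perm[:n_train], perm[n_train:]

# Two-layer fully connected student network.
model = nn.Sequential(nn.Linear(2 * p, 256), nn.ReLU(), nn.Linear(256, p))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

def accuracy(idx):
    with torch.no_grad():
        return (model(x[idx]).argmax(dim=1) == labels[idx]).float().mean().item()

for step in range(20000):
    opt.zero_grad()
    loss_fn(model(x[train_idx]), labels[train_idx]).backward()
    opt.step()
    if step % 1000 == 0:
        # Train accuracy usually saturates long before test accuracy jumps.
        print(f"step {step:6d}  train acc {accuracy(train_idx):.3f}  "
              f"test acc {accuracy(test_idx):.3f}")
```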
Related papers
- Deep Grokking: Would Deep Neural Networks Generalize Better? [51.24007462968805]
Grokking refers to a sharp rise in the network's generalization accuracy on the test set.
We find that deep neural networks can be more susceptible to grokking than their shallower counterparts.
We also observe an intriguing multi-stage generalization phenomenon when increasing the depth of the model.
arXiv Detail & Related papers (2024-05-29T19:05:11Z) - How Graph Neural Networks Learn: Lessons from Training Dynamics [80.41778059014393]
We study the training dynamics in function space of graph neural networks (GNNs).
We find that the gradient descent optimization of GNNs implicitly leverages the graph structure to update the learned function.
This finding offers new interpretable insights into when and why the learned GNN functions generalize.
arXiv Detail & Related papers (2023-10-08T10:19:56Z) - Graph Neural Networks Provably Benefit from Structural Information: A Feature Learning Perspective [53.999128831324576]
Graph neural networks (GNNs) have pioneered advancements in graph representation learning.
This study investigates the role of graph convolution within the context of feature learning theory.
arXiv Detail & Related papers (2023-06-24T10:21:11Z) - A Tale of Two Circuits: Grokking as Competition of Sparse and Dense Subnetworks [1.5297569497776375]
We study the internal structure of networks undergoing grokking on the sparse parity task (a minimal sketch of this task appears after this list).
We find that the grokking phase transition corresponds to the emergence of a sparse subnetwork that dominates model predictions.
arXiv Detail & Related papers (2023-03-21T14:17:29Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Neighborhood Convolutional Network: A New Paradigm of Graph Neural Networks for Node Classification [12.062421384484812]
Graph Convolutional Network (GCN) decouples neighborhood aggregation and feature transformation in each convolutional layer.
In this paper, we propose a new paradigm of GCN, termed Neighborhood Convolutional Network (NCN).
In this way, the model inherits the merit of decoupled GCN for aggregating neighborhood information while developing much more powerful feature learning modules.
arXiv Detail & Related papers (2022-11-15T02:02:51Z) - Grokking phase transitions in learning local rules with gradient descent [0.0]
We show that grokking is a phase transition and find exact analytic expressions for the critical exponents, grokking probability, and grokking time distribution.
We numerically analyse the connection between structure formation and grokking.
arXiv Detail & Related papers (2022-10-26T11:07:04Z) - Graph Modularity: Towards Understanding the Cross-Layer Transition of Feature Representations in Deep Neural Networks [7.187240308034312]
We move a tiny step towards understanding the transition of feature representations in deep neural networks (DNNs).
We first characterize this transition by analyzing the class separation in intermediate layers, and next model the process of class separation as community evolution in dynamic graphs.
We find that modularity tends to rise as the layer goes deeper, but descends or reaches a plateau at particular layers.
arXiv Detail & Related papers (2021-11-24T13:29:17Z) - Spatio-Temporal Inception Graph Convolutional Networks for Skeleton-Based Action Recognition [126.51241919472356]
We design a simple and highly modularized graph convolutional network architecture for skeleton-based action recognition.
Our network is constructed by repeating a building block that aggregates multi-granularity information from both the spatial and temporal paths.
arXiv Detail & Related papers (2020-11-26T14:43:04Z) - Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
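For the "A Tale of Two Circuits" entry above, here is a minimal sketch of what the sparse parity task looks like, together with a dense one-hidden-layer MLP of the kind whose sparse versus dense subnetworks such studies probe. The input width, parity size, split, and optimizer settings are assumptions for illustration, not that paper's configuration.

```python
# Sparse parity sketch (assumed sizes): n-bit +/-1 inputs whose label is the
# parity (product) of k fixed "relevant" coordinates.
import torch
import torch.nn as nn

torch.manual_seed(0)
n_bits, k, n_samples = 40, 3, 1000
relevant = torch.arange(k)                                     # bits that carry the label
x = torch.randint(0, 2, (n_samples, n_bits)).float() * 2 - 1   # entries in {-1, +1}
y = ((x[:, relevant].prod(dim=1) + 1) / 2).long()              # product +1 -> class 1

# Dense MLP; grokking analyses of this task track which subnetwork ends up
# driving the predictions as training proceeds.
model = nn.Sequential(nn.Linear(n_bits, 128), nn.ReLU(), nn.Linear(128, 2))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.1)
loss_fn = nn.CrossEntropyLoss()

train, test = slice(0, 800), slice(800, 1000)
for step in range(3000):
    opt.zero_grad()
    loss_fn(model(x[train]), y[train]).backward()
    opt.step()
    if step % 500 == 0:
        with torch.no_grad():
            test_acc = (model(x[test]).argmax(1) == y[test]).float().mean().item()
        print(f"step {step:5d}  test acc {test_acc:.3f}")
```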