Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition
- URL: http://arxiv.org/abs/2402.15175v2
- Date: Mon, 26 Feb 2024 02:49:16 GMT
- Title: Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition
- Authors: Yufei Huang, Shengding Hu, Xu Han, Zhiyuan Liu, Maosong Sun
- Abstract summary: Recent studies have uncovered intriguing phenomena in deep learning, such as grokking, double descent, and emergent abilities in large language models.
We present a comprehensive framework that provides a unified view of these three phenomena, focusing on the competition between memorization and generalization circuits.
- Score: 83.13280812128411
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies have uncovered intriguing phenomena in deep learning, such as
grokking, double descent, and emergent abilities in large language models,
which challenge human intuition and are crucial for a deeper understanding of
neural models. In this paper, we present a comprehensive framework that
provides a unified view of these three phenomena, focusing on the competition
between memorization and generalization circuits. This approach, initially
employed to explain grokking, is extended in our work to encompass a wider
range of model sizes and training data volumes. Our framework delineates four
distinct training dynamics, each depending on varying combinations of model
size and training data quantity. Utilizing this framework, we provide a
detailed analysis of the double descent phenomenon and propose two verifiable
predictions regarding its occurrence, both substantiated by our experimental
results. Moreover, we expand our framework to the multi-task learning paradigm,
demonstrating how algorithmic tasks can be turned into emergent abilities. This
offers a novel perspective for understanding emergent abilities in large
language models.
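To make the circuits-competition intuition concrete, here is a minimal, self-contained Python sketch. It is an illustrative toy under stated assumptions, not the authors' model: the efficiency curves, the constants D0, T_EARLY, and T_LATE, and the regime labels are all invented for this example. It classifies each (model size, data size) pair by which circuit is stronger early versus late in training.

```python
# Toy sketch of circuits competition: a fast-forming memorization circuit
# competes with a slow-forming generalization circuit. Every functional form
# and constant here is an assumption chosen to make the regimes visible; the
# paper's actual framework distinguishes four training dynamics derived from
# circuit efficiency, not from these hand-picked curves.
from math import exp

D0 = 200.0                    # assumed data scale at which generalization becomes learnable
T_EARLY, T_LATE = 3.0, 300.0  # "early" vs. "late" training times (arbitrary units)

def mem_strength(n_params: float, n_data: float, t: float) -> float:
    """Memorization forms quickly, but its advantage shrinks as the
    dataset grows relative to model capacity (toy assumption)."""
    capacity_ratio = n_params / (n_params + 10.0 * n_data)
    return capacity_ratio * (1.0 - exp(-t / 2.0))  # fast time constant

def gen_strength(n_params: float, n_data: float, t: float) -> float:
    """Generalization needs enough data to be discoverable and forms
    slowly, but wins once data is plentiful (toy assumption).
    n_params is unused here, kept only for interface symmetry."""
    data_signal = n_data / (n_data + D0)
    return data_signal * (1.0 - exp(-t / 60.0))    # slow time constant

def regime(n_params: float, n_data: float) -> str:
    """Label the dynamic by which circuit is stronger early vs. late."""
    early = gen_strength(n_params, n_data, T_EARLY) > mem_strength(n_params, n_data, T_EARLY)
    late = gen_strength(n_params, n_data, T_LATE) > mem_strength(n_params, n_data, T_LATE)
    if early and late:
        return "generalize"  # generalization dominates throughout
    if late:
        return "grokking"    # memorize first, generalize later (delayed generalization)
    return "memorize"        # memorization dominates even late in training

if __name__ == "__main__":
    data_sizes = [10, 50, 200, 1000, 5000]
    print("params\\data" + "".join(f"{d:>12}" for d in data_sizes))
    for n_params in (1e3, 1e4, 1e5, 1e6):
        row = "".join(f"{regime(n_params, d):>12}" for d in data_sizes)
        print(f"{n_params:>11.0e}" + row)
```

In this toy grid, small data yields pure memorization, a middle band yields grokking, and abundant data yields immediate generalization, with the boundaries shifting as model size grows; the abstract's double descent predictions and emergence claims live near such shifting boundaries, but the actual regime definitions and predictions should be taken from the paper, not from these curves.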
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of deep learning's surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
- Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models [51.43538150982291]
We study how to learn human-interpretable concepts from data.
Weaving together ideas from both fields, we show that concepts can be provably recovered from diverse data.
arXiv Detail & Related papers (2024-02-14T15:23:59Z)
- Unraveling the Enigma of Double Descent: An In-depth Analysis through the Lens of Learned Feature Space [12.907949196758565]
Double descent is a counter-intuitive phenomenon in machine learning.
We argue that double descent arises in imperfect models trained with noisy data.
arXiv Detail & Related papers (2023-10-20T15:10:16Z)
- Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that can see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
Models that learn to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompt capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene, or manipulating a robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
- Does Deep Learning Learn to Abstract? A Systematic Probing Framework [69.2366890742283]
Abstraction, a desirable capability for deep learning models, means inducing abstract concepts from concrete instances and flexibly applying them beyond the learning context.
We introduce a systematic probing framework to explore the abstraction capability of deep learning models from a transferability perspective.
arXiv Detail & Related papers (2023-02-23T12:50:02Z)
- A Survey of Methods, Challenges and Perspectives in Causality [11.238098505498165]
We provide an extensive overview of theories and methods for causality from different perspectives.
We present early attempts to bring the fields together and discuss possible perspectives for the future.
arXiv Detail & Related papers (2023-02-01T07:47:26Z)
- Internal Representations of Vision Models Through the Lens of Frames on Data Manifolds [8.67467876089153]
We present a new approach to studying such representations inspired by the idea of a frame on the tangent bundle of a manifold.
Our construction, which we call a neural frame, is formed by assembling a set of vectors representing specific types of perturbations of a data point.
Using neural frames, we make observations about the way that models process, layer by layer, specific modes of variation within a small neighborhood of a data point.
arXiv Detail & Related papers (2022-11-19T01:48:19Z)
- Causal Reasoning Meets Visual Representation Learning: A Prospective Study [117.08431221482638]
A lack of interpretability, robustness, and out-of-distribution generalization is becoming a key challenge for existing visual models.
Inspired by the strong inference ability of human-level agents, researchers have in recent years devoted great effort to developing causal reasoning paradigms.
This paper aims to provide a comprehensive overview of this emerging field, attract attention, encourage discussion, and bring to the forefront the urgency of developing novel causal reasoning methods.
arXiv Detail & Related papers (2022-04-26T02:22:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.