Unified View of Grokking, Double Descent and Emergent Abilities: A
Perspective from Circuits Competition
- URL: http://arxiv.org/abs/2402.15175v2
- Date: Mon, 26 Feb 2024 02:49:16 GMT
- Title: Unified View of Grokking, Double Descent and Emergent Abilities: A
Perspective from Circuits Competition
- Authors: Yufei Huang, Shengding Hu, Xu Han, Zhiyuan Liu, Maosong Sun
- Abstract summary: Recent studies have uncovered intriguing phenomena in deep learning, such as grokking, double descent, and emergent abilities in large language models.
We present a comprehensive framework that provides a unified view of these three phenomena, focusing on the competition between memorization and generalization circuits.
- Score: 83.13280812128411
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies have uncovered intriguing phenomena in deep learning, such as
grokking, double descent, and emergent abilities in large language models,
which challenge human intuition and are crucial for a deeper understanding of
neural models. In this paper, we present a comprehensive framework that
provides a unified view of these three phenomena, focusing on the competition
between memorization and generalization circuits. This approach, initially
employed to explain grokking, is extended in our work to encompass a wider
range of model sizes and training data volumes. Our framework delineates four
distinct training dynamics, each depending on varying combinations of model
size and training data quantity. Utilizing this framework, we provide a
detailed analysis of the double descent phenomenon and propose two verifiable
predictions regarding its occurrence, both substantiated by our experimental
results. Moreover, we extend our framework to the multi-task learning
paradigm, demonstrating how algorithmic tasks can manifest as emergent
abilities. This offers a novel perspective for understanding emergent
abilities in large language models.
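To make the setting concrete: the canonical testbed for this circuits-competition story is a small network trained on a modular-arithmetic task, the setting in which grokking was first reported. Below is a minimal, self-contained sketch of such an experiment in PyTorch; the task (addition mod 97), architecture, training fraction, and hyperparameters are illustrative assumptions chosen for exposition, not the authors' exact configuration.

```python
# Minimal sketch of a grokking-style experiment on modular addition,
# the kind of algorithmic task where memorization and generalization
# circuits compete. All hyperparameters below are assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
p = 97  # modulus for the algorithmic task (a + b) mod p

# Full dataset: every pair (a, b) with label (a + b) % p.
pairs = torch..cartesian_prod(torch.arange(p), torch.arange(p)) if False else torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p

# One-hot encode the two operands and concatenate them.
x = torch.cat([nn.functional.one_hot(pairs[:, 0], p),
               nn.functional.one_hot(pairs[:, 1], p)], dim=1).float()

# Small training fraction: the regime where memorization wins first
# and the generalization circuit only takes over much later.
perm = torch.randperm(len(x))
n_train = int(0.4 * len(x))
train_idx, test_idx = perm[:n_train], perm[n_train:]

model = nn.Sequential(nn.Linear(2 * p, 256), nn.ReLU(), nn.Linear(256, p))
# Weight decay supplies the pressure that eventually favors the more
# efficient generalization circuit over the memorization circuit.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(20000):
    opt.zero_grad()
    loss = loss_fn(model(x[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            train_acc = (model(x[train_idx]).argmax(-1)
                         == labels[train_idx]).float().mean().item()
            test_acc = (model(x[test_idx]).argmax(-1)
                        == labels[test_idx]).float().mean().item()
        # Grokking signature: train accuracy saturates long before test.
        print(f"step {step:5d}  train_acc {train_acc:.2f}  test_acc {test_acc:.2f}")
```

Run long enough, the printed log typically shows training accuracy saturating early (memorization) while test accuracy jumps much later, which the paper's framework attributes to weight decay letting the more efficient generalization circuit win the competition.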
Related papers
- Cross-Modal Consistency in Multimodal Large Language Models [33.229271701817616]
We introduce a novel concept termed cross-modal consistency.
Our experimental findings reveal a pronounced inconsistency between the vision and language modalities within GPT-4V.
Our research yields insights into the appropriate utilization of such models and hints at potential avenues for enhancing their design.
arXiv Detail & Related papers (2024-11-14T08:22:42Z)
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of deep learning's surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
- Learning Interpretable Concepts: Unifying Causal Representation Learning and Foundation Models [51.43538150982291]
We study how to learn human-interpretable concepts from data.
Weaving together ideas from both fields, we show that concepts can be provably recovered from diverse data.
arXiv Detail & Related papers (2024-02-14T15:23:59Z)
- Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that can see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
Models that learn to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompting capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene, or manipulating a robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
- Does Deep Learning Learn to Abstract? A Systematic Probing Framework [69.2366890742283]
Abstraction is a desirable capability for deep learning models: the ability to induce abstract concepts from concrete instances and to flexibly apply them beyond the learning context.
We introduce a systematic probing framework to explore the abstraction capability of deep learning models from a transferability perspective.
arXiv Detail & Related papers (2023-02-23T12:50:02Z)
- A Survey of Methods, Challenges and Perspectives in Causality [11.238098505498165]
We provide an extensive overview of the theories and methods for causality from different perspectives.
We show early attempts to bring the fields together and the possible perspectives for the future.
arXiv Detail & Related papers (2023-02-01T07:47:26Z)
- Internal Representations of Vision Models Through the Lens of Frames on Data Manifolds [8.67467876089153]
We present a new approach to studying such representations inspired by the idea of a frame on the tangent bundle of a manifold.
Our construction, which we call a neural frame, is formed by assembling a set of vectors representing specific types of perturbations of a data point.
Using neural frames, we make observations about the way that models process, layer by layer, specific modes of variation within a small neighborhood of a data point.
arXiv Detail & Related papers (2022-11-19T01:48:19Z)
- Causal Reasoning Meets Visual Representation Learning: A Prospective Study [117.08431221482638]
A lack of interpretability, robustness, and out-of-distribution generalization is an increasingly prominent challenge for existing visual models.
Inspired by the strong inference ability of human-level agents, researchers have in recent years devoted great effort to developing causal reasoning paradigms.
This paper aims to provide a comprehensive overview of this emerging field, attract attention, encourage discussion, and bring to the forefront the urgency of developing novel causal reasoning methods.
arXiv Detail & Related papers (2022-04-26T02:22:28Z)