A Theory for Emergence of Complex Skills in Language Models
- URL: http://arxiv.org/abs/2307.15936v2
- Date: Mon, 6 Nov 2023 00:36:24 GMT
- Title: A Theory for Emergence of Complex Skills in Language Models
- Authors: Sanjeev Arora, Anirudh Goyal
- Abstract summary: A major driver of AI products today is the fact that new skills emerge in language models when their parameter set and training corpora are scaled up.
This paper takes a different approach, analysing emergence using the famous (and empirical) Scaling Laws of LLMs and a simple statistical framework.
- Score: 56.947273387302616
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A major driver of AI products today is the fact that new skills emerge in
language models when their parameter set and training corpora are scaled up.
This phenomenon is poorly understood, and a mechanistic explanation via
mathematical analysis of gradient-based training seems difficult. The current
paper takes a different approach, analysing emergence using the famous (and
empirical) Scaling Laws of LLMs and a simple statistical framework.
Contributions include: (a) A statistical framework that relates cross-entropy
loss of LLMs to competence on the basic skills that underlie language tasks.
(b) Mathematical analysis showing that the Scaling Laws imply a strong form of
inductive bias that allows the pre-trained model to learn very efficiently. We
informally call this "slingshot generalization" since, naively viewed, it
appears to confer competence at skills in a way that violates usual
generalization theory. (c) A key example of slingshot generalization: competence
at executing tasks involving $k$-tuples of skills emerges at essentially the
same scaling, and the same rate, as competence on the elementary skills themselves.
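Under usual generalization theory, a task requiring $k$ independent elementary skills would succeed only when every one of them is individually mastered. A minimal sketch of that naive baseline (our illustrative assumption, not the paper's actual statistical framework) shows why claim (c) is surprising:

```python
# Naive independence baseline (illustrative assumption, not the paper's model):
# if each elementary skill is mastered with probability p, a task requiring
# all k of them succeeds with probability p**k, so competence on k-tuples
# should lag far behind single-skill competence as k grows.
def naive_tuple_competence(p: float, k: int) -> float:
    """Success probability on a task needing k independent skills."""
    return p ** k

if __name__ == "__main__":
    for p in (0.5, 0.8, 0.95):
        print(f"p={p}: single skill {p:.2f}, "
              f"4-tuple {naive_tuple_competence(p, 4):.4f}")
```

The paper's claim (c) is precisely that measured $k$-tuple competence emerges at roughly the same scale as single-skill competence, rather than being suppressed by the factor $p^{k-1}$ this baseline predicts.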
Related papers
- Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data [76.90128359866462]
We investigate the interplay between generalization and memorization in large language models at scale.
With various sizes of open-source LLMs and their pretraining corpora, we observe that as the model size increases, the task-relevant $n$-gram pair data becomes increasingly important.
Our results support the hypothesis that LLMs' capabilities emerge from a delicate balance of memorization and generalization with sufficient task-related pretraining data.
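One crude way to operationalize "task-relevant $n$-gram data" (a simplified proxy of ours, not the paper's exact metric) is the fraction of a task's $n$-grams that also occur in the pretraining corpus:

```python
from collections import Counter


def ngrams(tokens: list, n: int) -> list:
    """All contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def task_relevant_overlap(task_tokens: list, corpus_tokens: list, n: int = 2) -> float:
    """Fraction of task n-grams that also appear in the pretraining corpus,
    a crude proxy (our assumption, not the paper's metric) for how much
    task-relevant n-gram data the corpus supplies."""
    corpus_grams = set(ngrams(corpus_tokens, n))
    task_grams = ngrams(task_tokens, n)
    if not task_grams:
        return 0.0
    return sum(g in corpus_grams for g in task_grams) / len(task_grams)


if __name__ == "__main__":
    task = "the cat sat on the mat".split()
    corpus = "the cat sat down on the mat today".split()
    print(task_relevant_overlap(task, corpus, n=2))  # 4 of 5 bigrams overlap
```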
arXiv Detail & Related papers (2024-07-20T21:24:40Z) - A Mathematical Theory for Learning Semantic Languages by Abstract Learners [9.139188656944429]
We develop a mathematical theory to explain the emergence of learned skills, taking the learning process into account.
We demonstrate the emergence of learned skills when the ratio of the number of training texts to the number of skills exceeds a certain threshold.
We use site percolation analysis to derive the conditions for the existence of a giant component in the skill association graph.
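The giant-component threshold behind such percolation arguments can be reproduced on an ordinary random graph: with mean degree below 1 the largest component stays vanishingly small, and above 1 a giant component spanning a constant fraction of nodes appears. A self-contained sketch using a standard Erdős–Rényi-style model (not the paper's specific skill-association graph):

```python
import random
from collections import Counter


def largest_component_fraction(n: int, mean_degree: float, seed: int = 0) -> float:
    """Fraction of nodes in the largest connected component of a random
    graph with n nodes and mean_degree * n / 2 random edges. A giant
    component emerges once the mean degree crosses 1, the threshold
    phenomenon that percolation analyses rely on."""
    rng = random.Random(seed)
    parent = list(range(n))  # union-find forest

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _ in range(int(mean_degree * n / 2)):
        u, v = rng.randrange(n), rng.randrange(n)
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    sizes = Counter(find(i) for i in range(n))
    return max(sizes.values()) / n


if __name__ == "__main__":
    # Below the threshold: the largest component is a vanishing fraction.
    print(largest_component_fraction(4000, 0.5))
    # Above the threshold: a giant component holds a constant fraction.
    print(largest_component_fraction(4000, 2.0))
```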
arXiv Detail & Related papers (2024-04-10T13:50:46Z) - Laying the Foundation First? Investigating the Generalization from Atomic Skills to Complex Reasoning Tasks [40.7766635942194]
We propose a probing framework to investigate whether the atomic skill can spontaneously generalize to complex reasoning tasks.
We then introduce a hierarchical curriculum learning training strategy to achieve better skill generalization.
By leveraging hierarchical curriculum learning, we successfully induce generalization and significantly improve the performance of open-source LMs on complex reasoning tasks.
arXiv Detail & Related papers (2024-03-14T15:20:54Z) - Learning to Generate Explainable Stock Predictions using Self-Reflective
Large Language Models [54.21695754082441]
We propose a framework to teach Large Language Models (LLMs) to generate explainable stock predictions.
A reflective agent learns to explain past stock movements through self-reasoning, while a PPO (Proximal Policy Optimization) trainer trains the model to generate the most likely explanations.
Our framework can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient.
arXiv Detail & Related papers (2024-02-06T03:18:58Z) - Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models [56.34029644009297]
Large language models (LLMs) have demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems.
LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning.
We study single-task training, multi-task training, and a "chain-of-thought" knowledge-distillation fine-tuning technique to assess the performance of the model.
arXiv Detail & Related papers (2023-10-02T01:00:50Z) - Machine-assisted mixed methods: augmenting humanities and social sciences with artificial intelligence [0.0]
The increasing capacities of large language models (LLMs) present an unprecedented opportunity to scale up data analytics in the humanities and social sciences.
This contribution proposes a systematic mixed methods framework to harness qualitative analytic expertise and machine scalability.
Tasks include linguistic and discourse analysis, lexical semantic change detection, interview analysis, historical event cause inference and text mining.
arXiv Detail & Related papers (2023-09-24T14:21:50Z) - Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models [68.18370230899102]
We investigate how to elicit compositional generalization capabilities in large language models (LLMs).
We find that demonstrating both foundational skills and compositional examples grounded in these skills within the same prompt context is crucial.
We show that fine-tuning LLMs with SKiC-style (Skills-in-Context) data can elicit zero-shot weak-to-strong generalization.
arXiv Detail & Related papers (2023-08-01T05:54:12Z) - Learning Non-linguistic Skills without Sacrificing Linguistic Proficiency [14.618731441943847]
Non-linguistic skill injection leads to catastrophic forgetting of core linguistic skills.
Our model outperforms the state-of-the-art both on injected non-linguistic skills and on linguistic knowledge retention.
arXiv Detail & Related papers (2023-05-14T20:57:11Z) - Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge [96.92252296244233]
Large pre-trained language models (LMs) acquire some reasoning capacity, but this ability is difficult to control.
We show that LMs can be trained to reliably perform systematic reasoning combining both implicit, pre-trained knowledge and explicit natural language statements.
Our work paves a path towards open-domain systems that constantly improve by interacting with users who can instantly correct a model by adding simple natural language statements.
arXiv Detail & Related papers (2020-06-11T17:02:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.