A Theory for Emergence of Complex Skills in Language Models
- URL: http://arxiv.org/abs/2307.15936v2
- Date: Mon, 6 Nov 2023 00:36:24 GMT
- Title: A Theory for Emergence of Complex Skills in Language Models
- Authors: Sanjeev Arora, Anirudh Goyal
- Abstract summary: A major driver of AI products today is the fact that new skills emerge in language models when their parameter set and training corpora are scaled up.
This paper takes a different approach, analysing emergence using the famous (and empirical) Scaling Laws of LLMs and a simple statistical framework.
- Score: 56.947273387302616
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A major driver of AI products today is the fact that new skills emerge in
language models when their parameter set and training corpora are scaled up.
This phenomenon is poorly understood, and a mechanistic explanation via
mathematical analysis of gradient-based training seems difficult. The current
paper takes a different approach, analysing emergence using the famous (and
empirical) Scaling Laws of LLMs and a simple statistical framework.
Contributions include: (a) A statistical framework that relates cross-entropy
loss of LLMs to competence on the basic skills that underlie language tasks.
(b) Mathematical analysis showing that the Scaling Laws imply a strong form of
inductive bias that allows the pre-trained model to learn very efficiently. We
informally call this "slingshot generalization" since, naively viewed, it
appears to give competence levels at skills that violate usual generalization
theory. (c) A key example of slingshot generalization, that competence at
executing tasks involving $k$-tuples of skills emerges essentially at the same
scaling and same rate as competence on the elementary skills themselves.
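As a rough intuition for claim (c): if the per-skill error rate shrinks as excess cross-entropy loss shrinks under a Chinchilla-style scaling law, and a task requires $k$ skills, then under a naive independence assumption its success rate $(1-\epsilon)^k \approx 1-k\epsilon$ improves at essentially the same rate in the scale variables. The toy script below makes this concrete; it is an illustrative sketch, not the paper's statistical framework, and the constants, the proportionality between excess loss and per-skill error, and the independence assumption are all ours for illustration only.

```python
# Toy illustration (not the paper's framework): combine a Chinchilla-style
# scaling law with a naive independence assumption over skills to see how
# competence on k-tuples of skills tracks competence on single skills.
# All constants below are made up for illustration.

def excess_loss(n_params: float, n_tokens: float,
                a: float = 400.0, b: float = 400.0,
                alpha: float = 0.34, beta: float = 0.28) -> float:
    """Chinchilla-style excess cross-entropy: A/N^alpha + B/D^beta."""
    return a / n_params**alpha + b / n_tokens**beta


def skill_error(excess: float) -> float:
    """Assume (for illustration only) that the per-skill error rate is
    proportional to excess loss, capped at 1."""
    return min(1.0, excess)


def tuple_competence(eps: float, k: int) -> float:
    """Under independence, a task needing k skills succeeds iff all k
    skills succeed: (1 - eps)^k ~= 1 - k*eps for small eps."""
    return (1.0 - eps) ** k


if __name__ == "__main__":
    for scale in [1e8, 1e9, 1e10, 1e11]:      # parameter count N
        tokens = 20.0 * scale                 # roughly compute-optimal D ~ 20N
        eps = skill_error(excess_loss(scale, tokens))
        print(f"N={scale:.0e}  per-skill err={eps:.3f}  "
              f"1-skill={tuple_competence(eps, 1):.3f}  "
              f"4-tuple={tuple_competence(eps, 4):.3f}")
```

The sharp rise in the 4-tuple column once the per-skill error falls below roughly $1/k$ mirrors the emergence behaviour the paper formalizes, though the paper derives its result from its own statistical framework rather than from a naive independence assumption.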
Related papers
- Latent-Predictive Empowerment: Measuring Empowerment without a Simulator [56.53777237504011]
We present Latent-Predictive Empowerment (LPE), an algorithm that can compute empowerment in a more practical manner.
LPE learns large skillsets by maximizing an objective that is a principled replacement for the mutual information between skills and states.
arXiv Detail & Related papers (2024-10-15T00:41:18Z) - The Foundations of Tokenization: Statistical and Computational Concerns [51.370165245628975]
Tokenization is a critical step in the NLP pipeline.
Despite its recognized importance as a standard representation method in NLP, the theoretical underpinnings of tokenization are not yet fully understood.
The present paper contributes to addressing this theoretical gap by proposing a unified formal framework for representing and analyzing tokenizer models.
arXiv Detail & Related papers (2024-07-16T11:12:28Z) - A Mathematical Theory for Learning Semantic Languages by Abstract Learners [9.139188656944429]
We develop a mathematical theory to explain the emergence of learned skills, taking the learning process into account.
We demonstrate the emergence of learned skills when the ratio of the number of training texts to the number of skills exceeds a certain threshold.
We use site percolation analysis to derive the conditions for the existence of a giant component in the skill association graph (a toy numerical illustration of this threshold appears after this list).
arXiv Detail & Related papers (2024-04-10T13:50:46Z) - Laying the Foundation First? Investigating the Generalization from Atomic Skills to Complex Reasoning Tasks [40.7766635942194]
We propose a probing framework to investigate whether the atomic skill can spontaneously generalize to complex reasoning tasks.
We then introduce a hierarchical curriculum learning training strategy to achieve better skill generalization.
By leveraging hierarchical curriculum learning, we successfully induce generalization, significantly improving the performance of open-source LMs on complex reasoning tasks.
arXiv Detail & Related papers (2024-03-14T15:20:54Z) - LaRS: Latent Reasoning Skills for Chain-of-Thought Reasoning [61.7853049843921]
Chain-of-thought (CoT) prompting is a popular in-context learning approach for large language models (LLMs).
This paper introduces a new approach named Latent Reasoning Skills (LaRS) that employs unsupervised learning to create a latent space representation of rationales.
arXiv Detail & Related papers (2023-12-07T20:36:10Z) - Machine-assisted quantitizing designs: augmenting humanities and social sciences with artificial intelligence [0.0]
Large language models (LLMs) have been shown to present an unprecedented opportunity to scale up data analytics in the humanities and social sciences.
We build on mixed methods quantitizing and converting design principles, and feature analysis from linguistics, to transparently integrate human expertise and machine scalability.
The approach is discussed and demonstrated in over a dozen LLM-assisted case studies, covering 9 diverse languages, multiple disciplines and tasks.
arXiv Detail & Related papers (2023-09-24T14:21:50Z) - Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models [68.18370230899102]
We investigate how to elicit compositional generalization capabilities in large language models (LLMs).
We find that demonstrating both foundational skills and compositional examples grounded in these skills within the same prompt context is crucial.
We show that fine-tuning LLMs with SKiC-style data can elicit zero-shot weak-to-strong generalization.
arXiv Detail & Related papers (2023-08-01T05:54:12Z) - Learning Non-linguistic Skills without Sacrificing Linguistic Proficiency [14.618731441943847]
Non-linguistic skill injection leads to catastrophic forgetting of core linguistic skills.
Our model outperforms the state-of-the-art both on injected non-linguistic skills and on linguistic knowledge retention.
arXiv Detail & Related papers (2023-05-14T20:57:11Z) - Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge [96.92252296244233]
Large pre-trained language models (LMs) acquire some reasoning capacity, but this ability is difficult to control.
We show that LMs can be trained to reliably perform systematic reasoning combining both implicit, pre-trained knowledge and explicit natural language statements.
Our work paves a path towards open-domain systems that constantly improve by interacting with users who can instantly correct a model by adding simple natural language statements.
arXiv Detail & Related papers (2020-06-11T17:02:20Z)
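The entry above on learning semantic languages by abstract learners appeals to a giant component emerging in a skill association graph. As a minimal, self-contained illustration of that threshold phenomenon (using a plain Erdős–Rényi random graph rather than the bipartite skill graph analyzed in that paper, and requiring the networkx package), the sketch below estimates how the largest connected component grows once the mean degree crosses 1; the graph construction and constants are ours for illustration.

```python
# Toy sketch (our illustration, not the cited paper's model): in an
# Erdos-Renyi random graph G(n, c/n), a giant connected component
# appears once the mean degree c exceeds 1. The cited paper applies a
# related site-percolation argument to a skill association graph.
import networkx as nx


def largest_component_fraction(n: int, mean_degree: float, seed: int = 0) -> float:
    """Fraction of nodes in the largest connected component of G(n, c/n)."""
    g = nx.gnp_random_graph(n, mean_degree / n, seed=seed)
    return max(len(c) for c in nx.connected_components(g)) / n


if __name__ == "__main__":
    n = 20_000
    for c in [0.5, 0.9, 1.0, 1.1, 1.5, 2.0]:
        frac = largest_component_fraction(n, c)
        print(f"mean degree {c:.1f} -> largest component holds {frac:.3f} of nodes")
```

The fraction stays near zero below mean degree 1 and grows quickly above it, which is the classic percolation-style threshold the cited summary refers to.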
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.