A Theory for Emergence of Complex Skills in Language Models
- URL: http://arxiv.org/abs/2307.15936v2
- Date: Mon, 6 Nov 2023 00:36:24 GMT
- Title: A Theory for Emergence of Complex Skills in Language Models
- Authors: Sanjeev Arora, Anirudh Goyal
- Abstract summary: A major driver of AI products today is the fact that new skills emerge in language models when their parameter set and training corpora are scaled up.
This paper takes a different approach, analysing emergence using the famous (and empirical) Scaling Laws of LLMs and a simple statistical framework.
- Score: 56.947273387302616
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A major driver of AI products today is the fact that new skills emerge in
language models when their parameter set and training corpora are scaled up.
This phenomenon is poorly understood, and a mechanistic explanation via
mathematical analysis of gradient-based training seems difficult. The current
paper takes a different approach, analysing emergence using the famous (and
empirical) Scaling Laws of LLMs and a simple statistical framework.
Contributions include: (a) A statistical framework that relates cross-entropy
loss of LLMs to competence on the basic skills that underlie language tasks.
(b) Mathematical analysis showing that the Scaling Laws imply a strong form of
inductive bias that allows the pre-trained model to learn very efficiently. We
informally call this "slingshot generalization" since, naively viewed, it
appears to confer competence at skills in a way that violates usual
generalization theory. (c) A key example of slingshot generalization: competence
at executing tasks involving $k$-tuples of skills emerges at essentially the
same scaling, and the same rate, as competence on the elementary skills themselves.
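Under usual generalization theory, a task requiring $k$ independent elementary skills would succeed only when every one of them is individually mastered. A minimal sketch of that naive baseline (our illustrative assumption, not the paper's actual statistical framework) shows why claim (c) is surprising:

```python
# Naive independence baseline (illustrative assumption, not the paper's model):
# if each elementary skill is mastered with probability p, a task requiring
# all k of them succeeds with probability p**k, so competence on k-tuples
# should lag far behind single-skill competence as k grows.
def naive_tuple_competence(p: float, k: int) -> float:
    """Success probability on a task needing k independent skills."""
    return p ** k

if __name__ == "__main__":
    for p in (0.5, 0.8, 0.95):
        print(f"p={p}: single skill {p:.2f}, "
              f"4-tuple {naive_tuple_competence(p, 4):.4f}")
```

The paper's claim (c) is precisely that measured $k$-tuple competence emerges at roughly the same scale as single-skill competence, rather than being suppressed by the factor $p^{k-1}$ this baseline predicts.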
Related papers
- Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data [76.90128359866462]
We investigate the interplay between generalization and memorization in large language models at scale.
With various sizes of open-source LLMs and their pretraining corpora, we observe that as the model size increases, the task-relevant $n$-gram pair data becomes increasingly important.
Our results support the hypothesis that LLMs' capabilities emerge from a delicate balance of memorization and generalization with sufficient task-related pretraining data.
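One crude way to operationalize "task-relevant $n$-gram data" (a simplified proxy of ours, not the paper's exact metric) is the fraction of a task's $n$-grams that also occur in the pretraining corpus:

```python
from collections import Counter


def ngrams(tokens: list, n: int) -> list:
    """All contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def task_relevant_overlap(task_tokens: list, corpus_tokens: list, n: int = 2) -> float:
    """Fraction of task n-grams that also appear in the pretraining corpus,
    a crude proxy (our assumption, not the paper's metric) for how much
    task-relevant n-gram data the corpus supplies."""
    corpus_grams = set(ngrams(corpus_tokens, n))
    task_grams = ngrams(task_tokens, n)
    if not task_grams:
        return 0.0
    return sum(g in corpus_grams for g in task_grams) / len(task_grams)


if __name__ == "__main__":
    task = "the cat sat on the mat".split()
    corpus = "the cat sat down on the mat today".split()
    print(task_relevant_overlap(task, corpus, n=2))  # 4 of 5 bigrams overlap
```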
arXiv Detail & Related papers (2024-07-20T21:24:40Z) - A Mathematical Theory for Learning Semantic Languages by Abstract Learners [9.139188656944429]
We develop a mathematical theory to explain the emergence of learned skills, taking the learning process into account.
We demonstrate the emergence of learned skills when the ratio of the number of training texts to the number of skills exceeds a certain threshold.
We use site percolation analysis to derive the conditions for the existence of a giant component in the skill association graph.
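The giant-component threshold behind such percolation arguments can be reproduced on an ordinary random graph: with mean degree below 1 the largest component stays vanishingly small, and above 1 a giant component spanning a constant fraction of nodes appears. A self-contained sketch using a standard Erdős–Rényi-style model (not the paper's specific skill-association graph):

```python
import random
from collections import Counter


def largest_component_fraction(n: int, mean_degree: float, seed: int = 0) -> float:
    """Fraction of nodes in the largest connected component of a random
    graph with n nodes and mean_degree * n / 2 random edges. A giant
    component emerges once the mean degree crosses 1, the threshold
    phenomenon that percolation analyses rely on."""
    rng = random.Random(seed)
    parent = list(range(n))  # union-find forest

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _ in range(int(mean_degree * n / 2)):
        u, v = rng.randrange(n), rng.randrange(n)
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    sizes = Counter(find(i) for i in range(n))
    return max(sizes.values()) / n


if __name__ == "__main__":
    # Below the threshold: the largest component is a vanishing fraction.
    print(largest_component_fraction(4000, 0.5))
    # Above the threshold: a giant component holds a constant fraction.
    print(largest_component_fraction(4000, 2.0))
```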
arXiv Detail & Related papers (2024-04-10T13:50:46Z) - Laying the Foundation First? Investigating the Generalization from Atomic Skills to Complex Reasoning Tasks [40.7766635942194]
We propose a probing framework to investigate whether the atomic skill can spontaneously generalize to complex reasoning tasks.
We then introduce a hierarchical curriculum learning training strategy to achieve better skill generalization.
By leveraging hierarchical curriculum learning, we successfully induce generalization and significantly improve the performance of open-source LMs on complex reasoning tasks.
arXiv Detail & Related papers (2024-03-14T15:20:54Z) - Learning to Generate Explainable Stock Predictions using Self-Reflective
Large Language Models [54.21695754082441]
We propose a framework to teach Large Language Models (LLMs) to generate explainable stock predictions.
A reflective agent learns to explain past stock movements through self-reasoning, while a PPO (Proximal Policy Optimization) trainer trains the model to generate the most likely explanations.
Our framework can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient.
arXiv Detail & Related papers (2024-02-06T03:18:58Z) - Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models [56.34029644009297]
Large language models (LLMs) have demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems.
LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning.
We study single-task training, multi-task training, and a "chain-of-thought" knowledge-distillation fine-tuning technique to assess the performance of the model.
arXiv Detail & Related papers (2023-10-02T01:00:50Z) - Machine-assisted mixed methods: augmenting humanities and social sciences with artificial intelligence [0.0]
The increasing capacities of large language models (LLMs) present an unprecedented opportunity to scale up data analytics in the humanities and social sciences.
This contribution proposes a systematic mixed methods framework to harness qualitative analytic expertise and machine scalability.
Tasks include linguistic and discourse analysis, lexical semantic change detection, interview analysis, historical event cause inference and text mining.
arXiv Detail & Related papers (2023-09-24T14:21:50Z) - Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models [68.18370230899102]
We investigate how to elicit compositional generalization capabilities in large language models (LLMs).
We find that demonstrating both foundational skills and compositional examples grounded in these skills within the same prompt context is crucial.
We show that fine-tuning LLMs with SKiC-style (Skills-in-Context) data can elicit zero-shot weak-to-strong generalization.
arXiv Detail & Related papers (2023-08-01T05:54:12Z) - Learning Non-linguistic Skills without Sacrificing Linguistic Proficiency [14.618731441943847]
Non-linguistic skill injection leads to catastrophic forgetting of core linguistic skills.
Our model outperforms the state-of-the-art both on injected non-linguistic skills and on linguistic knowledge retention.
arXiv Detail & Related papers (2023-05-14T20:57:11Z) - Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge [96.92252296244233]
Large pre-trained language models (LMs) acquire some reasoning capacity, but this ability is difficult to control.
We show that LMs can be trained to reliably perform systematic reasoning combining both implicit, pre-trained knowledge and explicit natural language statements.
Our work paves a path towards open-domain systems that constantly improve by interacting with users who can instantly correct a model by adding simple natural language statements.
arXiv Detail & Related papers (2020-06-11T17:02:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.