A Simple Explanation for the Phase Transition in Large Language Models
with List Decoding
- URL: http://arxiv.org/abs/2303.13112v1
- Date: Thu, 23 Mar 2023 09:00:07 GMT
- Title: A Simple Explanation for the Phase Transition in Large Language Models
with List Decoding
- Authors: Cheng-Shang Chang
- Abstract summary: We show that large language models (LLMs) exhibit emergent abilities that are not present in small models.
We use a list decoder that keeps a list of candidate sequences at each step and defers the generation of the output sequence until the end.
- Score: 3.898689841227059
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Various recent experimental results show that large language models (LLMs)
exhibit emergent abilities that are not present in small models. System
performance is greatly improved after passing a certain critical threshold of
scale. In this letter, we provide a simple explanation for such a phase
transition phenomenon. For this, we model an LLM as a sequence-to-sequence
random function. Instead of using instant generation at each step, we use a
list decoder that keeps a list of candidate sequences at each step and defers
the generation of the output sequence until the end. We show that there is a
critical threshold such that the expected number of erroneous candidate
sequences remains bounded when an LLM is below the threshold, and it grows
exponentially when an LLM is above the threshold. Such a threshold is related
to the basic reproduction number of a contagious disease.
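To make the threshold intuition concrete, the growth of erroneous candidate sequences can be pictured as a branching process in which each erroneous candidate on the list spawns, on average, r0 erroneous continuations at the next decoding step. The sketch below simulates such a process; the Poisson offspring model and the function name are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def mean_erroneous_candidates(r0: float, steps: int, trials: int = 5000) -> float:
    """Monte-Carlo estimate of the expected number of erroneous candidate
    sequences after `steps` decoding steps, assuming each erroneous candidate
    spawns a Poisson(r0) number of erroneous continuations; r0 plays the role
    of the basic reproduction number."""
    rng = np.random.default_rng(0)
    counts = np.ones(trials, dtype=np.int64)   # one erroneous candidate per trial
    for _ in range(steps):
        # the sum of n i.i.d. Poisson(r0) offspring counts is Poisson(n * r0)
        counts = rng.poisson(lam=r0 * counts)
    return counts.mean()

if __name__ == "__main__":
    for r0 in (0.8, 1.0, 1.2):                 # sub-critical, critical, super-critical
        print(f"r0 = {r0}: E[# erroneous candidates] ~ {mean_erroneous_candidates(r0, steps=25):.3f}")
```

In this toy model the expectation after n steps is exactly r0^n, so it stays bounded for r0 <= 1 and grows exponentially for r0 > 1, mirroring the bounded-versus-exponential dichotomy stated in the abstract.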
Related papers
- Real-time Verification and Refinement of Language Model Text Generation [60.04718679054704]
Large language models (LLMs) have shown remarkable performance across a wide range of natural language tasks.
A critical challenge remains: they sometimes generate factually incorrect answers.
We propose Streaming-VR, a novel approach designed to enhance the efficiency of verification and refinement of LLM outputs.
arXiv Detail & Related papers (2025-01-14T03:59:48Z)
- The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation [15.904856111636851]
This paper introduces the counter-intuitive generalization results of overfitting pre-trained large language models on very small datasets.
We find that by further fine-tuning these models to achieve a near-zero training loss on a small set of samples, the long-sequence generative capabilities are greatly enhanced.
arXiv Detail & Related papers (2024-12-05T16:34:20Z)
- REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy [93.8400683020273]
Decoding methods for large language models (LLMs) usually struggle with the tradeoff between ensuring factuality and maintaining diversity.
We propose REAL sampling, a decoding method that improves factuality and diversity over nucleus sampling.
arXiv Detail & Related papers (2024-06-11T21:44:49Z)
- Order-Independence Without Fine Tuning [18.020492646988746]
We present Set-Based Prompting, a technique that guarantees the output of an LLM will not have order dependence on a specified set of sub-sequences.
Despite our inputs being out of distribution, the impact on expected accuracy is small, where the expectation is taken over uniformly chosen shufflings of the candidate responses.
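The evaluation described above, accuracy averaged over uniformly chosen orderings of the candidate responses, can be approximated with a simple Monte-Carlo loop. The sketch below only illustrates that evaluation protocol, not Set-Based Prompting itself; `model_answer` is a hypothetical callable wrapping an LLM.

```python
import random
from typing import Callable, Sequence

def accuracy_over_orderings(question: str,
                            options: Sequence[str],
                            gold: str,
                            model_answer: Callable[[str], str],
                            n_orders: int = 20,
                            seed: int = 0) -> float:
    """Estimate expected accuracy, where the expectation is over uniformly
    chosen shufflings of the candidate responses."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_orders):
        order = list(options)
        rng.shuffle(order)                     # one uniformly random ordering
        prompt = question + "\n" + "\n".join(f"- {o}" for o in order)
        correct += int(model_answer(prompt).strip() == gold)
    return correct / n_orders
```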
arXiv Detail & Related papers (2024-06-04T16:09:13Z)
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
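A minimal way to picture the proposed change is in how the prompt is assembled: the task instruction is placed after, rather than before, the input sentence. The helper below is a hypothetical illustration of that idea, not the paper's actual prompt template.

```python
def build_prompt(instruction: str, source: str, instruction_last: bool = True) -> str:
    """Assemble a prompt with the task instruction either after (proposed)
    or before (conventional) the input sentence."""
    parts = [source, instruction] if instruction_last else [instruction, source]
    return "\n".join(parts)

# Proposed order: input sentence first, instruction last.
print(build_prompt("Translate the sentence above into German.",
                   "The weather is nice today."))
```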
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- Diffusion-LM Improves Controllable Text Generation [80.50044830018442]
Controlling the behavior of language models (LMs) without re-training is a major open problem in natural language generation.
We develop a new non-autoregressive language model based on continuous diffusions that we call Diffusion-LM.
We demonstrate successful control of Diffusion-LM for six challenging fine-grained control tasks, significantly outperforming prior work.
arXiv Detail & Related papers (2022-05-27T20:12:09Z)
- Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little [74.49773960145681]
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in NLP pipelines.
In this paper, we propose a different explanation: MLMs succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics.
Our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
arXiv Detail & Related papers (2021-04-14T06:30:36Z)
- Adversarial Encoder-Multi-Task-Decoder for Multi-Stage Processes [5.933303832684138]
In multi-stage processes, decisions occur in an ordered sequence of stages.
We introduce a framework that combines adversarial autoencoders (AAE), multi-task learning (MTL), and multi-label semi-supervised learning (MLSSL).
Using real-world data from different domains, we show that our approach outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2020-03-15T19:30:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences.