A Simple Explanation for the Phase Transition in Large Language Models
with List Decoding
- URL: http://arxiv.org/abs/2303.13112v1
- Date: Thu, 23 Mar 2023 09:00:07 GMT
- Title: A Simple Explanation for the Phase Transition in Large Language Models
with List Decoding
- Authors: Cheng-Shang Chang
- Abstract summary: We show that large language models (LLMs) exhibit emergent abilities that are not present in small models.
We use a list decoder that keeps a list of candidate sequences at each step and defers the generation of the output sequence until the end.
- Score: 3.898689841227059
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Various recent experimental results show that large language models (LLMs)
exhibit emergent abilities that are not present in small models. System
performance is greatly improved after passing a certain critical threshold of
scale. In this letter, we provide a simple explanation for such a phase
transition phenomenon. For this, we model an LLM as a sequence-to-sequence
random function. Instead of using instant generation at each step, we use a
list decoder that keeps a list of candidate sequences at each step and defers
the generation of the output sequence until the end. We show that there is a
critical threshold such that the expected number of erroneous candidate
sequences remains bounded when an LLM is below the threshold, and it grows
exponentially when an LLM is above the threshold. Such a threshold is related
to the basic reproduction number of a contagious disease.
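To make the threshold intuition concrete, the growth of erroneous candidate sequences can be pictured as a branching process in which each erroneous candidate on the list spawns, on average, r0 erroneous continuations at the next decoding step. The sketch below simulates such a process; the Poisson offspring model and the function name are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def mean_erroneous_candidates(r0: float, steps: int, trials: int = 5000) -> float:
    """Monte-Carlo estimate of the expected number of erroneous candidate
    sequences after `steps` decoding steps, assuming each erroneous candidate
    spawns a Poisson(r0) number of erroneous continuations; r0 plays the role
    of the basic reproduction number."""
    rng = np.random.default_rng(0)
    counts = np.ones(trials, dtype=np.int64)   # one erroneous candidate per trial
    for _ in range(steps):
        # the sum of n i.i.d. Poisson(r0) offspring counts is Poisson(n * r0)
        counts = rng.poisson(lam=r0 * counts)
    return counts.mean()

if __name__ == "__main__":
    for r0 in (0.8, 1.0, 1.2):                 # sub-critical, critical, super-critical
        print(f"r0 = {r0}: E[# erroneous candidates] ~ {mean_erroneous_candidates(r0, steps=25):.3f}")
```

In this toy model the expectation after n steps is exactly r0^n, so it stays bounded for r0 <= 1 and grows exponentially for r0 > 1, mirroring the bounded-versus-exponential dichotomy stated in the abstract.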
Related papers
- Real-time Verification and Refinement of Language Model Text Generation [60.04718679054704]
Large language models (LLMs) have shown remarkable performance across a wide range of natural language tasks.
A critical challenge remains: they sometimes generate factually incorrect answers.
We propose Streaming-VR, a novel approach designed to enhance the efficiency of verification and refinement of LLM outputs.
arXiv Detail & Related papers (2025-01-14T03:59:48Z)
- The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation [15.904856111636851]
This paper introduces the counter-intuitive generalization results of overfitting pre-trained large language models on very small datasets.
We find that by further fine-tuning these models to achieve a near-zero training loss on a small set of samples, the long-sequence generative capabilities are greatly enhanced.
arXiv Detail & Related papers (2024-12-05T16:34:20Z)
- REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy [93.8400683020273]
Decoding methods for large language models (LLMs) usually struggle with the tradeoff between ensuring factuality and maintaining diversity.
We propose REAL sampling, a decoding method that improves factuality and diversity over nucleus sampling.
arXiv Detail & Related papers (2024-06-11T21:44:49Z)
- Order-Independence Without Fine Tuning [18.020492646988746]
We present Set-Based Prompting, a technique that guarantees the output of an LLM will not have order dependence on a specified set of sub-sequences.
Despite our inputs being out of distribution, the impact on expected accuracy is small, where the expectation is taken over uniformly chosen shufflings of the candidate responses.
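The evaluation described above, accuracy averaged over uniformly chosen orderings of the candidate responses, can be approximated with a simple Monte-Carlo loop. The sketch below only illustrates that evaluation protocol, not Set-Based Prompting itself; `model_answer` is a hypothetical callable wrapping an LLM.

```python
import random
from typing import Callable, Sequence

def accuracy_over_orderings(question: str,
                            options: Sequence[str],
                            gold: str,
                            model_answer: Callable[[str], str],
                            n_orders: int = 20,
                            seed: int = 0) -> float:
    """Estimate expected accuracy, where the expectation is over uniformly
    chosen shufflings of the candidate responses."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_orders):
        order = list(options)
        rng.shuffle(order)                     # one uniformly random ordering
        prompt = question + "\n" + "\n".join(f"- {o}" for o in order)
        correct += int(model_answer(prompt).strip() == gold)
    return correct / n_orders
```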
arXiv Detail & Related papers (2024-06-04T16:09:13Z)
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
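A minimal way to picture the proposed change is in how the prompt is assembled: the task instruction is placed after, rather than before, the input sentence. The helper below is a hypothetical illustration of that idea, not the paper's actual prompt template.

```python
def build_prompt(instruction: str, source: str, instruction_last: bool = True) -> str:
    """Assemble a prompt with the task instruction either after (proposed)
    or before (conventional) the input sentence."""
    parts = [source, instruction] if instruction_last else [instruction, source]
    return "\n".join(parts)

# Proposed order: input sentence first, instruction last.
print(build_prompt("Translate the sentence above into German.",
                   "The weather is nice today."))
```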
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- Diffusion-LM Improves Controllable Text Generation [80.50044830018442]
Controlling the behavior of language models (LMs) without re-training is a major open problem in natural language generation.
We develop a new non-autoregressive language model based on continuous diffusions that we call Diffusion-LM.
We demonstrate successful control of Diffusion-LM for six challenging fine-grained control tasks, significantly outperforming prior work.
arXiv Detail & Related papers (2022-05-27T20:12:09Z)
- Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little [74.49773960145681]
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in NLP pipelines.
In this paper, we propose a different explanation: MLMs succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics.
Our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
arXiv Detail & Related papers (2021-04-14T06:30:36Z)
- Adversarial Encoder-Multi-Task-Decoder for Multi-Stage Processes [5.933303832684138]
In multi-stage processes, decisions occur in an ordered sequence of stages.
We introduce a framework that combines adversarial autoencoders (AAE), multi-task learning (MTL), and multi-label semi-supervised learning (MLSSL).
Using real-world data from different domains, we show that our approach outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2020-03-15T19:30:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences.