Related papers: Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs

Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs

URL: http://arxiv.org/abs/2407.01082v2
Date: Sun, 13 Oct 2024 11:21:55 GMT
Title: Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs
Authors: Minh Nguyen, Andrew Baker, Clement Neo, Allen Roush, Andreas Kirsch, Ravid Shwartz-Ziv,
Abstract summary: Large Language Models (LLMs) generate text by sampling the next token from a probability distribution over the vocabulary at each decoding step. We propose min-p sampling, a dynamic truncation method that adjusts the sampling threshold based on the model's confidence by scaling according to the top token's probability. We conduct extensive experiments on benchmarks including GPQA, GSM8K, and AlpacaEval Creative Writing, demonstrating that min-p sampling improves both the quality and diversity of generated text, particularly at high temperatures.
Score: 4.122612309805664
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) generate text by sampling the next token from a probability distribution over the vocabulary at each decoding step. However, popular sampling methods like top-p (nucleus sampling) often struggle to balance quality and diversity, especially at higher temperatures, leading to incoherent or repetitive outputs. To address this challenge, we propose min-p sampling, a dynamic truncation method that adjusts the sampling threshold based on the model's confidence by scaling according to the top token's probability. We conduct extensive experiments on benchmarks including GPQA, GSM8K, and AlpacaEval Creative Writing, demonstrating that min-p sampling improves both the quality and diversity of generated text, particularly at high temperatures. Moreover, human evaluations reveal a clear preference for min-p sampling in terms of both text quality and diversity. Min-p sampling has been adopted by multiple open-source LLM implementations, highlighting its practical utility and potential impact.

Related papers

Quasi-random Multi-Sample Inference for Large Language Models [1.647759094903376]
Large language models (LLMs) are often equipped with multi-sample decoding strategies. Traditional text generation methods, such as beam search and sampling-based techniques, have notable limitations. This study explores the potential of arithmetic sampling, contrasting it with ancestral sampling.
arXiv Detail & Related papers (2024-11-09T18:55:04Z)
Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation [60.493180081319785]
We propose a systematic way to estimate the intrinsic capacity of a truncation sampling method by considering the trade-off between diversity and risk at each decoding step. Our work provides a comprehensive comparison between existing truncation sampling methods, as well as their recommended parameters as a guideline for users.
arXiv Detail & Related papers (2024-08-24T14:14:32Z)
REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy [93.8400683020273]
Decoding methods for large language models (LLMs) usually struggle with the tradeoff between ensuring factuality and maintaining diversity. We propose REAL sampling, a decoding method that improved factuality and diversity over nucleus sampling.
arXiv Detail & Related papers (2024-06-11T21:44:49Z)
Priority Sampling of Large Language Models for Compilers [4.2266182821287135]
Priority Sampling is a simple and deterministic sampling technique that produces unique samples ordered by the model's confidence. It supports generation based on regular expression that provides a controllable and structured exploration process. It outperforms the autotuner used for the generation of labels for the training of the original model in just 30 samples.
arXiv Detail & Related papers (2024-02-28T22:27:49Z)
A Block Metropolis-Hastings Sampler for Controllable Energy-based Text Generation [78.81021361497311]
We develop a novel Metropolis-Hastings (MH) sampler that proposes re-writes of the entire sequence in each step via iterative prompting of a large language model. Our new sampler allows for more efficient and accurate sampling from a target distribution and (b) allows generation length to be determined through the sampling procedure rather than fixed in advance.
arXiv Detail & Related papers (2023-12-07T18:30:15Z)
An Empirical Study of Translation Hypothesis Ensembling with Large Language Models [9.068791020917217]
Large language models (LLMs) are becoming a one-fits-many solution, but they sometimes hallucinate or produce unreliable output. We investigate how hypothesis ensembling can improve the quality of the generated text.
arXiv Detail & Related papers (2023-10-17T17:40:21Z)
Epsilon Sampling Rocks: Investigating Sampling Strategies for Minimum Bayes Risk Decoding for Machine Translation [20.749494856466526]
We show how different sampling approaches for generating candidate lists for Minimum Bayes Risk decoding affect performance. Based on our insights into their limitations, we experiment with the recently proposed epsilon-sampling approach, which prunes away all tokens with a probability smaller than epsilon.
arXiv Detail & Related papers (2023-05-17T00:11:38Z)
Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models [65.52639709094963]
Methods such as beam search and Gumbel top-k sampling can guarantee a different output for each element of the beam, but are not easy to parallelize. We present a framework for sampling according to an arithmetic code book implicitly defined by a large language model.
arXiv Detail & Related papers (2022-10-18T22:19:41Z)
Sampling with Attribute-Related Information for Controlling Language Models [86.72661027591394]
We propose a new simple guided decoding method, Gamma Sampling, which does not require complex engineering and any extra data. Gamma Sampling introduces attribute-related information into the sampling process to guide language models to generate texts with desired attributes. Experiments on controlling topics and sentiments of generated text show Gamma Sampling to be superior in diversity, attribute relevance and overall quality of generated samples.
arXiv Detail & Related papers (2022-05-12T11:48:11Z)
A Well-Composed Text is Half Done! Composition Sampling for Diverse Conditional Generation [79.98319703471596]
We propose Composition Sampling, a simple but effective method to generate diverse outputs for conditional generation of higher quality. It builds on recently proposed plan-based neural generation models that are trained to first create a composition of the output and then generate by conditioning on it and the input.
arXiv Detail & Related papers (2022-03-28T21:24:03Z)
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior [53.69310441063162]
This paper proposes a sequential prior in a discrete latent space which can generate more naturally sounding samples. We evaluate the approach using listening tests, objective metrics of automatic speech recognition (ASR) performance, and measurements of prosody attributes.
arXiv Detail & Related papers (2020-02-06T12:35:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.