Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
- URL: http://arxiv.org/abs/2210.15458v2
- Date: Thu, 1 Jun 2023 16:18:51 GMT
- Title: Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
- Authors: Luke Vilnis, Yury Zemlyanskiy, Patrick Murray, Alexandre Passos, Sumit
Sanghai
- Abstract summary: Methods such as beam search and Gumbel top-k sampling can guarantee a different output for each element of the beam, but are not easy to parallelize.
We present a framework for sampling according to an arithmetic code book implicitly defined by a large language model.
- Score: 65.52639709094963
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decoding methods for large language models often trade off between diversity
of outputs and parallelism of computation. Methods such as beam search and
Gumbel top-k sampling can guarantee a different output for each element of the
beam, but are not easy to parallelize. Alternatively, methods such as
temperature sampling and its modifications (top-k sampling, nucleus sampling,
typical decoding, and others) are embarrassingly parallel, but have no
guarantees about duplicate samples. We present a framework for sampling
according to an arithmetic code book implicitly defined by a large language
model, compatible with common sampling variations, with provable beam diversity
under certain conditions, as well as being embarrassingly parallel and
providing unbiased and consistent expectations from the original model. We
demonstrate the effectiveness of our approach on WMT machine translation, more
than halving the standard deviation when estimating expected BLEU score reward,
and closing the BLEU score gap between independent sampling and beam search by
up to 63%.
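The "arithmetic code book" view can be pictured as driving each sample with a single code point in [0, 1): at every step the unit interval is partitioned in proportion to the next-token probabilities, the token whose sub-interval contains the code is emitted, and the code is rescaled into that sub-interval. Spreading the code points evenly over [0, 1) is what gives a diverse yet embarrassingly parallel beam. A minimal sketch under those assumptions (not the authors' implementation; `next_token_probs` is a hypothetical stand-in for a model call):

```python
import random

def arithmetic_sample(next_token_probs, code, max_len, eos_id):
    # Decode one sequence from a single code point in [0, 1).
    # next_token_probs(prefix) -> list of (token_id, prob) is a
    # hypothetical stand-in for a call into the language model.
    prefix = []
    for _ in range(max_len):
        dist = next_token_probs(prefix)
        lo = 0.0
        chosen, width = dist[-1]
        for token_id, p in dist:
            if code < lo + p:
                chosen, width = token_id, p
                break
            lo += p
        else:
            lo -= width  # rounding pushed the code past 1; reuse the last interval
        # Rescale the code into the chosen sub-interval so the remaining
        # randomness drives the rest of the sequence.
        code = min(max((code - lo) / max(width, 1e-12), 0.0), 1.0 - 1e-12)
        prefix.append(chosen)
        if chosen == eos_id:
            break
    return prefix

def arithmetic_beam(next_token_probs, num_samples, max_len, eos_id):
    # Evenly spaced, randomly shifted code points: diverse as a set, while
    # each code is individually uniform on [0, 1), so each decoded sequence
    # is still an unbiased draw from the model.
    shift = random.random()
    codes = [(i / num_samples + shift) % 1.0 for i in range(num_samples)]
    return [arithmetic_sample(next_token_probs, c, max_len, eos_id) for c in codes]
```

Because each shifted code point is marginally uniform on [0, 1), averaging a metric over the decoded beam remains an unbiased estimate of its expectation under the model, which is the property the abstract's variance-reduction result relies on.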
Related papers
- Quasi-random Multi-Sample Inference for Large Language Models [1.647759094903376]
Large language models (LLMs) are often equipped with multi-sample decoding strategies.
Traditional text generation methods, such as beam search and sampling-based techniques, have notable limitations.
This study explores the potential of arithmetic sampling, contrasting it with ancestral sampling.
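For contrast, plain ancestral sampling draws every token independently at each step, so parallel samples come with no diversity guarantee. A minimal sketch, reusing the hypothetical `next_token_probs` helper from the sketch above:

```python
import random

def ancestral_sample(next_token_probs, max_len, eos_id):
    # Draw each token i.i.d. from the model's step distribution; two
    # parallel calls can return identical sequences purely by chance.
    prefix = []
    for _ in range(max_len):
        tokens, probs = zip(*next_token_probs(prefix))
        token = random.choices(tokens, weights=probs, k=1)[0]
        prefix.append(token)
        if token == eos_id:
            break
    return prefix
```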
arXiv Detail & Related papers (2024-11-09T18:55:04Z)
- Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference [35.730941605490194]
Large language models (LLMs) have shown outstanding performance across numerous real-world tasks.
Speculative decoding has emerged as a promising solution, leveraging a smaller auxiliary model to draft future tokens.
This paper explores the novel integration of speculative decoding with beam sampling.
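The draft-and-verify loop underlying speculative decoding can be sketched as follows; this is the standard single-sequence rejection scheme, not the paper's dynamic-width beam variant, and `target_probs`, `draft_probs`, and `draft_sample` are hypothetical stand-ins for model calls:

```python
import random

def residual_sample(p, q):
    # Sample from the normalized positive part of (p - q); fall back to p
    # if the residual vanishes numerically.
    residual = {t: max(p.get(t, 0.0) - q.get(t, 0.0), 0.0) for t in p}
    source = residual if sum(residual.values()) > 0 else p
    toks = list(source)
    return random.choices(toks, weights=[source[t] for t in toks], k=1)[0]

def speculative_step(target_probs, draft_probs, draft_sample, prefix, gamma):
    # One draft-and-verify round: the small model drafts `gamma` tokens, the
    # large model accepts each with probability min(1, p/q) and corrects the
    # first rejection, which preserves the target distribution.
    # target_probs/draft_probs(prefix) -> {token: prob} and
    # draft_sample(prefix) -> token are hypothetical stand-ins.
    drafted, ctx = [], list(prefix)
    for _ in range(gamma):
        tok = draft_sample(ctx)
        drafted.append(tok)
        ctx.append(tok)
    out = list(prefix)
    for tok in drafted:
        p, q = target_probs(out), draft_probs(out)
        if random.random() < min(1.0, p.get(tok, 0.0) / max(q.get(tok, 1e-12), 1e-12)):
            out.append(tok)                    # accepted: keep the drafted token
        else:
            out.append(residual_sample(p, q))  # rejected: resample and stop
            return out
    # All drafts accepted: take one bonus token from the target model.
    p = target_probs(out)
    toks = list(p)
    out.append(random.choices(toks, weights=[p[t] for t in toks], k=1)[0])
    return out
```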
arXiv Detail & Related papers (2024-09-25T02:20:42Z)
- Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation [60.493180081319785]
We propose a systematic way to estimate the intrinsic capacity of a truncation sampling method by considering the trade-off between diversity and risk at each decoding step.
Our work provides a comprehensive comparison between existing truncation sampling methods, as well as their recommended parameters as a guideline for users.
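The step-level trade-off being compared can be illustrated with two familiar truncation rules applied to a single next-token distribution (a sketch for illustration only, not the paper's estimator; `probs` maps tokens to step probabilities):

```python
def top_k_filter(probs, k):
    # Keep the k most probable tokens and renormalize.
    kept = dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k])
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}

def top_p_filter(probs, p_mass):
    # Nucleus sampling: keep the smallest set of tokens whose cumulative
    # probability reaches p_mass, then renormalize.
    kept, cum = {}, 0.0
    for t, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[t] = p
        cum += p
        if cum >= p_mass:
            break
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}
```

A more aggressive cutoff lowers the risk of sampling low-quality tokens but shrinks the diversity of the surviving distribution, which is the trade-off the entry quantifies per decoding step.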
arXiv Detail & Related papers (2024-08-24T14:14:32Z)
- Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs [4.122612309805664]
Large Language Models (LLMs) generate text by sampling the next token from a probability distribution over the vocabulary at each decoding step.
We propose min-p sampling, a dynamic truncation method that adjusts the sampling threshold based on the model's confidence by scaling according to the top token's probability.
We conduct extensive experiments on benchmarks including GPQA, GSM8K, and AlpacaEval Creative Writing, demonstrating that min-p sampling improves both the quality and diversity of generated text, particularly at high temperatures.
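A minimal sketch of the min-p rule as described above, with `probs` mapping tokens to step probabilities (temperature would be applied before this filter):

```python
def min_p_filter(probs, min_p):
    # Dynamic truncation: the cutoff scales with the model's confidence,
    # i.e. with the probability of the most likely token.
    threshold = min_p * max(probs.values())
    kept = {t: p for t, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}
```

When the model is confident, the top token dominates and few alternatives survive; when the distribution is flat, the threshold drops and more tokens remain, which is why the method tolerates high temperatures.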
arXiv Detail & Related papers (2024-07-01T08:37:25Z)
- A Block Metropolis-Hastings Sampler for Controllable Energy-based Text Generation [78.81021361497311]
We develop a novel Metropolis-Hastings (MH) sampler that proposes re-writes of the entire sequence in each step via iterative prompting of a large language model.
Our new sampler (a) allows for more efficient and accurate sampling from a target distribution and (b) allows generation length to be determined through the sampling procedure rather than fixed in advance.
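The accept/reject logic at the heart of any Metropolis-Hastings sampler over whole sequences looks roughly as follows; `propose`, `log_target`, and `log_proposal` are hypothetical stand-ins (in the paper the proposal comes from prompting an LLM to rewrite the text):

```python
import math
import random

def mh_step(current, propose, log_target, log_proposal):
    # One Metropolis-Hastings step over whole sequences: propose a rewrite,
    # then accept with the standard ratio so the chain targets log_target.
    proposal = propose(current)
    log_alpha = (log_target(proposal) - log_target(current)
                 + log_proposal(current, given=proposal)
                 - log_proposal(proposal, given=current))
    if math.log(random.random()) < log_alpha:
        return proposal   # accept the rewrite
    return current        # reject and keep the current sequence
```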
arXiv Detail & Related papers (2023-12-07T18:30:15Z)
- Conformal Language Modeling [61.94417935386489]
We propose a novel approach to conformal prediction for generative language models (LMs).
Standard conformal prediction produces prediction sets with rigorous statistical guarantees.
We demonstrate the promise of our approach on multiple tasks in open-domain question answering, text summarization, and radiology report generation.
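The flavor of guarantee involved can be illustrated with a generic split-conformal sketch (the paper's procedure for generative LMs is more elaborate, calibrating a stopping rule over sampled outputs; the functions below are hypothetical):

```python
import math

def calibrate_threshold(cal_nonconformity, alpha):
    # Split-conformal calibration: the ceil((n+1)(1-alpha))-th smallest
    # nonconformity score on held-out references becomes the cutoff
    # (clipped to the largest calibration score when n is small).
    n = len(cal_nonconformity)
    rank = min(math.ceil((n + 1) * (1 - alpha)), n)
    return sorted(cal_nonconformity)[rank - 1]

def conformal_set(candidates, nonconformity, threshold):
    # Keep every sampled candidate whose nonconformity clears the calibrated
    # cutoff; nonconformity(text) -> float is a hypothetical stand-in
    # (e.g. a negative model score).
    return [c for c in candidates if nonconformity(c) <= threshold]
```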
arXiv Detail & Related papers (2023-06-16T21:55:08Z)
- Structured Voronoi Sampling [61.629198273926676]
In this paper, we take an important step toward building a principled approach for sampling from language models with gradient-based methods.
We name our gradient-based technique Structured Voronoi Sampling (SVS).
In a controlled generation task, SVS is able to generate fluent and diverse samples while following the control targets significantly better than other methods.
arXiv Detail & Related papers (2023-06-05T17:32:35Z)
- MacLaSa: Multi-Aspect Controllable Text Generation via Efficient Sampling from Compact Latent Space [110.85888003111653]
Multi-aspect controllable text generation aims to generate fluent sentences that possess multiple desired attributes simultaneously.
We introduce a novel approach for multi-aspect control, namely MacLaSa, that estimates compact latent space for multiple aspects.
We show that MacLaSa outperforms several strong baselines on attribute relevance and textual quality while maintaining a high inference speed.
arXiv Detail & Related papers (2023-05-22T07:30:35Z)
- Ensemble Slice Sampling: Parallel, black-box and gradient-free inference for correlated & multimodal distributions [0.0]
Slice Sampling has emerged as a powerful Markov Chain Monte Carlo algorithm that adapts to the characteristics of the target distribution with minimal hand-tuning.
This paper introduces Ensemble Slice Sampling (ESS), a new class of algorithms that sidesteps the method's sensitivity to the initial length scale by tuning it adaptively.
These affine-invariant algorithms are trivial to construct, require no hand-tuning, and can easily be implemented in parallel computing environments.
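ESS builds on standard slice sampling; a minimal sketch of one univariate slice-sampling update with a fixed step size `w` (the quantity ESS tunes adaptively) is shown below, where `log_prob` is a hypothetical stand-in for the target log density:

```python
import math
import random

def slice_sample_step(x, log_prob, w=1.0, max_steps=100):
    # One univariate slice-sampling update (Neal, 2003): draw a height under
    # the density at x, step out an interval of width w until it brackets the
    # slice, then shrink until a point inside the slice is found.
    log_y = log_prob(x) + math.log(random.random())   # slice height
    left = x - w * random.random()
    right = left + w
    for _ in range(max_steps):                         # step out to the left
        if log_prob(left) < log_y:
            break
        left -= w
    for _ in range(max_steps):                         # step out to the right
        if log_prob(right) < log_y:
            break
        right += w
    while True:                                        # shrink until accepted
        x_new = random.uniform(left, right)
        if log_prob(x_new) >= log_y:
            return x_new
        if x_new < x:
            left = x_new
        else:
            right = x_new
```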
arXiv Detail & Related papers (2020-02-14T19:00:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.