On the Efficacy of Sampling Adapters
- URL: http://arxiv.org/abs/2307.03749v2
- Date: Fri, 5 Jan 2024 15:55:23 GMT
- Title: On the Efficacy of Sampling Adapters
- Authors: Clara Meister, Tiago Pimentel, Luca Malagutti, Ethan G. Wilcox, Ryan
Cotterell
- Abstract summary: We propose a unified framework for understanding sampling adapters.
We argue that the shift they enforce can be viewed as a trade-off between precision and recall.
We find that several precision-emphasizing measures indeed indicate that sampling adapters can lead to probability distributions more aligned with the true distribution.
- Score: 82.5941326570812
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sampling is a common strategy for generating text from probabilistic models,
yet standard ancestral sampling often results in text that is incoherent or
ungrammatical. To alleviate this issue, various modifications to a model's
sampling distribution, such as nucleus or top-k sampling, have been introduced
and are now ubiquitously used in language generation systems. We propose a
unified framework for understanding these techniques, which we term sampling
adapters. Sampling adapters often lead to qualitatively better text, which
raises the question: From a formal perspective, how are they changing the
(sub)word-level distributions of language generation models? And why do these
local changes lead to higher-quality text? We argue that the shift they enforce
can be viewed as a trade-off between precision and recall: while the model
loses its ability to produce certain strings, its precision rate on desirable
text increases. While this trade-off is not reflected in standard metrics of
distribution quality (such as perplexity), we find that several
precision-emphasizing measures indeed indicate that sampling adapters can lead
to probability distributions more aligned with the true distribution. Further,
these measures correlate with higher sequence-level quality scores,
specifically, Mauve.
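As an illustration of what the abstract calls sampling adapters, the two techniques it names, top-k and nucleus (top-p) sampling, can be written as simple transformations of a next-token distribution. This is a minimal sketch, not code from the paper; the function names and the toy distribution are illustrative:

```python
import numpy as np

def top_k_adapter(probs, k):
    """Top-k: keep the k most probable tokens, zero the rest, renormalize."""
    probs = np.asarray(probs, dtype=float)
    keep = np.argsort(probs)[-k:]          # indices of the k largest probabilities
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()

def nucleus_adapter(probs, p):
    """Nucleus (top-p): keep the smallest set of tokens, taken in order of
    decreasing probability, whose cumulative mass reaches p; renormalize."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]        # tokens sorted by decreasing probability
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1   # shortest prefix with mass >= p
    out = np.zeros_like(probs)
    out[order[:cutoff]] = probs[order[:cutoff]]
    return out / out.sum()

# A toy next-token distribution over five tokens.
dist = [0.5, 0.3, 0.1, 0.05, 0.05]
print(top_k_adapter(dist, 2))      # only the two most probable tokens survive
print(nucleus_adapter(dist, 0.9))  # the tokens covering 90% of the mass survive
```

Both adapters truncate the low-probability tail and renormalize, which is exactly the precision-recall trade-off the abstract describes: every truncated token receives probability zero (recall lost), while the surviving, more plausible tokens receive proportionally more mass (precision gained).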
Related papers
- A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors [50.046717886067555]
Given a general language model and its aligned version, there exists a trade-off between the average reward and average log-likelihood of the strings under the general language model.
We provide a formal treatment of this issue and demonstrate how a choice of sampling adaptor allows for a selection of how much likelihood we exchange for the reward.
arXiv Detail & Related papers (2024-06-14T17:38:21Z)
- Principled Gradient-based Markov Chain Monte Carlo for Text Generation [77.46654898866291]
We propose several faithful gradient-based sampling algorithms to sample from the target energy-based text distribution correctly.
We demonstrate that faithful samplers are able to generate more fluent text while adhering to the control objectives better.
arXiv Detail & Related papers (2023-12-29T18:00:56Z)
- A Block Metropolis-Hastings Sampler for Controllable Energy-based Text Generation [78.81021361497311]
We develop a novel Metropolis-Hastings (MH) sampler that proposes re-writes of the entire sequence in each step via iterative prompting of a large language model.
Our new sampler (a) allows for more efficient and accurate sampling from a target distribution and (b) allows generation length to be determined through the sampling procedure rather than being fixed in advance.
arXiv Detail & Related papers (2023-12-07T18:30:15Z)
- An Invariant Learning Characterization of Controlled Text Generation [25.033675230270212]
Controlled generation refers to the problem of creating text that contains stylistic or semantic attributes of interest.
We show that the performance of controlled generation may be poor if the distributions of text in response to user prompts differ from the distribution the predictor was trained on.
arXiv Detail & Related papers (2023-05-31T21:35:08Z)
- Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as its optimization method.
We develop practical bounds that make total variation distance (TVD) applicable to language generation.
We introduce the TaiLr objective, which balances the trade-off involved in estimating TVD.
arXiv Detail & Related papers (2023-02-26T16:32:52Z)
- Typical Decoding for Natural Language Generation [76.69397802617064]
We study why high-probability texts can be dull or repetitive.
We show that typical sampling offers competitive performance in terms of quality.
arXiv Detail & Related papers (2022-02-01T18:58:45Z)
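The typical sampling scheme named in the last entry can likewise be framed as a sampling adapter: as commonly described, it keeps the tokens whose surprisal is closest to the entropy of the next-token distribution, up to a target probability mass. A minimal sketch under a toy distribution (the function name, parameters, and example values are illustrative, not from the paper):

```python
import numpy as np

def typical_adapter(probs, tau=0.95):
    """Typical sampling as an adapter: rank tokens by how far their surprisal
    -log p is from the entropy of the distribution, then keep the most
    'typical' tokens until their cumulative mass reaches tau.
    Assumes a strictly positive probability vector."""
    probs = np.asarray(probs, dtype=float)
    surprisal = -np.log(probs)
    entropy = np.sum(probs * surprisal)
    order = np.argsort(np.abs(surprisal - entropy))  # most typical tokens first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, tau) + 1    # shortest prefix with mass >= tau
    out = np.zeros_like(probs)
    out[order[:cutoff]] = probs[order[:cutoff]]
    return out / out.sum()

print(typical_adapter([0.4, 0.3, 0.2, 0.1], tau=0.6))  # drops the most atypical token
```

Unlike top-k or nucleus sampling, this adapter can discard the single most probable token when its surprisal is far below the entropy, which is how it targets the dullness of high-probability text mentioned above.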
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all listed details) and is not responsible for any consequences of its use.