On the Efficacy of Sampling Adapters
- URL: http://arxiv.org/abs/2307.03749v2
- Date: Fri, 5 Jan 2024 15:55:23 GMT
- Title: On the Efficacy of Sampling Adapters
- Authors: Clara Meister, Tiago Pimentel, Luca Malagutti, Ethan G. Wilcox, Ryan Cotterell
- Abstract summary: We propose a unified framework for understanding sampling adapters.
We argue that the shift they enforce can be viewed as a trade-off between precision and recall.
We find that several precision-emphasizing measures indeed indicate that sampling adapters can lead to probability distributions more aligned with the true distribution.
- Score: 82.5941326570812
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sampling is a common strategy for generating text from probabilistic models,
yet standard ancestral sampling often results in text that is incoherent or
ungrammatical. To alleviate this issue, various modifications to a model's
sampling distribution, such as nucleus or top-k sampling, have been introduced
and are now ubiquitously used in language generation systems. We propose a
unified framework for understanding these techniques, which we term sampling
adapters. Sampling adapters often lead to qualitatively better text, which
raises the question: From a formal perspective, how are they changing the
(sub)word-level distributions of language generation models? And why do these
local changes lead to higher-quality text? We argue that the shift they enforce
can be viewed as a trade-off between precision and recall: while the model
loses its ability to produce certain strings, its precision rate on desirable
text increases. While this trade-off is not reflected in standard metrics of
distribution quality (such as perplexity), we find that several
precision-emphasizing measures indeed indicate that sampling adapters can lead
to probability distributions more aligned with the true distribution. Further,
these measures correlate with higher sequence-level quality scores,
specifically, Mauve.
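As an illustration of what a sampling adapter does at the (sub)word level, the sketch below shows two common adapters named in the abstract, top-k and nucleus (top-p) truncation, written as functions that map a model's next-token distribution to a truncated, renormalized one. This is a minimal illustrative assumption, not the authors' code; the function names, NumPy implementation, and toy numbers are hypothetical.

```python
import numpy as np

def top_k_adapter(probs: np.ndarray, k: int) -> np.ndarray:
    """Keep only the k most probable tokens, zero out the rest, and renormalize."""
    adapted = np.zeros_like(probs)
    top_idx = np.argsort(probs)[-k:]              # indices of the k largest probabilities
    adapted[top_idx] = probs[top_idx]
    return adapted / adapted.sum()

def nucleus_adapter(probs: np.ndarray, p: float) -> np.ndarray:
    """Keep the smallest set of tokens whose cumulative probability is >= p, and renormalize."""
    order = np.argsort(probs)[::-1]               # tokens sorted by descending probability
    cumulative = np.cumsum(probs[order])
    nucleus_size = int(np.searchsorted(cumulative, p)) + 1
    adapted = np.zeros_like(probs)
    adapted[order[:nucleus_size]] = probs[order[:nucleus_size]]
    return adapted / adapted.sum()

# Toy next-token distribution over a 5-token vocabulary (hypothetical numbers).
probs = np.array([0.45, 0.25, 0.15, 0.10, 0.05])
print(top_k_adapter(probs, k=2))      # mass reallocated to the 2 most likely tokens
print(nucleus_adapter(probs, p=0.8))  # smallest token set covering >= 80% of the mass
```

Both adapters zero out low-probability tokens, so some strings become impossible to generate (recall drops), while the surviving probability mass is reallocated to high-probability tokens (precision on desirable text rises), which is the trade-off the abstract describes.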
Related papers
- Non-Exchangeable Conformal Language Generation with Nearest Neighbors [12.790082627386482]
Non-exchangeable conformal nucleus sampling is a novel extension of the conformal prediction framework to generation based on nearest neighbors.
Our method can be used post-hoc for an arbitrary model without extra training and supplies token-level, calibrated prediction sets equipped with statistical guarantees.
arXiv Detail & Related papers (2024-02-01T16:04:04Z)
- Principled Gradient-based Markov Chain Monte Carlo for Text Generation [77.46654898866291]
We propose several faithful gradient-based sampling algorithms to sample from the target energy-based text distribution correctly.
We demonstrate that faithful samplers are able to generate more fluent text while adhering to the control objectives better.
arXiv Detail & Related papers (2023-12-29T18:00:56Z)
- A Block Metropolis-Hastings Sampler for Controllable Energy-based Text Generation [78.81021361497311]
We develop a novel Metropolis-Hastings (MH) sampler that proposes re-writes of the entire sequence in each step via iterative prompting of a large language model.
Our new sampler (a) allows for more efficient and accurate sampling from a target distribution and (b) allows generation length to be determined through the sampling procedure rather than being fixed in advance.
arXiv Detail & Related papers (2023-12-07T18:30:15Z)
- An Invariant Learning Characterization of Controlled Text Generation [25.033675230270212]
Controlled generation refers to the problem of creating text that contains stylistic or semantic attributes of interest.
We show that the performance of controlled generation may be poor if the distributions of text in response to user prompts differ from the distribution the predictor was trained on.
arXiv Detail & Related papers (2023-05-31T21:35:08Z)
- Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the optimizing method.
We develop practical bounds to apply total variation distance (TVD) to language generation.
We introduce the TaiLr objective, which balances the trade-off in estimating TVD.
arXiv Detail & Related papers (2023-02-26T16:32:52Z)
- Typical Decoding for Natural Language Generation [76.69397802617064]
We study why high-probability texts can be dull or repetitive.
We show that typical sampling offers competitive performance in terms of quality.
arXiv Detail & Related papers (2022-02-01T18:58:45Z)
- On Sampling-Based Training Criteria for Neural Language Modeling [97.35284042981675]
We consider Monte Carlo sampling, importance sampling, a novel method we call compensated partial summation, and noise contrastive estimation.
We show that all these sampling methods can perform equally well, as long as we correct for the intended class posterior probabilities.
Experimental results in language modeling and automatic speech recognition on Switchboard and LibriSpeech support our claim.
arXiv Detail & Related papers (2021-04-21T12:55:52Z)