On the Efficacy of Sampling Adapters
- URL: http://arxiv.org/abs/2307.03749v2
- Date: Fri, 5 Jan 2024 15:55:23 GMT
- Title: On the Efficacy of Sampling Adapters
- Authors: Clara Meister, Tiago Pimentel, Luca Malagutti, Ethan G. Wilcox, Ryan
Cotterell
- Abstract summary: We propose a unified framework for understanding sampling adapters.
We argue that the shift they enforce can be viewed as a trade-off between precision and recall.
We find that several precision-emphasizing measures indeed indicate that sampling adapters can lead to probability distributions more aligned with the true distribution.
- Score: 82.5941326570812
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sampling is a common strategy for generating text from probabilistic models,
yet standard ancestral sampling often results in text that is incoherent or
ungrammatical. To alleviate this issue, various modifications to a model's
sampling distribution, such as nucleus or top-k sampling, have been introduced
and are now ubiquitously used in language generation systems. We propose a
unified framework for understanding these techniques, which we term sampling
adapters. Sampling adapters often lead to qualitatively better text, which
raises the question: From a formal perspective, how are they changing the
(sub)word-level distributions of language generation models? And why do these
local changes lead to higher-quality text? We argue that the shift they enforce
can be viewed as a trade-off between precision and recall: while the model
loses its ability to produce certain strings, its precision rate on desirable
text increases. While this trade-off is not reflected in standard metrics of
distribution quality (such as perplexity), we find that several
precision-emphasizing measures indeed indicate that sampling adapters can lead
to probability distributions more aligned with the true distribution. Further,
these measures correlate with higher sequence-level quality scores,
specifically, Mauve.
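As an illustration of what the abstract calls sampling adapters, the two techniques it names, top-k and nucleus (top-p) sampling, can be written as simple transformations of a next-token distribution. This is a minimal sketch, not code from the paper; the function names and the toy distribution are illustrative:

```python
import numpy as np

def top_k_adapter(probs, k):
    """Top-k: keep the k most probable tokens, zero the rest, renormalize."""
    probs = np.asarray(probs, dtype=float)
    keep = np.argsort(probs)[-k:]          # indices of the k largest probabilities
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()

def nucleus_adapter(probs, p):
    """Nucleus (top-p): keep the smallest set of tokens, taken in order of
    decreasing probability, whose cumulative mass reaches p; renormalize."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]        # tokens sorted by decreasing probability
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1   # shortest prefix with mass >= p
    out = np.zeros_like(probs)
    out[order[:cutoff]] = probs[order[:cutoff]]
    return out / out.sum()

# A toy next-token distribution over five tokens.
dist = [0.5, 0.3, 0.1, 0.05, 0.05]
print(top_k_adapter(dist, 2))      # only the two most probable tokens survive
print(nucleus_adapter(dist, 0.9))  # the tokens covering 90% of the mass survive
```

Both adapters truncate the low-probability tail and renormalize, which is exactly the precision-recall trade-off the abstract describes: every truncated token receives probability zero (recall lost), while the surviving, more plausible tokens receive proportionally more mass (precision gained).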
Related papers
- A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors [50.046717886067555]
Given a general language model and its aligned version, there exists a trade-off between the average reward and average log-likelihood of the strings under the general language model.
We provide a formal treatment of this issue and demonstrate how a choice of sampling adaptor allows for a selection of how much likelihood we exchange for the reward.
arXiv Detail & Related papers (2024-06-14T17:38:21Z)
- Principled Gradient-based Markov Chain Monte Carlo for Text Generation [77.46654898866291]
We propose several faithful gradient-based sampling algorithms to sample from the target energy-based text distribution correctly.
We demonstrate that faithful samplers are able to generate more fluent text while adhering to the control objectives better.
arXiv Detail & Related papers (2023-12-29T18:00:56Z)
- A Block Metropolis-Hastings Sampler for Controllable Energy-based Text Generation [78.81021361497311]
We develop a novel Metropolis-Hastings (MH) sampler that proposes re-writes of the entire sequence in each step via iterative prompting of a large language model.
Our new sampler (a) allows for more efficient and accurate sampling from a target distribution and (b) allows generation length to be determined through the sampling procedure rather than being fixed in advance.
arXiv Detail & Related papers (2023-12-07T18:30:15Z)
- An Invariant Learning Characterization of Controlled Text Generation [25.033675230270212]
Controlled generation refers to the problem of creating text that contains stylistic or semantic attributes of interest.
We show that the performance of controlled generation may be poor if the distributions of text in response to user prompts differ from the distribution the predictor was trained on.
arXiv Detail & Related papers (2023-05-31T21:35:08Z)
- Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as its optimization method.
We develop practical bounds that make total variation distance (TVD) applicable to language generation.
We introduce the TaiLr objective, which balances the trade-off involved in estimating TVD.
arXiv Detail & Related papers (2023-02-26T16:32:52Z)
- Typical Decoding for Natural Language Generation [76.69397802617064]
We study why high-probability texts can be dull or repetitive.
We show that typical sampling offers competitive performance in terms of quality.
arXiv Detail & Related papers (2022-02-01T18:58:45Z)
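The typical sampling scheme named in the last entry can likewise be framed as a sampling adapter: as commonly described, it keeps the tokens whose surprisal is closest to the entropy of the next-token distribution, up to a target probability mass. A minimal sketch under a toy distribution (the function name, parameters, and example values are illustrative, not from the paper):

```python
import numpy as np

def typical_adapter(probs, tau=0.95):
    """Typical sampling as an adapter: rank tokens by how far their surprisal
    -log p is from the entropy of the distribution, then keep the most
    'typical' tokens until their cumulative mass reaches tau.
    Assumes a strictly positive probability vector."""
    probs = np.asarray(probs, dtype=float)
    surprisal = -np.log(probs)
    entropy = np.sum(probs * surprisal)
    order = np.argsort(np.abs(surprisal - entropy))  # most typical tokens first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, tau) + 1    # shortest prefix with mass >= tau
    out = np.zeros_like(probs)
    out[order[:cutoff]] = probs[order[:cutoff]]
    return out / out.sum()

print(typical_adapter([0.4, 0.3, 0.2, 0.1], tau=0.6))  # drops the most atypical token
```

Unlike top-k or nucleus sampling, this adapter can discard the single most probable token when its surprisal is far below the entropy, which is how it targets the dullness of high-probability text mentioned above.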
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all listed details) and is not responsible for any consequences of its use.