DiffSampling: Enhancing Diversity and Accuracy in Neural Text Generation
- URL: http://arxiv.org/abs/2502.14037v3
- Date: Mon, 04 Aug 2025 06:13:07 GMT
- Title: DiffSampling: Enhancing Diversity and Accuracy in Neural Text Generation
- Authors: Giorgio Franceschelli, Mirco Musolesi
- Abstract summary: We propose DiffSampling, a new decoding method that leverages a mathematical analysis of the token probability distribution. Experiments involving four different text-generation tasks demonstrate that our approach consistently performs at least on par with the existing methods.
- Score: 2.4555276449137042
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite their growing capabilities, language models still frequently reproduce content from their training data, generate repetitive text, and favor common grammatical patterns and vocabulary. A possible cause is the decoding strategy: the most common strategies either consider only the most probable tokens, which reduces output diversity, or increase the likelihood of unlikely tokens, compromising output accuracy and correctness. In this paper, we propose DiffSampling, a new decoding method that leverages a mathematical analysis of the token probability distribution to ensure the generation of contextually appropriate text. In particular, the difference between consecutive, sorted probabilities can be used to truncate incorrect tokens. We also propose two variations of this method that aim to correct the subtle inconsistencies of common sampling strategies. Experiments involving four different text-generation tasks demonstrate that our approach consistently performs at least on par with the existing methods it builds upon in terms of quality, while potentially improving output diversity.
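The truncation idea admits a short sketch. The following is a minimal, hedged reconstruction from the abstract alone; the function name `diff_truncate` and the exact cut-off rule (truncate at the steepest drop between consecutive sorted probabilities) are our reading, not the authors' reference implementation.

```python
import torch

def diff_truncate(logits: torch.Tensor) -> torch.Tensor:
    """Truncate next-token logits at the steepest drop between consecutive
    sorted probabilities (a sketch of the DiffSampling cut, not the
    authors' reference implementation)."""
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    # Discrete derivative of the sorted distribution; its most negative
    # entry marks the sharpest drop between neighbouring probabilities.
    diffs = sorted_probs[1:] - sorted_probs[:-1]
    cutoff = int(torch.argmin(diffs))       # keep ranks 0 .. cutoff
    kept = sorted_idx[: cutoff + 1]
    truncated = torch.full_like(logits, float("-inf"))
    truncated[kept] = logits[kept]
    return truncated                        # softmax renormalises the rest

# Usage: sample the next token from the truncated distribution.
logits = torch.randn(50_000)                # stand-in for model logits
next_id = torch.multinomial(torch.softmax(diff_truncate(logits), dim=-1), 1)
```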
Related papers
- Semantic uncertainty in advanced decoding methods for LLM generation [35.31962554915952]
This study investigates semantic uncertainty in large language model (LLM) outputs across different decoding methods.
We analyze how different decoding strategies affect both the diversity and reliability of model outputs.
arXiv Detail & Related papers (2025-06-17T10:09:29Z)
- Adaptive Sampled Softmax with Inverted Multi-Index: Methods, Theory and Applications [79.53938312089308]
The MIDX-Sampler is a novel adaptive sampling strategy based on an inverted multi-index approach.
Our method is backed by rigorous theoretical analysis, addressing key concerns such as sampling bias, gradient bias, convergence rates, and generalization error bounds.
arXiv Detail & Related papers (2025-01-15T04:09:21Z)
- Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition [5.575078692353885]
We propose a new model for multi-token prediction in transformers, aiming to enhance sampling efficiency without compromising accuracy.
By generalizing the model to a rank-$r$ canonical probability decomposition, we develop an improved model that predicts multiple tokens simultaneously.
arXiv Detail & Related papers (2024-10-23T11:06:36Z)
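As a concrete illustration of the rank-$r$ canonical decomposition mentioned above, the joint distribution over the next two tokens can be written as a mixture of $r$ rank-one product distributions. This is a hedged sketch; the head names and shapes are our assumptions, not the paper's architecture.

```python
import torch

def two_token_joint(mix_logits: torch.Tensor,
                    head1_logits: torch.Tensor,
                    head2_logits: torch.Tensor) -> torch.Tensor:
    """P(t1, t2) = sum_r w_r * p_r(t1) * q_r(t2): a rank-r canonical (CP)
    factorisation of a joint two-token distribution."""
    w = torch.softmax(mix_logits, dim=-1)        # (r,)   mixture weights
    p = torch.softmax(head1_logits, dim=-1)      # (r, V) first-token heads
    q = torch.softmax(head2_logits, dim=-1)      # (r, V) second-token heads
    return torch.einsum("r,ri,rj->ij", w, p, q)  # (V, V), sums to 1

# Usage with an illustrative rank and vocabulary size.
r, V = 4, 1000
joint = two_token_joint(torch.randn(r), torch.randn(r, V), torch.randn(r, V))
assert torch.isclose(joint.sum(), torch.tensor(1.0))
```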
- Adaptive Contrastive Search: Uncertainty-Guided Decoding for Open-Ended Text Generation [0.20971479389679337]
We introduce adaptive contrastive search, a novel decoding strategy extending contrastive search.
Our findings indicate performance enhancement in both aspects, across different model architectures and datasets.
arXiv Detail & Related papers (2024-07-26T12:23:54Z)
- How to Compute the Probability of a Word [45.23856093235994]
This paper derives the correct methods for computing word probabilities.
We show that correcting the widespread bug in probability computations affects measured outcomes in sentence comprehension and lexical optimisation analyses.
arXiv Detail & Related papers (2024-06-20T17:59:42Z)
- Provably Secure Disambiguating Neural Linguistic Steganography [66.30965740387047]
The segmentation ambiguity problem, which arises when using language models based on subwords, leads to occasional decoding failures.
We propose a novel secure disambiguation method named SyncPool, which effectively addresses the segmentation ambiguity problem.
SyncPool does not change the size of the candidate pool or the distribution of tokens and thus is applicable to provably secure language steganography methods.
arXiv Detail & Related papers (2024-03-26T09:25:57Z)
- Closing the Curious Case of Neural Text Degeneration [91.22954750742183]
We provide a theoretical explanation for the effectiveness of truncation sampling.
We show that we can leverage a known source of model errors, the softmax bottleneck, to prove that certain tokens have nonzero true probability.
Our evaluations show that our method outperforms its threshold-based counterparts for low-entropy text generation.
arXiv Detail & Related papers (2023-10-02T23:16:25Z)
- Language Model Decoding as Direct Metrics Optimization [87.68281625776282]
Current decoding methods struggle to generate texts that align with human texts across different aspects.
In this work, we frame decoding from a language model as an optimization problem with the goal of strictly matching the expected performance with human texts.
We prove that this induced distribution is guaranteed to improve the perplexity on human texts, which suggests a better approximation to the underlying distribution of human texts.
arXiv Detail & Related papers (2023-10-02T09:35:27Z)
- On the Efficacy of Sampling Adapters [82.5941326570812]
We propose a unified framework for understanding sampling adapters.
We argue that the shift they enforce can be viewed as a trade-off between precision and recall.
We find that several precision-emphasizing measures indeed indicate that sampling adapters can lead to probability distributions more aligned with the true distribution.
arXiv Detail & Related papers (2023-07-07T17:59:12Z)
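To make the adapter framing above concrete: common truncation schemes are simply functions from one next-token distribution to another. A hedged sketch (the function names are ours) showing top-k and nucleus (top-p) truncation as such adapters:

```python
import torch

def top_k_adapter(probs: torch.Tensor, k: int) -> torch.Tensor:
    """Keep only the k most probable tokens, then renormalise."""
    topk = torch.topk(probs, k)
    out = torch.zeros_like(probs)
    out[topk.indices] = topk.values
    return out / out.sum()

def top_p_adapter(probs: torch.Tensor, p: float) -> torch.Tensor:
    """Keep the smallest set of top tokens whose total mass reaches p."""
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    keep = int(torch.searchsorted(cumulative, p)) + 1
    out = torch.zeros_like(probs)
    out[sorted_idx[:keep]] = sorted_probs[:keep]
    return out / out.sum()
```

Both adapters shift mass toward high-probability tokens, trading recall (coverage of the support) for precision, which matches the trade-off described in the entry above.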
- Look-back Decoding for Open-Ended Text Generation [62.53302138266465]
We propose Look-back, an improved decoding algorithm that tracks the distribution distance between current and historical decoding steps.
Look-back can automatically predict potential repetitive phrases and topic drift, and remove tokens that may cause these failure modes.
We perform decoding experiments on document continuation and story generation, and demonstrate that Look-back is able to generate more fluent and coherent text.
arXiv Detail & Related papers (2023-05-22T20:42:37Z)
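A hedged sketch of the history-tracking idea behind Look-back above; KL divergence is the natural reading of "distribution distance", but the threshold and window below are illustrative assumptions, not the paper's calibrated values.

```python
import torch

def kl_div(p: torch.Tensor, q: torch.Tensor, eps: float = 1e-9) -> torch.Tensor:
    """KL(p || q) between two next-token distributions."""
    return torch.sum(p * (torch.log(p + eps) - torch.log(q + eps)))

class LookbackTracker:
    """Flags a decoding step whose distribution is suspiciously close to a
    recent one, which signals a likely repetition loop or topic collapse."""

    def __init__(self, window: int = 64, threshold: float = 0.05):
        self.history: list[torch.Tensor] = []
        self.window = window
        self.threshold = threshold

    def step(self, probs: torch.Tensor) -> bool:
        looping = any(kl_div(probs, past) < self.threshold
                      for past in self.history)
        self.history = (self.history + [probs])[-self.window:]
        return looping   # caller can then penalise or resample tokens
```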
- On Decoding Strategies for Neural Text Generators [73.48162198041884]
We study the interaction between language generation tasks and decoding strategies.
We measure changes in attributes of generated text as a function of both decoding strategy and task.
Our results reveal both previously-observed and surprising findings.
arXiv Detail & Related papers (2022-03-29T16:25:30Z)
- A Contrastive Framework for Neural Text Generation [46.845997620234265]
We show that an underlying reason for model degeneration is the anisotropic distribution of token representations.
We present a contrastive solution: (i) SimCTG, a contrastive training objective to calibrate the model's representation space, and (ii) a decoding method -- contrastive search -- to encourage diversity while maintaining coherence in the generated text.
arXiv Detail & Related papers (2022-02-13T21:46:14Z)
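Contrastive search, which the entry above introduces, scores each top-k candidate by model confidence minus a degeneration penalty (maximum cosine similarity to previously generated token representations). A minimal sketch; the tensor shapes and the default alpha are our assumptions:

```python
import torch
import torch.nn.functional as F

def contrastive_search_step(probs: torch.Tensor,       # (V,) next-token probs
                            cand_ids: torch.Tensor,    # (k,) top-k token ids
                            cand_hidden: torch.Tensor, # (k, d) candidate reps
                            prev_hidden: torch.Tensor, # (t, d) past token reps
                            alpha: float = 0.6) -> int:
    """Pick the candidate balancing confidence against similarity to the
    already-generated context (the degeneration penalty)."""
    sims = F.cosine_similarity(cand_hidden.unsqueeze(1),
                               prev_hidden.unsqueeze(0), dim=-1)  # (k, t)
    penalty = sims.max(dim=1).values                              # (k,)
    scores = (1 - alpha) * probs[cand_ids] - alpha * penalty
    return int(cand_ids[int(scores.argmax())])
```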
- Neural Text Generation with Part-of-Speech Guided Softmax [82.63394952538292]
We propose using linguistic annotation, i.e., part-of-speech (POS), to guide text generation.
We show that our proposed methods can generate more diverse text while maintaining comparable quality.
arXiv Detail & Related papers (2021-05-08T08:53:16Z)
- Improving Diversity of Neural Text Generation via Inverse Probability Weighting [43.36560720793425]
We propose a sampling method inspired by inverse probability weighting.
We show that the head of the distribution might contain tedious or even repetitive high-probability candidates that lead to repetition loops.
Results show that our algorithm can effectively increase the diversity of generated samples while achieving close resemblance to human text.
arXiv Detail & Related papers (2021-03-13T08:17:40Z)
- Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data.
We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
arXiv Detail & Related papers (2020-03-10T09:15:42Z)
- Consistency of a Recurrent Language Model With Respect to Incomplete Decoding [67.54760086239514]
We study the issue of a recurrent language model generating infinite-length sequences under incomplete decoding.
We propose two remedies which address inconsistency: consistent variants of top-k and nucleus sampling, and a self-terminating recurrent language model.
arXiv Detail & Related papers (2020-02-06T19:56:15Z)
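The "consistent" variants mentioned above address a subtle failure: if truncation always drops the end-of-sequence token, the truncated model can never terminate. A minimal sketch of the fix for top-k, assuming an `eos_id` supplied by the caller (our naming):

```python
import torch

def consistent_top_k(probs: torch.Tensor, k: int, eos_id: int) -> torch.Tensor:
    """Top-k truncation that always retains the end-of-sequence token,
    keeping the probability of terminating strictly positive."""
    topk = torch.topk(probs, k)
    out = torch.zeros_like(probs)
    out[topk.indices] = topk.values
    out[eos_id] = probs[eos_id]   # re-insert EOS even if outside the top k
    return out / out.sum()
```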
- AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of which utterances or tokens are dull, without any feature engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
arXiv Detail & Related papers (2020-01-15T18:32:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.