Related papers: Semantic uncertainty in advanced decoding methods for LLM generation

URL: http://arxiv.org/abs/2506.17296v1
Date: Tue, 17 Jun 2025 10:09:29 GMT
Title: Semantic uncertainty in advanced decoding methods for LLM generation
Authors: Darius Foodeei, Simin Fan, Martin Jaggi
Abstract summary: This study investigates semantic uncertainty in large language model (LLM) outputs across different decoding methods. We analyze how different decoding strategies affect both the diversity and reliability of model outputs.
Score: 35.31962554915952
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This study investigates semantic uncertainty in large language model (LLM) outputs across different decoding methods, focusing on emerging techniques like speculative sampling and chain-of-thought (CoT) decoding. Through experiments on question answering, summarization, and code generation tasks, we analyze how different decoding strategies affect both the diversity and reliability of model outputs. Our findings reveal that while CoT decoding demonstrates higher semantic diversity, it maintains lower predictive entropy, suggesting that structured exploration can lead to more confident and accurate outputs. This is evidenced by a 48.8% improvement in code generation Pass@2 rates, despite lower alignment with reference solutions. For summarization tasks, speculative sampling proved particularly effective, achieving superior ROUGE scores while maintaining moderate semantic diversity. Our results challenge conventional assumptions about trade-offs between diversity and accuracy in language model outputs, demonstrating that properly structured decoding methods can increase semantic exploration while maintaining or improving output quality. These findings have significant implications for deploying language models in practical applications where both reliability and diverse solution generation are crucial.

Related papers

Evaluating the Diversity and Quality of LLM Generated Content [72.84945252821908]
We introduce a framework for measuring effective semantic diversity (diversity among outputs that meet quality thresholds). Although preference-tuned models exhibit reduced lexical and syntactic diversity, they produce greater effective semantic diversity than SFT or base models. These findings have important implications for applications that require diverse yet high-quality outputs.
arXiv Detail & Related papers (2025-04-16T23:02:23Z)
Diversified Sampling Improves Scaling LLM inference [31.18762591875725]
DivSampling is a novel and versatile sampling technique designed to enhance the diversity of candidate solutions. Our theoretical analysis demonstrates that, under mild assumptions, the error rates of responses generated from diverse prompts are significantly lower.
arXiv Detail & Related papers (2025-02-16T07:37:58Z)
Diversity Explains Inference Scaling Laws: Through a Case Study of Minimum Bayes Risk Decoding [32.02732402635305]
Inference methods play an important role in eliciting the performance of large language models (LLMs). Currently, LLMs use inference methods that rely on multiple generated samples, which can be derived from Minimum Bayes Risk (MBR) decoding. Previous studies have conducted empirical analyses to clarify the improvements in generation performance achieved by MBR decoding. We offer a new theoretical interpretation of MBR decoding from the perspective of bias-diversity decomposition.
arXiv Detail & Related papers (2024-10-19T07:32:10Z)
Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation [60.493180081319785]
We propose a systematic way to estimate the capacity of a truncation sampling method by considering the trade-off between diversity and risk at each decoding step. Our work offers a comprehensive comparison of existing truncation sampling methods and serves as a practical user guideline for their parameter selection.
arXiv Detail & Related papers (2024-08-24T14:14:32Z)
Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation [96.78845113346809]
Retrieval-augmented language models (RALMs) have shown strong performance and wide applicability in knowledge-intensive tasks. This paper proposes SynCheck, a lightweight monitor that leverages fine-grained decoding dynamics to detect unfaithful sentences. We also introduce FOD, a faithfulness-oriented decoding algorithm guided by beam search for long-form retrieval-augmented generation.
arXiv Detail & Related papers (2024-06-19T16:42:57Z)
Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning [50.84938730450622]
We propose a trajectory-based method, TV score, which uses trajectory volatility for OOD detection in mathematical reasoning. Our method outperforms all traditional algorithms on GLMs under mathematical reasoning scenarios. Our method can be extended to more applications with high-density features in output spaces, such as multiple-choice questions.
arXiv Detail & Related papers (2024-05-22T22:22:25Z)
A Thorough Examination of Decoding Methods in the Era of LLMs [72.65956436513241]
Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers. This paper provides a comprehensive and multifaceted analysis of various decoding methods within the context of large language models. Our findings reveal that decoding method performance is notably task-dependent and influenced by factors such as alignment, model size, and quantization.
arXiv Detail & Related papers (2024-02-10T11:14:53Z)
Uncertainty Awareness of Large Language Models Under Code Distribution Shifts: A Benchmark Study [14.507068647009602]
Large Language Models (LLMs) have been widely employed in programming language analysis to enhance human productivity. Their reliability can be compromised by various code distribution shifts, leading to inconsistent outputs. Probability methods are known to mitigate such impact through uncertainty calibration and estimation.
arXiv Detail & Related papers (2024-01-12T00:00:32Z)
Contrastive Decoding Improves Reasoning in Large Language Models [55.16503283583076]
We show that Contrastive Decoding achieves large out-of-the-box improvements over greedy decoding on a variety of reasoning tasks. We show that Contrastive Decoding leads LLaMA-65B to outperform LLaMA 2, GPT-3.5 and PaLM 2-L on the HellaSwag commonsense reasoning benchmark.
arXiv Detail & Related papers (2023-09-17T00:29:32Z)
Informed Sampling for Diversity in Concept-to-Text NLG [8.883733362171034]
We propose an Imitation Learning approach to explore the level of diversity that a language generation model can reliably produce. Specifically, we augment the decoding process with a meta-classifier trained to distinguish which words at any given timestep will lead to high-quality output.
arXiv Detail & Related papers (2020-04-29T17:43:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.