Related papers: Top-H Decoding: Adapting the Creativity and Coherence with Bounded Entropy in Text Generation

URL: http://arxiv.org/abs/2509.02510v1
Date: Tue, 02 Sep 2025 17:02:29 GMT
Title: Top-H Decoding: Adapting the Creativity and Coherence with Bounded Entropy in Text Generation
Authors: Erfan Baghaei Potraghloo, Seyedarmin Azizi, Souvik Kundu, Massoud Pedram
Abstract summary: We present top-H decoding, a greedy algorithm to solve the entropy-constrained mass maximization (ECMM) problem. We show that top-H outperforms the state-of-the-art (SoTA) alternative of min-$p$ sampling by up to **25.63%** on creative writing. In summary, top-H advances SoTA in open-ended text generation and can be easily integrated into creative writing applications.
Score: 12.183451602438753
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs), despite their impressive performance across a wide range of tasks, often struggle to balance two competing objectives in open-ended text generation: fostering diversity and creativity while preserving logical coherence. Existing truncated sampling techniques, including temperature scaling, top-$p$ (nucleus) sampling, and min-$p$ sampling, aim to manage this trade-off. However, they exhibit limitations, particularly in the effective incorporation of the confidence of the model into the corresponding sampling strategy. For example, min-$p$ sampling relies on a single top token as a heuristic for confidence, eventually underutilizing the information of the probability distribution. Toward effective incorporation of the confidence of the model, in this paper, we present **top-H** decoding. We first establish the theoretical foundation of the interplay between creativity and coherence in truncated sampling by formulating an **entropy-constrained minimum divergence** problem. We then prove this minimization problem to be equivalent to an **entropy-constrained mass maximization** (ECMM) problem, which is NP-hard. Finally, we present top-H decoding, a computationally efficient greedy algorithm to solve the ECMM problem. Extensive empirical evaluations demonstrate that top-H outperforms the state-of-the-art (SoTA) alternative of min-$p$ sampling by up to **25.63%** on creative writing benchmarks, while maintaining robustness on question-answering datasets such as GPQA, GSM8K, and MT-Bench. Additionally, an *LLM-as-judge* evaluation confirms that top-H indeed produces coherent outputs even at higher temperatures, where creativity is especially critical. In summary, top-H advances SoTA in open-ended text generation and can be *easily integrated* into creative writing applications. The code is available at https://github.com/ErfanBaghaei/Top-H-Decoding.
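
To make the idea of an entropy-bounded greedy truncation concrete, the following is a minimal sketch, not the authors' implementation (see the repository linked in the abstract for that). It assumes one plausible reading of the abstract: tokens are added in order of decreasing probability for as long as the entropy of the renormalized kept set stays within a fraction `alpha` of the full distribution's entropy. The function names, the `alpha` parameter, its default value, and the exact stopping rule are all assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_bounded_truncate(probs, alpha=0.4):
    """Greedy sketch of entropy-constrained mass maximization.

    ASSUMPTION: the entropy budget is alpha * entropy(probs); the abstract
    does not state the bound, so this is illustrative only.
    Returns {token_index: renormalized_probability} for the kept tokens.
    """
    budget = alpha * entropy(probs)
    # Visit tokens from most to least probable, so every accepted token
    # adds as much probability mass as possible.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = [order[0]]  # a single token has entropy 0, so it always fits
    for idx in order[1:]:
        candidate = kept + [idx]
        mass = sum(probs[i] for i in candidate)
        # Stop at the first token that would push the entropy of the
        # renormalized kept set over the budget.
        if entropy([probs[i] / mass for i in candidate]) > budget:
            break
        kept = candidate
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


if __name__ == "__main__":
    next_token_probs = [0.5, 0.3, 0.1, 0.05, 0.05]
    print(entropy_bounded_truncate(next_token_probs, alpha=0.6))
```

Because the budget is tied to the entropy of the whole distribution, a flat (uncertain) distribution admits more tokens than a peaked one. Since the abstract states that the ECMM problem is NP-hard, a greedy pass like this should be read as a heuristic rather than an exact solver.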

Related papers

Entropy-Aligned Decoding of LMs for Better Writing and Reasoning [21.971790771470324]
Language models (LMs) are trained on billions of tokens in an attempt to recover the true language distribution. Currently, vanilla random sampling from LMs yields low-quality generations. We introduce EPIC, a hyperparameter-free decoding approach that incorporates the entropy of future trajectories into LM decoding.
arXiv Detail & Related papers (2026-01-05T01:37:10Z)
The LLM Already Knows: Estimating LLM-Perceived Question Difficulty via Hidden Representations [33.65540900920885]
Estimating the difficulty of input questions as perceived by large language models (LLMs) is essential for accurate performance evaluation and adaptive inference. We propose a novel approach for difficulty estimation that leverages only the hidden representations produced by the target LLM.
arXiv Detail & Related papers (2025-09-16T09:38:41Z)
GUARD: Glocal Uncertainty-Aware Robust Decoding for Effective and Efficient Open-Ended Text Generation [7.799544459641742]
GUARD is a self-adaptive decoding method that balances coherence with diversity in open-ended text generation. We show that GUARD achieves a good balance between text diversity and coherence, while exhibiting substantial improvements in generation speed.
arXiv Detail & Related papers (2025-08-28T13:14:20Z)
An Enhanced Model-based Approach for Short Text Clustering [58.60681789677676]
Short text clustering has become increasingly important with the popularity of social media like Twitter, Google+, and Facebook. Existing methods can be broadly categorized into two paradigms: topic model-based approaches and deep representation learning-based approaches. We propose a collapsed Gibbs Sampling algorithm for the Dirichlet Multinomial Mixture model (GSDMM), which effectively handles the sparsity and high dimensionality of short texts. Based on several aspects of GSDMM that warrant further refinement, we propose an improved approach, GSDMM+, designed to further optimize its performance.
arXiv Detail & Related papers (2025-07-18T10:07:42Z)
SlimCaching: Edge Caching of Mixture-of-Experts for Distributed Inference [29.49615352723995]
Mixture-of-Experts (MoE) models activate only a small subset of relevant experts per input. The sheer number of expert networks in an MoE model introduces a significant storage burden for an edge device. We propose a greedy decomposition method to decompose the original problem into a series of subproblems.
arXiv Detail & Related papers (2025-07-09T05:43:43Z)
Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling [90.86991492288487]
Evaluating the constraint on every token can be prohibitively expensive. Locally constrained decoding (LCD) can distort the global distribution over strings, sampling tokens based only on local information. We show that our approach is superior to state-of-the-art baselines.
arXiv Detail & Related papers (2025-04-07T18:30:18Z)
CoKe: Customizable Fine-Grained Story Evaluation via Chain-of-Keyword Rationalization [90.15027447565427]
Chain of thought (CoT) generates free-text explanations that help guide a model's predictions. Self-Consistency (SC) marginalizes predictions over multiple generated explanations. We propose Chain-of-Keywords (CoKe).
arXiv Detail & Related papers (2025-03-21T13:37:46Z)
Scalable Best-of-N Selection for Large Language Models via Self-Certainty [65.31658824274894]
Best-of-N selection is a key technique for improving the reasoning performance of Large Language Models. We propose self-certainty, a novel and efficient metric to estimate response quality without requiring external reward models. Our findings establish self-certainty as a practical and efficient way for improving LLM reasoning capabilities.
arXiv Detail & Related papers (2025-02-25T19:08:07Z)
Top-$nσ$: Not All Logits Are You Need [25.133593066927794]
We introduce top-$n\sigma$, a novel sampling method that operates directly on pre-softmax logits. We show that top-$n\sigma$ maintains a stable sampling space regardless of temperature scaling. We also provide a theoretical analysis of top-$n\sigma$ to better understand its behavior.
arXiv Detail & Related papers (2024-11-12T08:46:43Z)
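
The claim in the entry above, that a logit-space cutoff keeps the same candidate set at any temperature, can be checked with a short sketch. It is illustrative only and assumes a specific rule that the summary does not spell out: keep every token whose logit lies within `n` standard deviations of the maximum logit. The function name and parameters are hypothetical.

```python
import math
import statistics


def top_n_sigma_filter(logits, n=1.0, temperature=1.0):
    """Keep tokens whose scaled logit is >= max - n * std of the scaled logits.

    ASSUMPTION: this threshold rule is one reading of a logit-space cutoff;
    the summary above does not state the exact formula.
    Returns {token_index: probability} after a softmax over the kept tokens.
    """
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    cutoff = top - n * statistics.pstdev(scaled)
    kept = [i for i, x in enumerate(scaled) if x >= cutoff]
    # Softmax restricted to the kept tokens; subtracting the max keeps
    # the exponentials numerically stable.
    weights = {i: math.exp(scaled[i] - top) for i in kept}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


if __name__ == "__main__":
    logits = [5.0, 4.0, 1.0, 0.0, -1.0]
    for t in (0.5, 1.0, 2.0):
        print(t, sorted(top_n_sigma_filter(logits, n=1.0, temperature=t)))
```

Dividing every logit by the temperature rescales both the gap to the maximum and the standard deviation by the same factor, so under this rule the kept set does not change; only the probabilities assigned within it do.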
Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs [3.631341123338476]
Large Language Models (LLMs) generate text by sampling the next token from a probability distribution over the vocabulary at each decoding step. We propose min-p sampling, a dynamic truncation method that adjusts the sampling threshold based on the model's confidence by using the top token's probability as a scaling factor.
arXiv Detail & Related papers (2024-07-01T08:37:25Z)
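
The dynamic threshold described in the min-p entry above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than a reference implementation: the function name and the `p_base` parameter name are hypothetical, and the threshold is taken to be `p_base` times the top token's probability, following the summary's description of the top probability as a scaling factor.

```python
def min_p_filter(probs, p_base=0.1):
    """Keep tokens whose probability is at least p_base * max(probs).

    ASSUMPTION: `p_base` is an illustrative name for the base threshold.
    Returns {token_index: renormalized_probability} for the kept tokens.
    """
    threshold = p_base * max(probs)
    kept = {i: p for i, p in enumerate(probs) if p >= threshold}
    mass = sum(kept.values())
    return {i: p / mass for i, p in kept.items()}


if __name__ == "__main__":
    # A peaked distribution raises the threshold and prunes the tail;
    # a flat one lowers it and keeps every token.
    print(min_p_filter([0.9, 0.05, 0.03, 0.02], p_base=0.15))
    print(min_p_filter([0.25, 0.25, 0.25, 0.25], p_base=0.15))
```

Because the threshold depends only on the single largest probability, two distributions with the same top token probability but very different tails are truncated with the same cutoff, which is the limitation the top-H abstract points to.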
AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation [57.8363998797433]
We propose AMRFact, a framework that generates perturbed summaries using Abstract Meaning Representations (AMRs). Our approach parses factually consistent summaries into AMR graphs and injects controlled factual inconsistencies to create negative examples, allowing for coherent factually inconsistent summaries to be generated with high error-type coverage.
arXiv Detail & Related papers (2023-11-16T02:56:29Z)
Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator). We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$. We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z)
AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering. The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch. The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level. The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
arXiv Detail & Related papers (2020-01-15T18:32:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.