Control the Temperature: Selective Sampling for Diverse and High-Quality LLM Outputs
- URL: http://arxiv.org/abs/2510.01218v1
- Date: Sat, 20 Sep 2025 15:16:27 GMT
- Title: Control the Temperature: Selective Sampling for Diverse and High-Quality LLM Outputs
- Authors: Sergey Troshin, Wafaa Mohammed, Yan Meng, Christof Monz, Antske Fokkens, Vlad Niculae,
- Abstract summary: Temperature-based sampling is a common strategy to increase diversity. But uncontrolled high-temperature sampling, e.g., min-$p$ or top-$p$, degrades reasoning quality. We propose selective sampling, a method that switches between greedy and high-temperature sampling.
- Score: 26.477037145228735
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diversity is an essential metric for evaluating the creativity of outputs generated by language models. Temperature-based sampling is a common strategy to increase diversity. However, for tasks that require high precision, e.g., mathematical reasoning, uncontrolled high temperature sampling, e.g., min-$p$ or top-$p$, degrades reasoning quality. We demonstrate that the loss of accuracy is caused by sampling incorrect continuations in sensitive decoding positions. To address this, in this paper, we propose \textbf{selective sampling}, a method that dynamically switches between greedy and high-temperature sampling based on a sampling risk metric. This risk metric estimates the likelihood of output errors when applying high-temperature sampling on the current token position. To predict sampling risk, we train a lightweight classifier on a small subset of verifiable problems. The trained classifier can be integrated with the base language model with minimal latency overhead. Experiments on mathematical reasoning tasks demonstrate that selective sampling enhances the quality-diversity trade-off, even in high-temperature settings.
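The abstract describes the mechanism only at a high level. A minimal sketch, assuming an entropy-based stand-in for the paper's trained risk classifier (the actual classifier, its input features, and the threshold value are not specified in the abstract and are illustrative here):

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature parameter."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_risk(probs):
    # Hypothetical stand-in for the paper's lightweight risk classifier:
    # a peaked (low-entropy) distribution is treated as a sensitive
    # position where a non-argmax token is likely to be an error.
    return -sum(p * math.log(p) for p in probs if p > 0)

def selective_sample(logits, temperature=1.5, risk_threshold=0.5, rng=random):
    """Decode greedily at high-risk positions, sample hot elsewhere."""
    probs = softmax(logits)
    if entropy_risk(probs) < risk_threshold:
        # Sensitive position: take the argmax to protect accuracy.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Low-risk position: sample at high temperature for diversity.
    hot = softmax(logits, temperature)
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(hot):
        acc += p
        if r < acc:
            return i
    return len(logits) - 1
```

In the paper the risk score comes from a trained classifier over hidden states rather than from entropy; the sketch only illustrates the switch between the two decoding modes.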
Related papers
- Combating Noisy Labels through Fostering Self- and Neighbor-Consistency [120.4394402099635]
Label noise is pervasive in various real-world scenarios, posing challenges for supervised deep learning. We propose a noise-robust method named Jo-SNC (Joint sample selection and model regularization based on Self- and Neighbor-Consistency). We design a self-adaptive, data-driven thresholding scheme to adjust per-class selection thresholds.
arXiv Detail & Related papers (2026-01-19T07:55:29Z) - Let it Calm: Exploratory Annealed Decoding for Verifiable Reinforcement Learning [29.277754405630205]
Reinforcement learning with verifiable rewards (RLVR) is a powerful paradigm for enhancing the reasoning capabilities of large language models (LLMs). Standard fixed-temperature sampling is simple, but it struggles to balance the competing demands of exploration and sample quality, as high temperatures degrade sample quality and low temperatures limit discovery. We propose a simpler and more effective strategy, Exploratory Annealed Decoding (EAD), grounded in the insight that exploration is most impactful on early tokens.
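The blurb gives no schedule details; a minimal sketch of temperature annealing under assumed values (linear decay, with start and end temperatures chosen purely for illustration):

```python
def annealed_temperature(step: int, total_steps: int,
                         t_start: float = 1.5, t_end: float = 0.5) -> float:
    """Linearly anneal the sampling temperature from t_start to t_end
    over the generation, concentrating exploration on early tokens."""
    if total_steps <= 1:
        return t_end
    frac = min(step / (total_steps - 1), 1.0)
    return t_start + (t_end - t_start) * frac
```

Early decoding steps then sample hot (diverse prefixes) while later steps cool toward near-greedy decoding; the actual schedule shape used by EAD may differ.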
arXiv Detail & Related papers (2025-10-06T18:15:43Z) - $p$-less Sampling: A Robust Hyperparameter-Free Approach for LLM Decoding [10.595336643423229]
$p$-less sampling is an information-theoretic approach to sampling that dynamically sets a truncation threshold at each decoding step. We show that $p$-less sampling consistently outperforms existing sampling approaches while exhibiting much less degradation in text quality at higher temperature values.
arXiv Detail & Related papers (2025-09-27T10:33:41Z) - Optimizing Temperature for Language Models with Multi-Sample Inference [47.14991144052361]
This paper addresses the challenge of automatically identifying the (near-)optimal temperature for different large language models. We provide a comprehensive analysis of temperature's role in performance optimization, considering variations in model architectures, datasets, task types, model sizes, and predictive accuracy. We propose a novel entropy-based metric for automated temperature optimization, which consistently outperforms fixed-temperature baselines.
arXiv Detail & Related papers (2025-02-07T19:35:25Z) - Adaptive Decoding via Latent Preference Optimization [55.70602730588745]
We introduce Adaptive Decoding, a layer added to the model to select the sampling temperature dynamically at inference time.
Our method outperforms all fixed decoding temperatures across a range of tasks that require different temperatures.
arXiv Detail & Related papers (2024-11-14T18:31:39Z) - Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs [3.631341123338476]
Large Language Models (LLMs) generate text by sampling the next token from a probability distribution over the vocabulary at each decoding step. We propose min-p sampling, a dynamic truncation method that adjusts the sampling threshold based on the model's confidence, using the top token's probability as a scaling factor.
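The min-p truncation rule just described can be sketched as follows (the base threshold value here is illustrative, not the paper's recommended setting):

```python
def min_p_filter(probs: list[float], p_base: float = 0.1) -> list[float]:
    """Zero out tokens whose probability falls below p_base times the
    top token's probability, then renormalize the survivors."""
    threshold = p_base * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]
```

Because the cutoff scales with the top token's probability, confident distributions are truncated aggressively while flat distributions keep many candidates, which is what lets min-p tolerate higher temperatures.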
arXiv Detail & Related papers (2024-07-01T08:37:25Z) - REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy [93.8400683020273]
Decoding methods for large language models (LLMs) usually struggle with the tradeoff between ensuring factuality and maintaining diversity.
We propose REAL sampling, a decoding method that improves factuality and diversity over nucleus sampling.
arXiv Detail & Related papers (2024-06-11T21:44:49Z) - Risk-Sensitive Diffusion: Robustly Optimizing Diffusion Models with Noisy Samples [58.68233326265417]
Non-image data are prevalent in real applications and tend to be noisy.
A risk-sensitive SDE is a type of stochastic differential equation (SDE) parameterized by a risk vector.
We conduct systematic studies for both Gaussian and non-Gaussian noise distributions.
arXiv Detail & Related papers (2024-02-03T08:41:51Z) - Conformal Language Modeling [61.94417935386489]
We propose a novel approach to conformal prediction for generative language models (LMs).
Standard conformal prediction produces prediction sets with rigorous, statistical guarantees.
We demonstrate the promise of our approach on multiple tasks in open-domain question answering, text summarization, and radiology report generation.
arXiv Detail & Related papers (2023-06-16T21:55:08Z) - KL-Divergence Guided Temperature Sampling [5.726259957909055]
As temperature increases, the prediction becomes diverse but also vulnerable to hallucinations.
One common approach to mitigate hallucinations is to provide source/grounding documents.
We propose to relax the constraint of having a fixed temperature over decoding steps, and a mechanism to guide the dynamic temperature according to its relevance to the source.
arXiv Detail & Related papers (2023-06-02T06:11:26Z) - Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models [65.52639709094963]
Methods such as beam search and Gumbel top-k sampling can guarantee a different output for each element of the beam, but are not easy to parallelize.
We present a framework for sampling according to an arithmetic code book implicitly defined by a large language model.
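The arithmetic-codebook idea above can be sketched as decoding a point in $[0, 1)$ against the model's per-step cumulative distributions (the per-step probability tables here are illustrative, not model outputs):

```python
def arithmetic_sample(code_point: float, step_probs: list[list[float]]) -> list[int]:
    """Decode one token per step by locating the code point within the
    step's cumulative distribution, then rescaling into the chosen
    token's interval so the remaining bits drive the next step."""
    tokens = []
    x = code_point
    for probs in step_probs:
        acc = 0.0
        for i, p in enumerate(probs):
            if x < acc + p:
                tokens.append(i)
                x = (x - acc) / p  # rescale into [0, 1) for the next step
                break
            acc += p
    return tokens
```

Spacing several code points evenly across $[0, 1)$ then yields a diverse set of samples that can be decoded fully in parallel, which is the parallelism advantage the blurb contrasts with beam search.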
arXiv Detail & Related papers (2022-10-18T22:19:41Z) - Selectively increasing the diversity of GAN-generated samples [8.980453507536017]
We propose a novel method to selectively increase the diversity of GAN-generated samples.
We show the superiority of our method in a synthetic benchmark as well as a real-life scenario simulating data from the Zero Degree Calorimeter of ALICE experiment in CERN.
arXiv Detail & Related papers (2022-07-04T16:27:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.