p-less Sampling: A Robust Hyperparameter-Free Approach for LLM Decoding
- URL: http://arxiv.org/abs/2509.23234v4
- Date: Tue, 28 Oct 2025 20:33:49 GMT
- Title: p-less Sampling: A Robust Hyperparameter-Free Approach for LLM Decoding
- Authors: Runyan Tan, Shuang Wu, Phillip Howard
- Abstract summary: $p$-less sampling is an information-theoretic approach to sampling. It dynamically sets a truncation threshold at each decoding step based on the entire token probability distribution. It consistently produces high-quality outputs as temperature increases.
- Score: 10.595336643423229
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Obtaining high-quality outputs from Large Language Models (LLMs) often depends upon the choice of a sampling-based decoding strategy to probabilistically choose the next token at each generation step. While a variety of such sampling methods have been proposed, their performance can be sensitive to the selection of hyperparameters which may require different settings depending upon the generation task and temperature configuration. In this work, we introduce $p$-less sampling: an information-theoretic approach to sampling which dynamically sets a truncation threshold at each decoding step based on the entire token probability distribution. Unlike existing methods, $p$-less sampling has no hyperparameters and consistently produces high-quality outputs as temperature increases. We provide theoretical perspectives on $p$-less sampling to ground our proposed method and conduct experiments to empirically validate its effectiveness across a range of math, logical reasoning, and creative writing tasks. Our results demonstrate how $p$-less sampling consistently outperforms existing sampling approaches while exhibiting much less degradation in text quality at higher temperature values. We further show how $p$-less achieves greater inference-time efficiency than alternative methods through lower average token sampling times and shorter generation lengths, without sacrificing accuracy. Finally, we provide analyses to highlight the benefits of $p$-less through qualitative examples, case studies, and diversity assessments. The code is available at https://github.com/ryttry/p-less .
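The abstract describes a truncation threshold derived from the entire token distribution with no tunable hyperparameter, but does not state the exact rule here. As a rough illustration of what an entropy-based, hyperparameter-free truncation can look like (a hypothetical sketch, not the authors' actual $p$-less formula), one can keep only tokens whose surprisal $-\log p$ does not exceed the distribution's entropy $H$, so the threshold adapts at every decoding step:

```python
import math

def entropy_truncate(probs):
    """Hypothetical entropy-based truncation (illustrative only, not the
    authors' exact p-less rule): keep tokens whose surprisal -log(p) is
    at most the entropy H of the full distribution, then renormalize.
    The cutoff is computed from the distribution itself, so there is
    no hyperparameter to tune."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    kept = [p if p > 0.0 and -math.log(p) <= h else 0.0 for p in probs]
    z = sum(kept)  # never zero: the top token's surprisal is <= H
    return [p / z for p in kept]
```

Because entropy is the mean surprisal, the most likely token always survives the cutoff, so the kept set is never empty regardless of temperature.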
Related papers
- Free Lunch for Pass@$k$? Low Cost Diverse Sampling for Diffusion Language Models [17.37935640125399]
We propose a training-free, low-cost intervention to enhance generative diversity in Diffusion Language Models. Our approach modifies intermediate samples in a batch sequentially, where each sample is repelled from the feature space of previous samples. Unlike prior methods that require retraining or beam search, our strategy incurs negligible computational overhead.
arXiv Detail & Related papers (2026-03-05T07:35:07Z) - Control the Temperature: Selective Sampling for Diverse and High-Quality LLM Outputs [26.477037145228735]
Temperature-based sampling is a common strategy to increase diversity. But uncontrolled high-temperature sampling, e.g., min-$p$ or top-$p$, degrades reasoning quality. We propose selective sampling, a method that switches between greedy and high-temperature sampling.
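The summary above describes switching between greedy decoding and high-temperature sampling; the paper's actual switching criterion is not given here. A minimal sketch under an assumed criterion (the margin between the top two logits, a hypothetical choice) could look like:

```python
import math
import random

def selective_sample(logits, margin=2.0, temperature=1.5):
    """Sketch of a selective-sampling step (the margin rule is an
    assumption, not the paper's criterion): decode greedily when the
    model is confident (large gap between the top two logits),
    otherwise sample from a high-temperature softmax for diversity."""
    order = sorted(range(len(logits)), key=lambda i: -logits[i])
    if logits[order[0]] - logits[order[1]] >= margin:
        return order[0]  # confident: greedy step
    # not confident: temperature-scaled softmax sampling
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    r, acc = random.random(), 0.0
    for i, e in enumerate(exps):
        acc += e / z
        if r <= acc:
            return i
    return order[0]
```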
arXiv Detail & Related papers (2025-09-20T15:16:27Z) - FastMCTS: A Simple Sampling Strategy for Data Synthesis [67.60823802317141]
We introduce FastMCTS, an innovative data synthesis strategy inspired by Monte Carlo Tree Search. FastMCTS provides a more efficient sampling method for multi-step reasoning data, offering step-level evaluation signals. Experiments on both English and Chinese reasoning datasets demonstrate that FastMCTS generates over 30% more correct reasoning paths.
arXiv Detail & Related papers (2025-02-17T06:27:57Z) - Top-$nσ$: Not All Logits Are You Need [25.133593066927794]
We introduce top-$n\sigma$, a novel sampling method that operates directly on pre-softmax logits.
We show that top-$n\sigma$ maintains a stable sampling space regardless of temperature scaling.
We also provide a theoretical analysis of top-$n\sigma$ to better understand its behavior.
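The summary states that top-$n\sigma$ operates on pre-softmax logits; assuming (as the name suggests) that the cutoff keeps logits within $n$ standard deviations of the maximum logit, a minimal sketch is:

```python
import math

def top_n_sigma_filter(logits, n=1.0):
    """Sketch of a top-n-sigma step (cutoff rule assumed from the
    method's name): keep only logits within n standard deviations of
    the maximum logit, then softmax over the survivors. Because the
    cutoff is set in logit space, it is unaffected by later
    temperature scaling of the kept logits."""
    mu = sum(logits) / len(logits)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logits) / len(logits))
    cutoff = max(logits) - n * sigma
    kept = [x if x >= cutoff else float("-inf") for x in logits]
    # softmax; exp(-inf) evaluates to 0, masking rejected tokens
    m = max(kept)
    exps = [math.exp(x - m) for x in kept]
    z = sum(exps)
    return [e / z for e in exps]
```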
arXiv Detail & Related papers (2024-11-12T08:46:43Z) - Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation [60.493180081319785]
We propose a systematic way to estimate the capacity of a truncation sampling method by considering the trade-off between diversity and risk at each decoding step. Our work offers a comprehensive comparison of existing truncation sampling methods and serves as a practical user guideline for their parameter selection.
arXiv Detail & Related papers (2024-08-24T14:14:32Z) - Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs [3.631341123338476]
Large Language Models (LLMs) generate text by sampling the next token from a probability distribution over the vocabulary at each decoding step. We propose min-$p$ sampling, a dynamic truncation method that adjusts the sampling threshold based on the model's confidence by using the top token's probability as a scaling factor.
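The rule described above, scaling a base threshold by the top token's probability, can be sketched as follows (a minimal illustration; `p_base` is the method's one hyperparameter):

```python
def min_p_filter(probs, p_base=0.1):
    """Min-p truncation: keep tokens whose probability is at least
    p_base times the probability of the most likely token, then
    renormalize. When the model is confident (peaked distribution)
    the threshold is high and few tokens survive; when it is
    uncertain (flat distribution) the threshold drops and more
    candidates are kept."""
    threshold = p_base * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    z = sum(kept)  # the top token always survives, so z > 0
    return [p / z for p in kept]
```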
arXiv Detail & Related papers (2024-07-01T08:37:25Z) - REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy [93.8400683020273]
Decoding methods for large language models (LLMs) usually struggle with the tradeoff between ensuring factuality and maintaining diversity.
We propose REAL sampling, a decoding method that improved factuality and diversity over nucleus sampling.
arXiv Detail & Related papers (2024-06-11T21:44:49Z) - Optimal Budgeted Rejection Sampling for Generative Models [54.050498411883495]
Rejection sampling methods have been proposed to improve the performance of discriminator-based generative models.
We first propose an Optimal Budgeted Rejection Sampling scheme that is provably optimal.
Second, we propose an end-to-end method that incorporates the sampling scheme into the training procedure to further enhance the model's overall performance.
arXiv Detail & Related papers (2023-11-01T11:52:41Z) - Improved Active Learning via Dependent Leverage Score Sampling [8.400581768343804]
We show how to obtain improved active learning methods in the agnostic (adversarial noise) setting.
We propose an easily implemented method based on the pivotal sampling algorithm.
In comparison to independent sampling, our method reduces the number of samples needed to reach a given target accuracy by up to 50%.
arXiv Detail & Related papers (2023-10-08T01:51:30Z) - Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z) - Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)