Decoding as Optimisation on the Probability Simplex: From Top-K to Top-P (Nucleus) to Best-of-K Samplers
- URL: http://arxiv.org/abs/2602.18292v2
- Date: Wed, 25 Feb 2026 18:47:28 GMT
- Title: Decoding as Optimisation on the Probability Simplex: From Top-K to Top-P (Nucleus) to Best-of-K Samplers
- Authors: Xiaotong Ji, Rasul Tutunov, Matthieu Zimmer, Haitham Bou-Ammar,
- Abstract summary: We argue decoding should be understood as a principled optimisation layer.<n>This single template recovers greedy decoding, Softmax sampling, Top-K, Top-P, and Sparsemax-style sparsity as special cases.<n>We show that such samples can improve accuracy by, for example, +18.6% for Qwen2.5-Math-7B on MATH500 at high sampling temperatures.
- Score: 14.647624238539777
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decoding sits between a language model and everything we do with it, yet it is still treated as a heuristic knob-tuning exercise. We argue decoding should be understood as a principled optimisation layer: at each token, we solve a regularised problem over the probability simplex that trades off model score against structural preferences and constraints. This single template recovers greedy decoding, Softmax sampling, Top-K, Top-P, and Sparsemax-style sparsity as special cases, and explains their common structure through optimality conditions. More importantly, the framework makes it easy to invent new decoders without folklore. We demonstrate this by designing Best-of-K (BoK), a KL-anchored coverage objective aimed at multi-sample pipelines (self-consistency, reranking, verifier selection). BoK targets the probability of covering good alternatives within a fixed K-sample budget and improves empirical performance. We show that such samples can improve accuracy by, for example, +18.6% for Qwen2.5-Math-7B on MATH500 at high sampling temperatures.
Related papers
- Inference-Time Chain-of-Thought Pruning with Latent Informativeness Signals [6.5422130090856925]
Self-Truncation Best-of-N (ST-BoN) mitigates this by truncating unpromising paths early.<n>We present KL-Adjusted Pruned Path Algorithm (KAPPA), an inference-time method that combines Kullback-Leibler divergence, confidence, and entropy into a principled scoring function.
arXiv Detail & Related papers (2025-11-01T20:41:22Z) - CarBoN: Calibrated Best-of-N Sampling Improves Test-time Reasoning [62.56541355300587]
We introduce a general test-time calibration framework that adaptively modifies the model toward high-reward reasoning paths.<n>Within this framework, we propose CarBoN, a two-phase method that first explores the solution space and then learns a calibration of the logits.<n>Experiments on MATH-500 and AIME-2024 show that CarBoN improves efficiency, with up to $4times$ fewer rollouts to reach the same accuracy.
arXiv Detail & Related papers (2025-10-17T14:04:37Z) - Best of mini-N in-loop Sampling: A Contextual Quality Reward Model for Reliable and Efficient Best-of-N Sampling [0.14323566945483493]
Modern preference alignment techniques, such as Best-of-N sampling, rely on reward models trained with pairwise comparison data.<n>We introduce a new data collection and modeling framework to address this critical reliability gap.<n>We show that when tuned as an alignment guardrail, it reduces reliability failures by 70%, and when tuned as an inference accelerator, it improves average inference speed by over 22%.
arXiv Detail & Related papers (2025-10-05T08:23:08Z) - Foundations of Top-$k$ Decoding For Language Models [19.73575905188064]
We develop a theoretical framework that both explains and generalizes top-$k$ decoding.<n>We show how to optimize it efficiently for a large class of divergences.
arXiv Detail & Related papers (2025-05-25T23:46:34Z) - Sample, Don't Search: Rethinking Test-Time Alignment for Language Models [55.2480439325792]
We introduce QAlign, a new test-time alignment approach.<n>As we scale test-time compute, QAlign converges to sampling from the optimal aligned distribution for each individual prompt.<n>By adopting recent advances in Markov chain Monte Carlo for text generation, our method enables better-aligned outputs without modifying the underlying model or even requiring logit access.
arXiv Detail & Related papers (2025-04-04T00:41:40Z) - Robust Conformal Prediction with a Single Binary Certificate [58.450154976190795]
Conformal prediction (CP) converts any model's output to prediction sets with a guarantee to cover the true label with (adjustable) high probability.<n>We propose a robust conformal prediction that produces smaller sets even with significantly lower MC samples.
arXiv Detail & Related papers (2025-03-07T08:41:53Z) - UniCBE: An Uniformity-driven Comparing Based Evaluation Framework with Unified Multi-Objective Optimization [19.673388630963807]
We propose UniCBE, a unified uniformity-driven CBE framework.<n>On the AlpacaEval benchmark, UniCBE saves over 17% of evaluation budgets while achieving a Pearson correlation with ground truth exceeding 0.995.<n>In scenarios where new models are continuously introduced, UniCBE can even save over 50% of evaluation costs.
arXiv Detail & Related papers (2025-02-17T05:28:12Z) - InfAlign: Inference-aware language model alignment [58.66389179049758]
Language model alignment is a critical step in training modern generative language models.<n>We show that this train/test mismatch makes standard RLHF framework sub-optimal in view of inference-time methods.<n>We propose a framework for inference-aware alignment (InfAlign), which aims to optimize inference-time win rate of the aligned policy against the base model.
arXiv Detail & Related papers (2024-12-27T18:45:36Z) - Optimizing Partial Area Under the Top-k Curve: Theory and Practice [151.5072746015253]
We develop a novel metric named partial Area Under the top-k Curve (AUTKC)
AUTKC has a better discrimination ability, and its Bayes optimal score function could give a correct top-K ranking with respect to the conditional probability.
We present an empirical surrogate risk minimization framework to optimize the proposed metric.
arXiv Detail & Related papers (2022-09-03T11:09:13Z) - Conditional Poisson Stochastic Beam Search [35.60062127942947]
Conditional Poisson beam search (CPSBS) is a more natural alternative to Kool et. al. 2019's beam search (SBS)
CPSBS produces lower variance and more efficient estimators than SBS, even showing improvements in high entropy settings.
arXiv Detail & Related papers (2021-09-22T20:49:16Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Least Squares Regression with Markovian Data: Fundamental Limits and
Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information theoretic minimax lower bounds for this problem in terms of $tau_mathsfmix$.
We propose an algorithm based on experience replay--a popular reinforcement learning technique--that achieves a significantly better error rate.
arXiv Detail & Related papers (2020-06-16T04:26:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.