Related papers: Probability-Entropy Calibration: An Elastic Indicator for Adaptive Fine-tuning

URL: http://arxiv.org/abs/2602.01745v1
Date: Mon, 02 Feb 2026 07:27:19 GMT
Title: Probability-Entropy Calibration: An Elastic Indicator for Adaptive Fine-tuning
Authors: Wenhao Yu, Shaohang Wei, Jiahong Liu, Yifan Li, Minda Hu, Aiwei Liu, Hao Zhang, Irwin King
Abstract summary: RankTuner introduces a probability-entropy calibration signal, the Relative Rank Indicator, which compares the rank of the ground-truth token with its expected rank under the prediction distribution. The inverse indicator is used as a token-wise Relative Scale to reweight the fine-tuning objective, focusing updates on truly under-learned tokens.
Score: 55.2818264614932
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Token-level reweighting is a simple yet effective mechanism for controlling supervised fine-tuning, but common indicators are largely one-dimensional: the ground-truth probability reflects downstream alignment, while token entropy reflects intrinsic uncertainty induced by the pre-training prior. Ignoring entropy can misidentify noisy or easily replaceable tokens as learning-critical, while ignoring probability fails to reflect target-specific alignment. RankTuner introduces a probability-entropy calibration signal, the Relative Rank Indicator, which compares the rank of the ground-truth token with its expected rank under the prediction distribution. The inverse indicator is used as a token-wise Relative Scale to reweight the fine-tuning objective, focusing updates on truly under-learned tokens without over-penalizing intrinsically uncertain positions. Experiments on multiple backbones show consistent improvements on mathematical reasoning benchmarks, transfer gains on out-of-distribution reasoning, and improved code generation performance over probability-only or entropy-only reweighting baselines.
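
Illustrative sketch (not from the paper): the abstract does not give the exact formula for the Relative Rank Indicator, so the code below is only one plausible reading of it. It assumes the indicator is the expected rank divided by the ground-truth rank, so that its inverse, used here as the token weight, is the ground-truth rank divided by the expected rank. The function names `relative_scale` and `reweighted_nll`, the mid-rank handling of ties, and the `eps` constant are all hypothetical choices made for this example.

```python
import numpy as np


def relative_scale(probs, target, eps=1e-12):
    """Hypothetical token weight: ground-truth rank / expected rank.

    This is an assumed form, not the formula from the paper.
    """
    probs = np.asarray(probs, dtype=float)
    # Rank 1 is the most probable token. Tied tokens share their mid-rank
    # so that the weight does not depend on vocabulary index order.
    # The pairwise comparison is O(V^2); fine for a toy vocabulary only.
    greater = (probs[None, :] > probs[:, None]).sum(axis=1)
    equal = (probs[None, :] == probs[:, None]).sum(axis=1)  # counts self
    ranks = 1.0 + greater + 0.5 * (equal - 1)
    gt_rank = ranks[target]
    # Expected rank of a token sampled from the model's own distribution.
    expected_rank = float(np.sum(probs * ranks))
    return float(gt_rank / max(expected_rank, eps))


def reweighted_nll(probs, target, eps=1e-12):
    """Negative log-likelihood of the target, scaled by the assumed weight."""
    probs = np.asarray(probs, dtype=float)
    weight = relative_scale(probs, target, eps)
    return float(-weight * np.log(probs[target] + eps))


if __name__ == "__main__":
    flat = [0.25, 0.25, 0.25, 0.25]   # intrinsically uncertain position
    peaked = [0.7, 0.1, 0.1, 0.1]     # confident prediction
    print(relative_scale(flat, 1))    # target ranks where expected
    print(relative_scale(peaked, 1))  # confident but wrong about target
    print(relative_scale(peaked, 0))  # confident and right
```

Under these assumptions, a target at a flat, high-entropy position gets a weight of 1.0, a target the model confidently ranks too low gets a weight above 1 (1.875 here), and a target the model already ranks first gets a weight below 1 (0.625 here). That matches the qualitative behavior the abstract describes, but the paper's actual definition may differ.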

Related papers

Taming the Tail: Stable LLM Reinforcement Learning via Dynamic Vocabulary Pruning [35.41241409574854]
We show that inference engines and numerically-precise training systems produce different probability distributions from the same parameters, creating a training-inference mismatch. By pruning such tokens, we trade large, systematically biased mismatches for a small, bounded optimization bias.
arXiv Detail & Related papers (2025-12-28T21:44:07Z)
From Uniform to Heterogeneous: Tailoring Policy Optimization to Every Token's Nature [38.46122853450324]
Existing algorithms apply uniform optimization to all tokens, ignoring their different roles in the reasoning process. We introduce Heterogeneous Adaptive Policy Optimization (HAPO), a token-aware algorithm that dynamically adapts optimization based on token entropy.
arXiv Detail & Related papers (2025-09-20T09:30:25Z)
When Does Confidence-Based Cascade Deferral Suffice? [69.28314307469381]
Cascades are a classical strategy to enable inference cost to vary adaptively across samples. A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction. Despite being oblivious to the structure of the cascade, confidence-based deferral often works remarkably well in practice.
arXiv Detail & Related papers (2023-07-06T04:13:57Z)
Regularized Vector Quantization for Tokenized Image Synthesis [126.96880843754066]
Quantizing images into discrete representations has been a fundamental problem in unified generative modeling. Deterministic quantization suffers from severe codebook collapse and misalignment with the inference stage, while stochastic quantization suffers from low codebook utilization and a perturbed reconstruction objective. This paper presents a regularized vector quantization framework that mitigates the above issues effectively by applying regularization from two perspectives.
arXiv Detail & Related papers (2023-03-11T15:20:54Z)
Alignment Entropy Regularization [13.904347165738491]
We use entropy to measure a model's uncertainty. We evaluate the effect of entropy regularization in encouraging the model to distribute the probability mass only on a smaller subset of allowed alignments.
arXiv Detail & Related papers (2022-12-22T18:51:02Z)
Deconfounding Scores: Feature Representations for Causal Effect Estimation with Weak Overlap [140.98628848491146]
We introduce deconfounding scores, which induce better overlap without biasing the target of estimation. We show that deconfounding scores satisfy a zero-covariance condition that is identifiable in observed data. In particular, we show that this technique could be an attractive alternative to standard regularizations.
arXiv Detail & Related papers (2021-04-12T18:50:11Z)
Evaluating probabilistic classifiers: Reliability diagrams and score decompositions revisited [68.8204255655161]
We introduce the CORP approach, which generates provably statistically Consistent, Optimally binned, and Reproducible reliability diagrams in an automated way. CORP is based on non-parametric isotonic regression and implemented via the pool-adjacent-violators (PAV) algorithm.
arXiv Detail & Related papers (2020-08-07T08:22:26Z)
Optimal Change-Point Detection with Training Sequences in the Large and Moderate Deviations Regimes [72.68201611113673]
This paper investigates a novel offline change-point detection problem from an information-theoretic perspective. We assume that the underlying pre- and post-change distributions are not known and can only be learned from the available training sequences.
arXiv Detail & Related papers (2020-03-13T23:39:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.