ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection
- URL: http://arxiv.org/abs/2601.09195v1
- Date: Wed, 14 Jan 2026 05:50:40 GMT
- Title: ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection
- Authors: Tao Liu, Taiqiang Wu, Runming Yang, Shaoning Sun, Junjie Wang, Yujiu Yang
- Abstract summary: Supervised fine-tuning is a strategy to align Large Language Models with human intent. Traditional SFT often ignores the one-to-many nature of language by forcing alignment with a single reference answer. We propose ProFit, which selectively masks low-probability tokens to prevent surface-level overfitting.
- Score: 47.413985185291864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Supervised fine-tuning (SFT) is a fundamental post-training strategy to align Large Language Models (LLMs) with human intent. However, traditional SFT often ignores the one-to-many nature of language by forcing alignment with a single reference answer, leading to the model overfitting to non-core expressions. Although our empirical analysis suggests that introducing multiple reference answers can mitigate this issue, the prohibitive data and computational costs necessitate a strategic shift: prioritizing the mitigation of single-reference overfitting over the costly pursuit of answer diversity. To achieve this, we reveal the intrinsic connection between token probability and semantic importance: high-probability tokens carry the core logical framework, while low-probability tokens are mostly replaceable expressions. Based on this insight, we propose ProFit, which selectively masks low-probability tokens to prevent surface-level overfitting. Extensive experiments confirm that ProFit consistently outperforms traditional SFT baselines on general reasoning and mathematical benchmarks.
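The core mechanism lends itself to a compact sketch. Below is a minimal, hedged PyTorch implementation of probability-guided loss masking: tokens whose model probability falls below a threshold are dropped from the cross-entropy objective. The threshold `tau` and the use of the model's own current probabilities as the importance signal are assumptions; the paper's exact selection criterion may differ.

```python
# A minimal sketch of probability-guided token masking for SFT, in the
# spirit of ProFit. `tau` and the use of the model's current probabilities
# as the importance signal are assumptions, not the paper's exact rule.
import torch.nn.functional as F

def profit_loss(logits, labels, tau=0.1, ignore_index=-100):
    """Cross-entropy over the reference answer, with tokens whose model
    probability falls below `tau` masked out of the loss, so the model is
    not forced to match low-probability, replaceable surface forms."""
    # Shift so that position t predicts token t+1.
    logits, labels = logits[:, :-1, :], labels[:, 1:]
    log_probs = F.log_softmax(logits, dim=-1)

    valid = labels != ignore_index
    safe_labels = labels.masked_fill(~valid, 0)
    token_logp = log_probs.gather(-1, safe_labels.unsqueeze(-1)).squeeze(-1)

    # Keep only high-probability tokens (assumed to carry the core logic);
    # the mask gates the loss but receives no gradient itself.
    keep = ((token_logp.exp() >= tau) & valid).detach()
    return -(token_logp * keep).sum() / keep.sum().clamp(min=1)
```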
Related papers
- Do It for HER: First-Order Temporal Logic Reward Specification in Reinforcement Learning (Extended Version) [49.462399222747024]
We propose a novel framework for the logical specification of non-Markovian rewards in Markov Decision Processes (MDPs) with large state spaces. Our approach leverages Linear Temporal Logic Modulo Theories over finite traces (LTLfMT). We introduce a method based on reward machines and Hindsight Experience Replay (HER) to translate first-order logic specifications and address reward sparsity.
arXiv Detail & Related papers (2026-02-05T22:11:28Z)
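For intuition, here is a generic sketch of the hindsight-relabeling step that HER-style methods use against a reward-machine acceptance check. The `RewardMachine` interface (`accepts`, `goal_from_state`) is invented for illustration; the paper's LTLfMT-based construction is considerably richer.

```python
# Hypothetical sketch of HER relabeling against a reward machine; the
# RewardMachine interface here is an assumption for illustration only.
import random

def her_relabel(episode, reward_machine, k=4):
    """episode: list of (state, action, next_state, goal) tuples.
    Returns transitions augmented with hindsight goals actually achieved
    later in the episode, turning sparse failures into dense successes."""
    out = []
    for t, (s, a, s_next, goal) in enumerate(episode):
        out.append((s, a, s_next, goal,
                    float(reward_machine.accepts(s_next, goal))))
        # Relabel with up to k goals drawn from the episode's own future.
        future = episode[t:]
        for _ in range(min(k, len(future))):
            _, _, achieved, _ = random.choice(future)
            g = reward_machine.goal_from_state(achieved)
            out.append((s, a, s_next, g,
                        float(reward_machine.accepts(s_next, g))))
    return out
```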
- CORE: Context-Robust Remasking for Diffusion Language Models [51.59514489363897]
We propose Context-Robust Remasking (CORE), a training-free framework for inference-time revision. Rather than trusting static token probabilities, CORE identifies context-brittle tokens by probing their sensitivity to targeted masked-context perturbations. On LLaDA-8B-Base, CORE delivers consistent improvements across reasoning and code benchmarks, outperforming compute-matched baselines and improving MBPP by up to 9.2 percentage points.
arXiv Detail & Related papers (2026-02-04T00:12:30Z)
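A hedged sketch of the sensitivity probe this describes: a token counts as context-brittle if its probability swings when surrounding context is masked. The `model` call signature (returning `[B, L, V]` logits), `mask_id`, and the uniform random masking are stand-ins; CORE's targeted perturbations and remasking schedule may differ.

```python
# Sketch: score each position by how much its token's log-probability moves
# under random masked-context perturbations. Model interface and the random
# (rather than targeted) perturbation scheme are illustrative assumptions.
import torch

@torch.no_grad()
def brittleness_scores(model, tokens, mask_id, num_probes=8, mask_frac=0.15):
    base = model(tokens).log_softmax(-1)
    base_lp = base.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)  # [B, L]

    deltas = torch.zeros_like(base_lp)
    for _ in range(num_probes):
        perturbed = tokens.clone()
        hit = torch.rand(tokens.shape, device=tokens.device) < mask_frac
        perturbed[hit] = mask_id
        lp = model(perturbed).log_softmax(-1).gather(
            -1, tokens.unsqueeze(-1)).squeeze(-1)
        deltas += (base_lp - lp).abs()
    # Highest-scoring positions are the context-brittle ones to remask.
    return deltas / num_probes
```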
- Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge [87.51901436392427]
Large language models often solve complex reasoning tasks more effectively with Chain-of-Thought (CoT). Humans, by contrast, often reason softly by maintaining a tractable probability distribution over plausible next steps. We propose Multiplex Thinking, a soft reasoning mechanism that samples K candidate tokens and aggregates their embeddings into a single continuous multiplex token. Multiplex Thinking is self-adaptive: when the model is confident, the multiplex token is nearly discrete and behaves like standard CoT.
arXiv Detail & Related papers (2026-01-13T18:48:00Z)
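One branch-and-merge step can be sketched as follows: sample K candidate next tokens and merge their embeddings, weighted by renormalized probabilities. Sampling without replacement and the plain probability weighting are our assumptions about details the summary leaves open.

```python
# Sketch of one multiplex-token step; sampling scheme and weighting are
# assumptions about details the summary does not specify.
import torch

def multiplex_token(logits, embedding, k=4):
    """logits: [vocab_size]; embedding: nn.Embedding. Returns one continuous
    'multiplex token' of shape [d_model]. When the model is confident, one
    weight dominates and the result is nearly the discrete CoT embedding."""
    probs = torch.softmax(logits, dim=-1)
    idx = torch.multinomial(probs, k)            # K distinct candidates
    weights = probs[idx] / probs[idx].sum()      # renormalize over the branch
    return (weights.unsqueeze(-1) * embedding(idx)).sum(dim=0)
```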
- Enhancing Large Language Model Reasoning via Selective Critical Token Fine-Tuning [18.934789236342244]
Large language models (LLMs) primarily rely on supervised fine-tuning (SFT) to adapt pre-trained models to domain-specific tasks such as mathematical reasoning. Standard SFT uniformly penalizes all tokens, neglecting that only a small subset of critical tokens determines reasoning correctness. We propose Critical Token Fine-tuning (CFT), a simple yet effective approach that updates only tokens identified as functionally indispensable via counterfactual perturbations.
arXiv Detail & Related papers (2025-10-13T03:25:36Z)
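A minimal sketch of counterfactual token scoring in this spirit: perturb one reference token at a time and mark it critical if the sequence likelihood drops sharply. The substitution token `sub_id`, the threshold, and whole-sequence scoring are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch of counterfactual critical-token identification; substitution
# token, threshold, and scoring granularity are illustrative assumptions.
import torch

@torch.no_grad()
def critical_token_mask(model, input_ids, answer_positions, sub_id,
                        threshold=1.0):
    def seq_logp(ids):
        logp = model(ids).log_softmax(-1)[:, :-1]
        return logp.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1).sum()

    base = seq_logp(input_ids)
    mask = torch.zeros(input_ids.shape[1], dtype=torch.bool)
    for pos in answer_positions:
        perturbed = input_ids.clone()
        perturbed[:, pos] = sub_id          # counterfactual substitution
        mask[pos] = (base - seq_logp(perturbed)) > threshold
    return mask  # True = functionally indispensable; SFT updates only these
```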
- On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification [61.607788999847564]
We present a simple yet theoretically motivated improvement to Supervised Fine-Tuning (SFT) for Large Language Models (LLMs). We reveal that standard SFT gradients implicitly encode a problematic reward structure that may severely restrict the generalization capabilities of the model. We propose Dynamic Fine-Tuning (DFT), which stabilizes gradient updates for each token by dynamically rescaling the objective function with the probability of that token.
arXiv Detail & Related papers (2025-08-07T17:59:04Z)
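Read literally, the rescaling admits a one-line change to the SFT loss: weight each token's cross-entropy by its gradient-detached probability. Treating the weight as stop-gradient p(token) is our reading of the summary, not a verified reproduction.

```python
# Sketch of the described rescaling; sg(p) as the per-token weight is our
# reading of the summary, not a verified reproduction of DFT.
import torch.nn.functional as F

def dft_loss(logits, labels, ignore_index=-100):
    logits, labels = logits[:, :-1, :], labels[:, 1:]
    logp = F.log_softmax(logits, dim=-1)
    valid = labels != ignore_index
    safe = labels.masked_fill(~valid, 0)
    token_logp = logp.gather(-1, safe.unsqueeze(-1)).squeeze(-1)
    weight = token_logp.exp().detach() * valid   # sg(p) rescales each token
    return -(weight * token_logp).sum() / valid.sum().clamp(min=1)
```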
- Beyond Exponential Decay: Rethinking Error Accumulation in Large Language Models [0.0]
We show that errors are not uniformly distributed but are concentrated at sparse "key tokens" representing critical decision junctions. We propose a framework for next-generation systems centered on selective preservation of semantically vital tokens.
arXiv Detail & Related papers (2025-05-30T03:57:31Z)
- Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack [44.205352310633174]
Large Language Models (LLMs) are increasingly evaluated on multiple-choice question answering (MCQA) tasks. We propose a solution: the *prefilling attack*, a structured natural-language prefix (e.g., "*The correct option is:*") prepended to the model output. Our findings suggest that prefilling is a simple, robust, and low-cost method to enhance the reliability of first-token-probability (FTP) based evaluation in multiple-choice settings.
arXiv Detail & Related papers (2025-05-21T09:58:38Z)
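The attack itself is a few lines, assuming a Hugging-Face-style causal LM and tokenizer: append the prefix so the very next token should be the option letter, then compare first-token probabilities over the option letters. The leading-space tokenization of option letters is tokenizer-dependent and an assumption; the prefix string is the one quoted above.

```python
# Sketch of the prefilling attack with an assumed Hugging-Face-style
# interface; option-letter tokenization details vary by tokenizer.
import torch

@torch.no_grad()
def mcqa_first_token(model, tokenizer, question, options=("A", "B", "C", "D")):
    """Prepend the structured prefix so the very next token should be the
    option letter, then compare first-token probabilities of the options."""
    prompt = question + "\nThe correct option is:"
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    next_logits = model(ids).logits[0, -1]       # next-token distribution
    option_ids = [tokenizer(" " + o, add_special_tokens=False).input_ids[0]
                  for o in options]
    probs = torch.softmax(next_logits[option_ids], dim=-1)
    return options[int(probs.argmax())]
```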
- Language Model Uncertainty Quantification with Attention Chain [9.093726246465117]
A large language model's (LLM) predictive uncertainty is crucial for judging the reliability of its answers. We propose UQAC, an efficient method that narrows the reasoning space to a tractable size for marginalization. We validate UQAC on multiple reasoning benchmarks with advanced open-source LLMs.
arXiv Detail & Related papers (2025-03-24T21:43:47Z)
- Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection [49.15148871877941]
Next-token distribution outputs offer a theoretically appealing approach for detecting text generated by large language models (LLMs). We propose the Perplexity Attention Weighted Network (PAWN), which uses the LLM's last hidden states and token positions to weight a sum of features derived from next-token-distribution metrics across the sequence. PAWN shows competitive, and sometimes better, in-distribution performance than the strongest baselines with a fraction of their trainable parameters.
arXiv Detail & Related papers (2025-01-07T17:00:49Z)
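A hedged sketch of the idea: derive per-position features from the next-token distribution (token log-probability and entropy here, as assumed examples) and pool them with attention weights computed from the LLM's last hidden states. The tiny scoring head is illustrative, not PAWN's actual architecture.

```python
# Illustrative head combining next-token-distribution features with learned
# attention over positions; feature choice and head are assumptions.
import torch
import torch.nn as nn

class PerplexityAttentionHead(nn.Module):
    def __init__(self, d_hidden, n_features=2):
        super().__init__()
        self.attn = nn.Linear(d_hidden, 1)     # position weights from hidden states
        self.score = nn.Linear(n_features, 1)  # human-vs-AI detection logit

    def forward(self, hidden, logits, tokens):
        """hidden: [B, L, d_hidden]; logits: [B, L, V]; tokens: [B, L]."""
        logp = logits[:, :-1].log_softmax(-1)
        tok_lp = logp.gather(-1, tokens[:, 1:].unsqueeze(-1)).squeeze(-1)
        entropy = -(logp.exp() * logp).sum(-1)
        feats = torch.stack([tok_lp, entropy], dim=-1)   # [B, L-1, 2]
        weights = self.attn(hidden[:, :-1]).softmax(dim=1)
        return self.score((weights * feats).sum(dim=1)).squeeze(-1)
```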
- Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process [19.986235452236272]
Supervised Fine-Tuning (SFT) and Preference Optimization (PO) are key processes for aligning Language Models (LMs) with human preferences after pre-training. We introduce Intuitive Fine-Tuning (IFT) to integrate SFT and PO into a single process. IFT performs comparably or even superiorly to SFT and some typical PO methods across several tasks.
arXiv Detail & Related papers (2024-05-20T08:23:28Z)