Logits-Based Finetuning
- URL: http://arxiv.org/abs/2505.24461v2
- Date: Wed, 11 Jun 2025 16:40:39 GMT
- Title: Logits-Based Finetuning
- Authors: Jingyao Li, Senqiao Yang, Sitong Wu, Han Shi, Chuanyang Zheng, Hong Xu, Jiaya Jia
- Abstract summary: We propose a logits-based fine-tuning framework that integrates the strengths of supervised learning and knowledge distillation. Our approach constructs enriched training targets by combining teacher logits with ground truth labels, preserving both correctness and linguistic diversity.
- Score: 48.18151583153572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, developing compact and efficient large language models (LLMs) has emerged as a thriving area of research. Traditional Supervised Fine-Tuning (SFT), which relies on singular ground truth labels, often fails to capture token-level dependencies and linguistic diversity. To address these limitations, we propose a logits-based fine-tuning framework that integrates the strengths of supervised learning and knowledge distillation. Our approach constructs enriched training targets by combining teacher logits with ground truth labels, preserving both correctness and linguistic diversity. This ensures more reliable and effective training. We constructed a large-scale 1.2M logits dataset and trained a series of science-focused models. Experimental results demonstrate that our method achieves significant improvements, with accuracy gains of 18% on Mawps and 22.7% on TabMWP. Across nine widely used mathematical benchmarks, our method consistently outperforms prior SFT models, achieving an average improvement of 7.28%. Code is available at https://github.com/dvlab-research/Logits-Based-Finetuning.
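As a rough illustration of the enriched-target idea, here is a minimal PyTorch sketch that interpolates teacher soft targets with one-hot ground truth; the function names, the linear interpolation, and the weight alpha are illustrative assumptions, not the paper's exact formulation.
```python
import torch
import torch.nn.functional as F

def enriched_target(teacher_logits, labels, alpha=0.5, vocab_size=None):
    """Blend teacher soft targets with one-hot ground truth.

    teacher_logits: (batch, seq, vocab) raw logits from the teacher.
    labels:         (batch, seq) ground-truth token ids.
    alpha:          interpolation weight (illustrative, not from the paper).
    """
    vocab_size = vocab_size or teacher_logits.size(-1)
    soft = F.softmax(teacher_logits, dim=-1)             # linguistic diversity
    hard = F.one_hot(labels, vocab_size).to(soft.dtype)  # correctness
    return alpha * hard + (1.0 - alpha) * soft           # enriched target

def logits_finetune_loss(student_logits, teacher_logits, labels, alpha=0.5):
    target = enriched_target(teacher_logits, labels, alpha)
    log_probs = F.log_softmax(student_logits, dim=-1)
    return -(target * log_probs).sum(dim=-1).mean()      # cross-entropy vs. soft target
```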
Related papers
- A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning [0.40964539027092906]
Supervised Fine-Tuning and Reinforcement Learning are the dominant training paradigms. This paper introduces a practical and effective training recipe that strategically integrates extended SFT with RL from online inference. Our experiments reveal that extending SFT for as many as 10 epochs is crucial for performance breakthroughs. This work provides the community with a battle-tested blueprint for developing state-of-the-art mathematical reasoners.
arXiv Detail & Related papers (2025-07-11T02:26:01Z)
- OpenCodeReasoning: Advancing Data Distillation for Competitive Coding [61.15402517835137]
We build a supervised fine-tuning (SFT) dataset to achieve state-of-the-art coding capability in models of various sizes. Our models use only SFT to achieve 61.8% on LiveCodeBench and 24.6% on CodeContests, surpassing alternatives trained with reinforcement learning.
arXiv Detail & Related papers (2025-04-02T17:50:31Z)
- FIRST: Teach A Reliable Large Language Model Through Efficient Trustworthy Distillation [29.606646251624923]
Fine-tuned models remain far from satisfactory trustworthiness due to "tuning-induced mis-calibration".
We propose Efficient Trustworthy Distillation (FIRST), which utilizes a small portion of the teacher's knowledge to obtain a reliable language model in a cost-efficient way.
Experimental results demonstrate the effectiveness of our method, achieving better accuracy (+2.3%) and less mis-calibration (-10%).
arXiv Detail & Related papers (2024-08-22T07:31:00Z)
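A hedged sketch of FIRST's core idea as summarized above, distilling from only a small, top-ranked slice of the teacher's logits; the choice of k, the temperature, and the renormalized KL are assumptions:
```python
import torch
import torch.nn.functional as F

def topk_distill_loss(student_logits, teacher_logits, k=5, temperature=2.0):
    """Distill from only the teacher's top-k logits per position.

    Storing and transferring k values instead of the full vocabulary is what
    makes this cost-efficient; k, temperature, and the KL form are assumptions.
    """
    top_vals, top_idx = teacher_logits.topk(k, dim=-1)
    teacher_probs = F.softmax(top_vals / temperature, dim=-1)        # renormalize over top-k
    student_sel = student_logits.gather(-1, top_idx)                 # student logits at same ids
    student_logp = F.log_softmax(student_sel / temperature, dim=-1)
    return F.kl_div(student_logp, teacher_probs, reduction="batchmean") * temperature**2
```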
- Iterative Deployment Exposure for Unsupervised Out-of-Distribution Detection [5.019613806273252]
Iterative Deployment Exposure (IDE) is a novel and more realistic setting for out-of-distribution (OOD) detection. CSO uses a new U-OOD scoring function that combines the Mahalanobis distance with a nearest-neighbor approach. We validate our approach on a dedicated benchmark, showing that our method greatly improves upon strong baselines on three medical imaging modalities.
arXiv Detail & Related papers (2024-06-04T13:57:34Z)
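A minimal sketch of a combined U-OOD score in the spirit of the CSO summary above, mixing a Mahalanobis term with a k-nearest-neighbor term; the mixing weight lam and the hyperparameters are assumptions:
```python
import torch

def u_ood_score(feat, train_feats, mean, prec, k=10, lam=0.5):
    """Combine Mahalanobis distance with a k-NN distance (CSO-style sketch).

    feat:        (d,) feature of the test sample.
    train_feats: (N, d) in-distribution reference features.
    mean, prec:  mean and inverse covariance of in-distribution features.
    lam:         mixing weight (an assumption, not from the paper).
    Higher score = more likely OOD.
    """
    diff = feat - mean
    maha = torch.einsum("d,de,e->", diff, prec, diff).sqrt()   # Mahalanobis distance
    knn = torch.cdist(feat[None], train_feats).topk(k, largest=False).values.mean()
    return lam * maha + (1.0 - lam) * knn
```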
- WeiPer: OOD Detection using Weight Perturbations of Class Projections [11.130659240045544]
We introduce perturbations of the class projections in the final fully connected layer, which creates a richer representation of the input.
We achieve state-of-the-art OOD detection results across multiple benchmarks of the OpenOOD framework.
arXiv Detail & Related papers (2024-05-27T13:38:28Z)
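A sketch of the WeiPer idea above, assuming Gaussian perturbations of the final fully connected layer's class projections; the noise scale and the downstream aggregation into an OOD score are assumptions not specified here:
```python
import torch

def perturbed_projections(fc_weight, features, n_perturb=8, sigma=0.1):
    """WeiPer-style sketch: perturb the final-layer class projections.

    fc_weight: (num_classes, feat_dim) weights of the last fully connected layer.
    features:  (batch, feat_dim) penultimate-layer features.
    Returns logits under n_perturb random weight perturbations, a richer
    representation of the input for downstream OOD scoring.
    """
    outs = []
    for _ in range(n_perturb):
        # Noise scaled per class row so perturbations stay proportionate.
        noise = sigma * torch.randn_like(fc_weight) * fc_weight.norm(dim=1, keepdim=True)
        outs.append(features @ (fc_weight + noise).T)
    return torch.stack(outs)  # (n_perturb, batch, num_classes)
```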
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
arXiv Detail & Related papers (2024-05-01T11:10:24Z)
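The DPO update mentioned above is standard; a compact PyTorch version, where the (winner, loser) pairs would come from the MCTS-derived step-level preferences:
```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization on (winner, loser) preference pairs.

    Each argument is the summed log-probability of a response under the
    trainable policy or the frozen reference model; beta controls how far
    the policy may drift from the reference.
    """
    margin = beta * ((policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()
```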
- BAL: Balancing Diversity and Novelty for Active Learning [53.289700543331925]
We introduce a novel framework, Balancing Active Learning (BAL), which constructs adaptive sub-pools to balance diverse and uncertain data.
Our approach outperforms all established active learning methods on widely recognized benchmarks by 1.20%.
arXiv Detail & Related papers (2023-12-26T08:14:46Z)
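An illustrative greedy selector balancing uncertainty and diversity, loosely in the spirit of the BAL summary above; the real method's adaptive sub-pools are more involved, and the trade-off weight alpha and the assumption of comparably scaled terms are mine:
```python
import torch

def select_batch(features, uncertainty, budget, alpha=0.5):
    """Greedily pick `budget` samples trading off uncertainty and diversity.

    features:    (N, d) embeddings of the unlabeled pool.
    uncertainty: (N,) e.g. predictive entropy, assumed scaled to [0, 1].
    """
    n = features.size(0)
    chosen = []
    min_dist = torch.full((n,), 1.0)  # diversity term before anything is chosen
    for _ in range(budget):
        score = alpha * uncertainty + (1 - alpha) * min_dist
        if chosen:
            score[chosen] = -float("inf")  # never pick the same sample twice
        i = int(score.argmax())
        chosen.append(i)
        d = (features - features[i]).norm(dim=1)
        min_dist = torch.minimum(min_dist, d)  # distance to nearest selected sample
    return chosen
```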
- Scaling for Training Time and Post-hoc Out-of-distribution Detection Enhancement [41.650761556671775]
In this paper, we offer insights and analyses of recent state-of-the-art out-of-distribution (OOD) detection methods.
We demonstrate that activation pruning has a detrimental effect on OOD detection, while activation scaling enhances it.
We achieve AUROC improvements of +1.85% for near-OOD and +0.74% for far-OOD datasets on the OpenOOD v1.5 ImageNet-1K benchmark.
arXiv Detail & Related papers (2023-09-30T02:10:54Z)
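A hedged reconstruction of the "scale, don't prune" finding above: every activation is scaled by a factor derived from the top-percentile mass rather than pruning any; the exact formula is an assumption, not the paper's:
```python
import torch

def scale_activations(x, pct=0.85):
    """Post-hoc activation scaling for OOD detection (illustrative only).

    x: (batch, d) penultimate-layer activations, assumed non-negative (post-ReLU).
    The ratio of total mass to top-percentile mass is larger for diffuse
    (often OOD) activation patterns, so scaling by exp(s) separates them.
    """
    thresh = torch.quantile(x, pct, dim=-1, keepdim=True)
    kept = torch.where(x >= thresh, x, torch.zeros_like(x))
    s = x.sum(-1, keepdim=True) / kept.sum(-1, keepdim=True).clamp_min(1e-6)
    return x * torch.exp(s)  # scale all activations instead of pruning any
```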
- OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection [82.85303878718207]
Out-of-Distribution (OOD) detection is critical for the reliable operation of open-world intelligent systems. This paper presents OpenOOD v1.5, a significant improvement over its predecessor that ensures accurate and standardized evaluation of OOD detection methodologies.
arXiv Detail & Related papers (2023-06-15T17:28:00Z)
- Accurate Knowledge Distillation with n-best Reranking [2.9526110883017433]
We propose utilizing n-best reranking to enhance Sequence-Level Knowledge Distillation (Kim and Rush, 2016).
We leverage a diverse set of models with different inductive biases, objective functions, or architectures, including some publicly available large language models, to pick the highest-quality hypotheses as labels.
Our results demonstrate that utilizing pseudo-labels generated by our n-best reranker leads to a significantly more accurate student model.
arXiv Detail & Related papers (2023-05-20T01:53:03Z)
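A small sketch of the n-best reranking recipe above; the scorer interface is hypothetical, standing in for the paper's ensemble of models with different inductive biases:
```python
def rerank_nbest(candidates, scorers, weights=None):
    """Pick the highest-quality hypothesis from an n-best list.

    candidates: list of candidate output strings from the teacher.
    scorers:    callables str -> float from diverse models (hypothetical
                interface; the paper ensembles MT models and public LLMs).
    """
    weights = weights or [1.0] * len(scorers)

    def combined(c):
        return sum(w * s(c) for w, s in zip(weights, scorers))

    return max(candidates, key=combined)

# The chosen hypothesis then replaces the reference as the pseudo-label
# for sequence-level knowledge distillation of the student model.
```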
- Unsupervised Evaluation of Out-of-distribution Detection: A Data-centric Perspective [55.45202687256175]
Out-of-distribution (OOD) detection methods assume that they have test ground truths, i.e., whether individual test samples are in-distribution (IND) or OOD.
In this paper, we are the first to introduce the unsupervised evaluation problem in OOD detection.
We propose three methods to compute Gscore as an unsupervised indicator of OOD detection performance.
arXiv Detail & Related papers (2023-02-16T13:34:35Z)
- Rethinking Out-of-distribution (OOD) Detection: Masked Image Modeling is All You Need [52.88953913542445]
We find, surprisingly, that simply using reconstruction-based methods could boost the performance of OOD detection significantly.
We take Masked Image Modeling as the pretext task for our OOD detection framework (MOOD).
arXiv Detail & Related papers (2023-02-06T08:24:41Z)
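One simple reconstruction-based score consistent with the MOOD summary above; the `model.reconstruct` interface is hypothetical, and MOOD's full pipeline differs from this plain masked-reconstruction error:
```python
import torch

def masked_recon_score(model, image, mask_ratio=0.75, n_trials=4):
    """Average masked-reconstruction error as an OOD score (illustrative).

    model: a masked autoencoder exposing `reconstruct(image, mask)` --
    a hypothetical interface, not a specific library API.
    image: (C, H, W) input tensor.
    """
    errs = []
    for _ in range(n_trials):
        mask = torch.rand(image.shape[-2:]) < mask_ratio  # random pixel mask
        recon = model.reconstruct(image, mask)
        # Only score the masked region the model had to hallucinate.
        errs.append(((recon - image) ** 2 * mask).sum() / mask.sum())
    return torch.stack(errs).mean()  # higher error = more likely OOD
```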
- Weighted Ensemble Self-Supervised Learning [67.24482854208783]
Ensembling has proven to be a powerful technique for boosting model performance.
We develop a framework that permits data-dependent weighted cross-entropy losses.
Our method outperforms both baselines on multiple evaluation metrics on ImageNet-1K.
arXiv Detail & Related papers (2022-11-18T02:00:17Z)
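A minimal sketch of the data-dependent weighted cross-entropy loss named in the summary above; how the per-sample weights are produced is the paper's contribution and is not reproduced here:
```python
import torch
import torch.nn.functional as F

def weighted_ce(logits, targets, sample_weights):
    """Data-dependent weighted cross-entropy (core idea only).

    logits:         (batch, num_classes) predictions.
    targets:        (batch,) class indices.
    sample_weights: (batch,) nonnegative per-sample weights, e.g. derived
                    from ensemble agreement (an assumption here).
    """
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    w = sample_weights / sample_weights.sum().clamp_min(1e-8)  # normalize
    return (w * per_sample).sum()
```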
- To be Critical: Self-Calibrated Weakly Supervised Learning for Salient Object Detection [95.21700830273221]
Weakly-supervised salient object detection (WSOD) aims to develop saliency models using image-level annotations.
We propose a self-calibrated training strategy by explicitly establishing a mutual calibration loop between pseudo labels and network predictions.
We prove that even a much smaller dataset with well-matched annotations can enable models to achieve better performance as well as generalizability.
arXiv Detail & Related papers (2021-09-04T02:45:22Z)
- Towards Few-Shot Fact-Checking via Perplexity [40.11397284006867]
We propose a new way of utilizing the powerful transfer learning ability of a language model via a perplexity score.
Our methodology can already outperform the Major Class baseline by more than 10% absolute on the F1-Macro metric.
We construct and publicly release two new fact-checking datasets related to COVID-19.
arXiv Detail & Related papers (2021-03-17T09:43:19Z)
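A sketch of perplexity-based claim verification as described above, using a Hugging Face-style causal LM interface; the evidence-claim concatenation format and the threshold are assumptions:
```python
import torch

def perplexity(model, tokenizer, text):
    """Perplexity of `text` under a pretrained causal language model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean per-token negative log-likelihood
    return torch.exp(loss).item()

def check_claim(model, tokenizer, evidence, claim, threshold=100.0):
    # Lower perplexity of the claim in the context of the evidence
    # suggests the claim is more plausible; threshold is illustrative.
    ppl = perplexity(model, tokenizer, f"{evidence} {claim}")
    return "SUPPORTED" if ppl < threshold else "UNSUPPORTED"
```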
- Robust Out-of-distribution Detection for Neural Networks [51.19164318924997]
We show that existing detection mechanisms can be extremely brittle when evaluated on in-distribution and OOD inputs.
We propose an effective algorithm called ALOE, which performs robust training by exposing the model to both adversarially crafted inlier and outlier examples.
arXiv Detail & Related papers (2020-03-21T17:46:28Z)
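A hedged sketch of an ALOE-style robust training step, exposing the model to adversarially crafted inliers and outliers; the single-step attack and the uniform-target outlier loss are simplifying assumptions:
```python
import torch
import torch.nn.functional as F

def fgsm(model, x, loss_fn, eps):
    """Single-step adversarial perturbation (stand-in for the inner attack).

    Note: this backward pass also accumulates parameter grads; zero the
    model's grads after crafting when used inside a real training loop.
    """
    x = x.clone().requires_grad_(True)
    loss_fn(model(x)).backward()
    return (x + eps * x.grad.sign()).detach()

def aloe_step(model, x_in, y_in, x_out, eps=8 / 255, lam=0.5):
    """One robust training step on adversarial inliers and outliers."""
    x_in_adv = fgsm(model, x_in, lambda out: F.cross_entropy(out, y_in), eps)
    # For outliers, attack the outlier-exposure loss itself:
    oe_loss = lambda out: -F.log_softmax(out, dim=-1).mean()
    x_out_adv = fgsm(model, x_out, oe_loss, eps)
    # Inliers keep their labels; outliers are pushed toward a uniform output.
    return F.cross_entropy(model(x_in_adv), y_in) + lam * oe_loss(model(x_out_adv))
```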
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.