SFT-GO: Supervised Fine-Tuning with Group Optimization for Large Language Models
- URL: http://arxiv.org/abs/2506.15021v1
- Date: Tue, 17 Jun 2025 23:12:28 GMT
- Title: SFT-GO: Supervised Fine-Tuning with Group Optimization for Large Language Models
- Authors: Gyuhak Kim, Sumiran Singh Thakur, Su Min Park, Wei Wei, Yujia Bao
- Abstract summary: Supervised fine-tuning (SFT) has become an essential step in tailoring large language models (LLMs) to align with human expectations and specific downstream tasks. Existing SFT methods treat each training instance as a uniform sequence, giving equal importance to all tokens regardless of their relevance. We introduce Supervised Fine-Tuning with Group Optimization (SFT-GO), a novel approach that treats groups of tokens differently based on their importance.
- Score: 11.866570688304153
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Supervised fine-tuning (SFT) has become an essential step in tailoring large language models (LLMs) to align with human expectations and specific downstream tasks. However, existing SFT methods typically treat each training instance as a uniform sequence, giving equal importance to all tokens regardless of their relevance. This overlooks the fact that only a subset of tokens often contains critical, task-specific information. To address this limitation, we introduce Supervised Fine-Tuning with Group Optimization (SFT-GO), a novel approach that treats groups of tokens differently based on their importance. SFT-GO groups tokens in each sample based on their importance values and optimizes the LLM using a weighted combination of the worst-group loss and the standard cross-entropy loss. This mechanism adaptively emphasizes the most challenging token groups and guides the model to better handle different group distributions, thereby improving overall learning dynamics. We provide a theoretical analysis of SFT-GO's convergence rate, demonstrating its efficiency. Empirically, we apply SFT-GO with three different token grouping strategies and show that models trained with SFT-GO consistently outperform baseline approaches across popular LLM benchmarks. These improvements hold across various datasets and base models, demonstrating the robustness and the effectiveness of our method.
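To make the objective concrete, here is a minimal PyTorch-style sketch of the weighted combination described in the abstract: per-token cross-entropy losses are grouped by an importance score, and the worst-group loss is mixed with the standard loss. The quantile-based grouping, the number of groups, and the mixing weight `lam` are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def sft_go_loss(logits, labels, importance, num_groups=2, lam=0.5):
    """Sketch of the SFT-GO objective: a weighted combination of the
    worst-group loss and the standard cross-entropy loss.

    logits:     (seq_len, vocab_size) model outputs for one sample
    labels:     (seq_len,) target token ids
    importance: (seq_len,) float per-token importance scores produced by
                some grouping strategy (assumed here, e.g. TF-IDF scores)
    """
    # Per-token cross-entropy, kept unreduced so tokens can be regrouped.
    token_loss = F.cross_entropy(logits, labels, reduction="none")

    # Standard SFT loss: mean over all tokens.
    ce_loss = token_loss.mean()

    # Assign tokens to groups by importance quantiles (assumed strategy).
    edges = torch.quantile(
        importance, torch.linspace(0, 1, num_groups + 1, device=importance.device)
    )
    group_losses = []
    for g in range(num_groups):
        lo, hi = edges[g], edges[g + 1]
        if g == num_groups - 1:
            mask = (importance >= lo) & (importance <= hi)  # include top boundary
        else:
            mask = (importance >= lo) & (importance < hi)
        if mask.any():
            group_losses.append(token_loss[mask].mean())

    # Worst-group loss: the hardest token group dominates this term.
    worst_group = torch.stack(group_losses).max()

    # Weighted combination of worst-group and standard loss.
    return lam * worst_group + (1.0 - lam) * ce_loss
```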
Related papers
- On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification [50.30835290642069]
We present a simple yet theoretically motivated improvement to Supervised Fine-Tuning (SFT) for Large Language Models (LLMs). We reveal that standard SFT gradients implicitly encode a problematic reward structure that may severely restrict the generalization capabilities of the model. We propose Dynamic Fine-Tuning (DFT), which stabilizes gradient updates for each token by dynamically rescaling the objective function with the probability of that token.
arXiv Detail & Related papers (2025-08-07T17:59:04Z) - The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs [66.17068546293487]
Large vision-language models (VLMs) increasingly adopt post-training techniques such as long chain-of-thought (CoT) supervised fine-tuning (SFT) and reinforcement learning (RL) to elicit sophisticated reasoning. We present a systematic investigation into the distinct roles and interplay of long-CoT SFT and RL across multiple multimodal reasoning benchmarks. We find that SFT improves performance on difficult questions through in-depth, structured reasoning, but introduces verbosity and degrades performance on simpler ones.
arXiv Detail & Related papers (2025-07-10T09:05:49Z) - SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning [20.442971494407896]
Large language models (LLMs) have achieved remarkable progress in reasoning tasks, yet the optimal integration of Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) remains a fundamental challenge. We propose Supervised Reinforcement Fine-Tuning (SRFT), a single-stage method that unifies both fine-tuning paradigms through entropy-aware weighting mechanisms. Extensive experiments show that SRFT achieves 59.1% average accuracy, outperforming zero-RL methods by 9.0% on five mathematical reasoning benchmarks and 10.9% on three out-of-distribution benchmarks.
arXiv Detail & Related papers (2025-06-24T16:31:37Z) - Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality [10.74213785908381]
Supervised fine-tuning (SFT) is a critical step in aligning large language models with human instructions and values. We trained a wide range of base models on a variety of datasets including code generation, mathematical reasoning, and general-domain tasks. We will release these 1,000+ SFT models and benchmark results to accelerate further research.
arXiv Detail & Related papers (2025-06-17T16:13:15Z) - Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections [65.36449542323277]
We present a unified theoretical framework bridging Supervised Fine-Tuning (SFT) and preference learning in Large Language Model (LLM) post-training. We propose a simple yet effective learning rate reduction approach that yields significant performance improvements.
arXiv Detail & Related papers (2025-06-15T05:42:29Z) - Taming LLMs by Scaling Learning Rates with Gradient Grouping [49.91587150497186]
Training large language models (LLMs) poses challenges due to their massive scale and heterogeneous architectures. This work introduces Scaling with Gradient Grouping (SGG), a gradient wrapper that improves adaptive learning rate estimation by dynamic grouping and group-specific scaling.
arXiv Detail & Related papers (2025-06-01T15:30:37Z) - Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute [54.22256089592864]
This paper presents a simple, effective, and cost-efficient strategy to improve LLM performance by scaling test-time compute. Our strategy builds upon the repeated-sampling-then-voting framework, with a novel twist: incorporating multiple models, even weaker ones, to leverage their complementary strengths.
arXiv Detail & Related papers (2025-04-01T13:13:43Z) - Discriminative Finetuning of Generative Large Language Models without Reward Models and Human Preference Data [73.04828796123581]
Supervised fine-tuning (SFT) has become a crucial step for aligning pretrained large language models (LLMs). We introduce Discriminative Fine-Tuning (DFT), an improved variant of SFT, which mitigates the burden of collecting human-labeled preference data. Our contributions include: (i) a discriminative probabilistic framework for fine-tuning LLMs by explicitly modeling the discriminative likelihood of an answer among all possible outputs given an input; (ii) efficient algorithms to optimize this discriminative likelihood; and (iii) extensive experiments demonstrating DFT's effectiveness.
arXiv Detail & Related papers (2025-02-25T22:38:55Z) - Preference-Oriented Supervised Fine-Tuning: Favoring Target Model Over Aligned Large Language Models [12.500777267361102]
We introduce a novel preference-oriented supervised fine-tuning approach, namely PoFT. The intuition is to boost SFT by imposing a particular preference: favoring the target model over aligned LLMs on the same SFT data. PoFT achieves stable and consistent improvements over the SFT baselines across different training datasets and base models.
arXiv Detail & Related papers (2024-12-17T12:49:14Z) - SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe [30.03925858123481]
Large language models (LLMs) undergo instruction tuning, where they are trained on instruction-response pairs using next-token prediction (NTP). Efforts to improve instruction tuning often focus on higher-quality supervised fine-tuning (SFT) datasets. We propose SFTMix, a novel Mixup-based recipe that elevates LLM instruction tuning beyond the conventional NTP paradigm.
arXiv Detail & Related papers (2024-10-07T17:52:21Z) - Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens [57.37893387775829]
We introduce a fast and balanced clustering method, named Semantic Equitable Clustering (SEC).
SEC clusters tokens based on their global semantic relevance in an efficient, straightforward manner.
We propose a versatile vision backbone, SECViT, to serve as a vision language connector.
arXiv Detail & Related papers (2024-05-22T04:49:00Z) - Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process [26.196705232699884]
We introduce Intuitive Fine-Tuning (IFT) to integrate SFT and Preference Optimization into a single process.
IFT performs comparably to, or even better than, sequential recipes of SFT and some typical Preference Optimization methods.
An explainable Frozen Lake game further validates the effectiveness of IFT in obtaining a competitive policy.
arXiv Detail & Related papers (2024-05-20T08:23:28Z) - Prefix Text as a Yarn: Eliciting Non-English Alignment in Foundation Language Model [50.339632513018934]
Supervised fine-tuning (SFT) has been a straightforward approach for tailoring the output of a foundation large language model (LLM) to specific preferences.
We critically examine this hypothesis within the scope of cross-lingual generation tasks.
We introduce a novel training-free alignment method named PreTTY, which employs minimal task-related prior tokens.
arXiv Detail & Related papers (2024-04-25T17:19:36Z)