Related papers: Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models

Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models

URL: http://arxiv.org/abs/2502.08130v2
Date: Thu, 20 Feb 2025 06:10:33 GMT
Title: Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models
Authors: Sonam Gupta, Yatin Nandwani, Asaf Yehudai, Dinesh Khandelwal, Dinesh Raghu, Sachindra Joshi,
Abstract summary: This paper introduces Selective Self-to-Supervised Fine-Tuning (S3FT)<n>S3FT achieves better performance than the standard supervised fine-tuning (SFT) while improving generalization.<n>The effectiveness of S3FT is demonstrated through experiments on mathematical reasoning, Python programming and reading comprehension tasks.
Score: 24.659722730219134
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Fine-tuning Large Language Models (LLMs) on specific datasets is a common practice to improve performance on target tasks. However, this performance gain often leads to overfitting, where the model becomes too specialized in either the task or the characteristics of the training data, resulting in a loss of generalization. This paper introduces Selective Self-to-Supervised Fine-Tuning (S3FT), a fine-tuning approach that achieves better performance than the standard supervised fine-tuning (SFT) while improving generalization. S3FT leverages the existence of multiple valid responses to a query. By utilizing the model's correct responses, S3FT reduces model specialization during the fine-tuning stage. S3FT first identifies the correct model responses from the training set by deploying an appropriate judge. Then, it fine-tunes the model using the correct model responses and the gold response (or its paraphrase) for the remaining samples. The effectiveness of S3FT is demonstrated through experiments on mathematical reasoning, Python programming and reading comprehension tasks. The results show that standard SFT can lead to an average performance drop of up to $4.4$ on multiple benchmarks, such as MMLU and TruthfulQA. In contrast, S3FT reduces this drop by half, i.e. $2.5$, indicating better generalization capabilities than SFT while performing significantly better on the fine-tuning tasks.

Related papers

Scalable Best-of-N Selection for Large Language Models via Self-Certainty [65.31658824274894]
Best-of-N selection is a key technique for improving the reasoning performance of Large Language Models. We propose self-certainty, a novel and efficient metric to estimate response quality without requiring external reward models. Our findings establish self-certainty as a practical and efficient way for improving LLM reasoning capabilities.
arXiv Detail & Related papers (2025-02-25T19:08:07Z)
S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning [51.84977135926156]
We introduce S$2$R, an efficient framework that enhances LLM reasoning by teaching models to self-verify and self-correct during inference. Our results demonstrate that Qwen2.5-math-7B achieves an accuracy improvement from 51.0% to 81.6%, outperforming models trained on an equivalent amount of long-CoT distilled data.
arXiv Detail & Related papers (2025-02-18T13:40:22Z)
The Best Instruction-Tuning Data are Those That Fit [17.401088816596054]
Supervised fine-tuning (SFT) data are crucial for eliciting strong capabilities from pretrained large language models (LLMs)<n>We propose **GRAPE**, a novel SFT framework that accounts for the unique characteristics of the target model.<n>For each instruction, it gathers responses from various LLMs and selects the one with the highest probability measured by the target model.
arXiv Detail & Related papers (2025-02-06T16:31:21Z)
DELIFT: Data Efficient Language model Instruction Fine Tuning [13.538140114667772]
We introduce DELIFT, a novel algorithm that systematically optimize data selection across the three key stages of fine-tuning. Experiments across various tasks and model scales demonstrate that DELIFT can reduce the fine-tuning data size by up to 70% without compromising performance.
arXiv Detail & Related papers (2024-11-07T04:38:29Z)
Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models [19.752712857873043]
This paper introduces Selective Self-Rehearsal (SSR), a fine-tuning approach that achieves performance comparable to the standard supervised fine-tuning (SFT) By utilizing the model's correct responses, SSR reduces model specialization during the fine-tuning stage. The effectiveness of SSR is demonstrated through experiments on the task of identifying unanswerable queries across various datasets.
arXiv Detail & Related papers (2024-09-07T10:21:03Z)
Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios. We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples. Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
AutoFT: Learning an Objective for Robust Fine-Tuning [60.641186718253735]
Foundation models encode rich representations that can be adapted to downstream tasks by fine-tuning. Current approaches to robust fine-tuning use hand-crafted regularization techniques. We propose AutoFT, a data-driven approach for robust fine-tuning.
arXiv Detail & Related papers (2024-01-18T18:58:49Z)
Empirical Analysis of Efficient Fine-Tuning Methods for Large Pre-Trained Language Models [4.096453902709292]
BitFit and adapter modules are compared to standard full model fine-tuning. The BitFit approach matches full fine-tuning performance across varying amounts of training data. adapter modules exhibit high variability, with inconsistent gains over default models.
arXiv Detail & Related papers (2024-01-08T17:44:43Z)
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models [75.29595679428105]
We investigate how the pre-training loss, supervised data amount, and augmented data amount influence the reasoning performances of a supervised LLM. We find that rejection samples from multiple models push LLaMA-7B to an accuracy of 49.3% on GSM8K which outperforms the supervised fine-tuning (SFT) accuracy of 35.9% significantly.
arXiv Detail & Related papers (2023-08-03T15:34:01Z)
Two-stage LLM Fine-tuning with Less Specialization and More Generalization [93.12197594813378]
We propose Prompt Tuning with MOdel Tuning (ProMoT) to reduce format specialization and improve generalization. ProMoT offloads task-specific format learning into additional and removable parameters by first doing prompt tuning and then fine-tuning the model itself with this soft prompt. ProMoT can even enhance generalization on in-context learning tasks that are semantically related to the fine-tuned task.
arXiv Detail & Related papers (2022-11-01T17:56:57Z)
Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made. parameter-efficient fine-tuning offers an alternative paradigm where a small set of parameters are trained to enable a model to perform the new task. In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.