Learning from the Undesirable: Robust Adaptation of Language Models without Forgetting
- URL: http://arxiv.org/abs/2511.13052v1
- Date: Mon, 17 Nov 2025 06:57:44 GMT
- Title: Learning from the Undesirable: Robust Adaptation of Language Models without Forgetting
- Authors: Yunhun Nam, Jaehyung Kim, Jongheon Jeong
- Abstract summary: Language models (LMs) are often adapted through supervised fine-tuning (SFT) to specialize their capabilities for downstream tasks. In typical scenarios where the fine-tuning data is limited, SFT can lead LMs to overfit, causing them to rely on spurious patterns. We propose Learning-from-the-Undesirable (LfU), a simple yet effective regularization scheme for SFT to mitigate issues when fine-tuning LMs with limited data.
- Score: 18.680059467974825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models (LMs) are often adapted through supervised fine-tuning (SFT) to specialize their capabilities for downstream tasks. However, in typical scenarios where the fine-tuning data is limited, e.g., compared to pre-training, SFT can lead LMs to overfit, causing them to rely on spurious patterns within the target task or to compromise other broadly useful capabilities as a side effect of narrow specialization. In this paper, we propose Learning-from-the-Undesirable (LfU), a simple yet effective regularization scheme for SFT to mitigate overfitting issues when fine-tuning LMs with limited data. Specifically, we aim to regularize the fine-tuning process to favor solutions that are resilient to "undesirable" model updates, e.g., gradient ascent steps that steer the model toward undesirable behaviors. To this end, we propose a novel form of consistency regularization that directly aligns internal representations of the model with those after an undesirable update. By leveraging representation-level data augmentation through undesirable updates, LfU effectively promotes generalization under limited data. Our experiments on diverse LM downstream tasks show that LfU serves as an effective prior that enhances adaptability while preserving pretrained knowledge. For example, our LM from LfU achieves a 16.8% average improvement on math tasks compared to vanilla SFT on the same dataset, where the latter even leads to degraded performance on those tasks. Furthermore, LfU exhibits improved robustness to prompt variations, e.g., yielding a 92.1% lower standard deviation in output performances compared to SFT, highlighting its versatile effects.
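The regularizer described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny two-layer network, the mean-squared distance between hidden representations, and the names `ascent_step`, `lfu_style_objective`, `ascent_lr`, and `reg_weight` are all assumptions chosen for illustration. The abstract only states that internal representations are aligned with those obtained after an "undesirable" update such as a gradient ascent step.

```python
import numpy as np

def forward(params, x):
    """Return the hidden representation and logits of a tiny two-layer model."""
    w1, w2 = params
    h = np.tanh(x @ w1)
    return h, h @ w2

def sft_loss_and_grads(params, x, y):
    """Mean cross-entropy (a stand-in for the SFT loss) and its gradients."""
    w1, w2 = params
    h, logits = forward(params, x)
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    rows = np.arange(len(y))
    loss = -logp[rows, y].mean()
    dlogits = np.exp(logp)
    dlogits[rows, y] -= 1.0
    dlogits /= len(y)
    g_w2 = h.T @ dlogits
    g_w1 = x.T @ ((dlogits @ w2.T) * (1.0 - h ** 2))
    return loss, (g_w1, g_w2)

def ascent_step(params, grads, lr):
    """'Undesirable' update: move parameters UP the loss gradient."""
    return tuple(p + lr * g for p, g in zip(params, grads))

def lfu_style_objective(params, x, y, ascent_lr=0.1, reg_weight=1.0):
    """SFT loss plus a representation-consistency penalty (assumed MSE form)."""
    loss, grads = sft_loss_and_grads(params, x, y)
    bad_params = ascent_step(params, grads, ascent_lr)
    h, _ = forward(params, x)
    h_bad, _ = forward(bad_params, x)
    # Penalize how far hidden representations drift under the bad update.
    consistency = np.mean((h - h_bad) ** 2)
    return loss + reg_weight * consistency, loss, consistency

# Hypothetical toy data: 8 examples, 4 features, 3 classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
y = rng.integers(0, 3, size=8)
params = (rng.normal(scale=0.5, size=(4, 5)),
          rng.normal(scale=0.5, size=(5, 3)))
total, sft, consistency = lfu_style_objective(params, x, y)
print(f"sft={sft:.4f} consistency={consistency:.6f} total={total:.4f}")
```

In an actual training loop the combined objective would be differentiated with respect to the parameters. Which layers are aligned, which distance is used, and whether the post-update representations are treated as fixed targets are design choices the abstract does not specify.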
Related papers
- Learn More, Forget Less: A Gradient-Aware Data Selection Approach for LLM [51.21051698747157]
We propose a self-adaptive gradient-aware data selection approach (GrADS) for supervised fine-tuning of large language models (LLMs). Specifically, we design self-guided criteria that leverage the magnitude and statistical distribution of gradients to prioritize examples that contribute the most to the model's learning process. Through extensive experimentation with various LLMs across diverse domains such as medicine, law, and finance, GrADS has demonstrated significant efficiency and cost-effectiveness.
arXiv Detail & Related papers (2025-11-07T08:34:50Z)
- Data Efficient Adaptation in Large Language Models via Continuous Low-Rank Fine-Tuning [34.343514432589586]
This paper proposes DEAL, a novel framework that integrates Low-Rank Adaptation (LoRA) with a continuous fine-tuning strategy. Experiments on 15 diverse datasets show that DEAL consistently outperforms baseline methods. These findings demonstrate the potential of our approach to advance continual adaptation in Large Language Models.
arXiv Detail & Related papers (2025-09-23T12:55:57Z)
- Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections [65.36449542323277]
We present a unified theoretical framework bridging Supervised Fine-Tuning (SFT) and preference learning in Large Language Model (LLM) post-training. We propose a simple yet effective learning rate reduction approach that yields significant performance improvements.
arXiv Detail & Related papers (2025-06-15T05:42:29Z)
- Improved Supervised Fine-Tuning for Large Language Models to Mitigate Catastrophic Forgetting [1.5595148909011116]
Supervised Fine-Tuning (SFT) is a critical step for enhancing the instruction-following capabilities of Large Language Models (LLMs). SFT often leads to a degradation of the model's general abilities, a phenomenon known as catastrophic forgetting. We propose a novel and cost-effective SFT method that effectively mitigates catastrophic forgetting without requiring access to the original SFT data.
arXiv Detail & Related papers (2025-06-11T06:23:50Z)
- SLearnLLM: A Self-Learning Framework for Efficient Domain-Specific Adaptation of Large Language Models [7.44035983292392]
We propose a self-learning framework for large language models (LLMs) inspired by human learning patterns. This framework takes a supervised fine-tuning (SFT) dataset in a specific domain as input. We show that our method substantially reduces training time while achieving improvements comparable to those attained with full-dataset fine-tuning.
arXiv Detail & Related papers (2025-05-23T04:50:54Z)
- Discriminative Finetuning of Generative Large Language Models without Reward Models and Human Preference Data [73.04828796123581]
Supervised fine-tuning (SFT) has become a crucial step for aligning pretrained large language models (LLMs). We introduce Discriminative Fine-Tuning (DFT), an improved variant of SFT, which mitigates the burden of collecting human-labeled preference data. Our contributions include: (i) a discriminative probabilistic framework for fine-tuning LLMs by explicitly modeling the discriminative likelihood of an answer among all possible outputs given an input; (ii) efficient algorithms to optimize this discriminative likelihood; and (iii) extensive experiments demonstrating DFT's effectiveness.
arXiv Detail & Related papers (2025-02-25T22:38:55Z)
- Preference-Oriented Supervised Fine-Tuning: Favoring Target Model Over Aligned Large Language Models [12.500777267361102]
We introduce a novel preference-oriented supervised fine-tuning approach, namely PoFT. The intuition is to boost SFT by imposing a particular preference: favoring the target model over aligned LLMs on the same SFT data. PoFT achieves stable and consistent improvements over the SFT baselines across different training datasets and base models.
arXiv Detail & Related papers (2024-12-17T12:49:14Z)
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
- Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models.
This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution.
We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
arXiv Detail & Related papers (2024-05-28T20:43:53Z)
- LaFFi: Leveraging Hybrid Natural Language Feedback for Fine-tuning Language Models [14.087415157225715]
Fine-tuning Large Language Models (LLMs) adapts a trained model to specific downstream tasks.
Supervised Fine-Tuning (SFT) is a common approach, where an LLM is trained to produce desired answers.
This paper introduces an alternative to SFT called Natural Language Feedback for Finetuning LLMs (LaFFi).
arXiv Detail & Related papers (2023-12-31T21:18:16Z)
- Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT).
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
arXiv Detail & Related papers (2023-12-19T06:06:30Z)