How Much is Too Much? Exploring LoRA Rank Trade-offs for Retaining Knowledge and Domain Robustness
- URL: http://arxiv.org/abs/2512.15634v1
- Date: Wed, 17 Dec 2025 17:44:09 GMT
- Title: How Much is Too Much? Exploring LoRA Rank Trade-offs for Retaining Knowledge and Domain Robustness
- Authors: Darshita Rathore, Vineet Kumar, Chetna Bansal, Anindya Moitra,
- Abstract summary: Large language models are increasingly adapted to downstream tasks through fine-tuning.<n>Full supervised fine-tuning (SFT) and parameter-efficient fine-tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), are two dominant approaches.
- Score: 0.8757823830026766
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large language models are increasingly adapted to downstream tasks through fine-tuning. Full supervised fine-tuning (SFT) and parameter-efficient fine-tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), are two dominant approaches. While PEFT methods are widely used for their computational efficiency, the implications of their configurations (e.g., rank) remain under-explored in downstream Q&A tasks and generalisation. In this work, we perform a comprehensive evaluation across multiple reasoning and recall datasets, conducting a rank sweep to quantify the trade-off between SFT and PEFT. We also compare the accuracy of PEFT and SFT models across in-domain and out-of-domain adaptation, highlighting distinct generalisation behaviour and task-specific forgetting. We demonstrate that LoRA achieves competitive and in some cases superior performance compared to SFT, particularly on reasoning tasks at specific rank values. Additionally, we analyze the internal representations via spectral features and layer-wise attention structures, offering insights into representational drift and structural changes in attention patterns.
Related papers
- Supervised Fine-Tuning or Contrastive Learning? Towards Better Multimodal LLM Reranking [56.46309219272326]
For large language models (LLMs), classification via supervised fine-tuning (SFT) predicts ''yes'' (resp. ''no'') token for relevant (resp. irrelevant) pairs.<n>This divergence raises a central question: which objective is intrinsically better suited to LLM-based reranking, and what mechanism underlies the difference?<n>We conduct a comprehensive comparison and analysis between CL and SFT for reranking, taking the universal multimodal retrieval (UMR) as the experimental playground.
arXiv Detail & Related papers (2025-10-16T16:02:27Z) - SFT Doesn't Always Hurt General Capabilities: Revisiting Domain-Specific Fine-Tuning in LLMs [53.77646961962239]
Supervised Fine-Tuning (SFT) is a common approach to adapt Large Language Models (LLMs) to specialized tasks.<n>We show that SFT does not always hurt: using a smaller learning rate can substantially mitigate general performance degradation.
arXiv Detail & Related papers (2025-09-25T05:28:22Z) - DiffLoRA: Differential Low-Rank Adapters for Large Language Models [59.58987161199141]
We introduce DiffLoRA, a parameter-efficient adaptation of the differential attention mechanism, with low-rank adapters on both positive and negative attention terms.<n>We evaluate DiffLoRA across a broad range of NLP tasks, including general benchmarks, many-shot in-context learning, RAG, and long-context tests.
arXiv Detail & Related papers (2025-07-31T14:24:59Z) - Reinforcement Fine-Tuning Naturally Mitigates Forgetting in Continual Post-Training [36.69514399442043]
This paper presents a comparative analysis of two core post-training paradigms: supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT)<n>Our experiments are conducted on a benchmark comprising seven diverse multimodal tasks.
arXiv Detail & Related papers (2025-07-07T18:17:06Z) - Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality [22.105116314320696]
Supervised fine-tuning (SFT) is a critical step in aligning large language models with human instructions and values.<n>We trained a wide range of base models on a variety of datasets including code generation, mathematical reasoning, and general-domain tasks.<n>We then identified the dataset properties that matter most and examined the layer-wise modifications introduced by SFT.
arXiv Detail & Related papers (2025-06-17T16:13:15Z) - Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections [65.36449542323277]
We present a unified theoretical framework bridgingSupervised Fine-Tuning (SFT) and preference learning in Large Language Model (LLM) post-training.<n>We propose a simple yet effective learning rate reduction approach that yields significant performance improvements.
arXiv Detail & Related papers (2025-06-15T05:42:29Z) - FLoE: Fisher-Based Layer Selection for Efficient Sparse Adaptation of Low-Rank Experts [47.35092228595656]
FLoE is a novel PEFT framework that introduces two key innovations: (i) a Fisher information-guided importance scoring mechanism to dynamically identify task-critical transformer layers for MoE-based low-rank adaptation, enabling sparse adapter deployment; and (ii) a Bayesian optimization-driven rank allocator that automatically determines optimal LoRA ranks on specific datasets without exhaustive grid search.<n>Experiments across diverse LLMs and benchmarks reveal that FLoE achieves impressive efficiency-accuracy trade-offs, making FLoE particularly advantageous in resource-constrained environments that necessitate rapid adaptation.
arXiv Detail & Related papers (2025-05-31T10:27:08Z) - MSPLoRA: A Multi-Scale Pyramid Low-Rank Adaptation for Efficient Model Fine-Tuning [5.412348391086257]
We propose MSPLoRA, which introduces Global Shared LoRA, Mid-Level Shared LoRA, and Layer-Specific LoRA to capture global patterns, mid-level features, and fine-grained information.<n> Experiments on various NLP tasks demonstrate that MSPLoRA achieves more efficient adaptation and better performance while significantly reducing the number of trainable parameters.
arXiv Detail & Related papers (2025-03-27T07:01:50Z) - Randomized Asymmetric Chain of LoRA: The First Meaningful Theoretical Framework for Low-Rank Adaptation [58.288682735160585]
Low-Rank Adaptation (LoRA) is a popular technique for finetuning models.
LoRA often under performs when compared to full- parameter fine-tuning.
We present a framework that rigorously analyzes the adaptation rates of LoRA methods.
arXiv Detail & Related papers (2024-10-10T18:51:53Z) - Task-Specific Directions: Definition, Exploration, and Utilization in Parameter Efficient Fine-Tuning [65.31677646659895]
Large language models demonstrate impressive performance on downstream tasks, yet they require extensive resource consumption when fully fine-tuning all parameters.<n>We propose a framework to clearly define task-specific directions (TSDs) and explore their properties and practical utilization challenges.<n>We then introduce a novel approach, LoRA-Dash, which aims to maximize the impact of TSDs during the fine-tuning process.
arXiv Detail & Related papers (2024-09-02T08:10:51Z) - Fine Tuning vs. Retrieval Augmented Generation for Less Popular Knowledge [15.553942864736989]
Language Models (LMs) memorize a vast amount of factual knowledge, exhibiting strong performance across diverse tasks and domains.<n>The two prominent approaches to enhance the performance of LMs on low-frequent topics are: Retrieval Augmented Generation (RAG) and fine-tuning (FT) over synthetic data.<n>This paper explores and evaluates the impact of RAG and FT on customizing LMs in handling low-frequency entities on question answering tasks.
arXiv Detail & Related papers (2024-03-03T08:07:55Z) - DoRA: Weight-Decomposed Low-Rank Adaptation [57.68678247436207]
We introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA.
Aiming to resemble the learning capacity of FT from the findings, we propose Weight-Decomposed Low-Rank Adaptation (DoRA)
DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning.
arXiv Detail & Related papers (2024-02-14T17:59:34Z) - Delving into Parameter-Efficient Fine-Tuning in Code Change Learning: An
Empirical Study [10.052053069122652]
PEFT has demonstrated superior performance and lower computational overhead in several code understanding tasks.
It harnesses the pre-trained general-purpose knowledge for downstream tasks.
It remains unclear whether PEFT outperforms FMFT in task-specific adaptation for code-change-related tasks.
arXiv Detail & Related papers (2024-02-09T08:40:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.