PrunedLoRA: Robust Gradient-Based Structured Pruning for Low-rank Adaptation in Fine-tuning
- URL: http://arxiv.org/abs/2510.00192v2
- Date: Sat, 01 Nov 2025 04:19:13 GMT
- Title: PrunedLoRA: Robust Gradient-Based Structured Pruning for Low-rank Adaptation in Fine-tuning
- Authors: Xin Yu, Cong Xie, Ziyu Zhao, Tiantian Fan, Lingzhou Xue, Zhi Zhang
- Abstract summary: Low-rank adaptation (LoRA) has become a widely used paradigm for parameter-efficient fine-tuning of large language models. We propose PrunedLoRA, a new framework that leverages structured pruning to obtain highly representative low-rank adapters.
- Score: 18.3077556191671
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Low-rank adaptation (LoRA) has become a widely used paradigm for parameter-efficient fine-tuning of large language models, yet its representational capacity often lags behind full fine-tuning. Within the context of LoRA, a key open question is how to obtain expressive low-rank adapters from over-parameterized spaces. We propose PrunedLoRA, a new framework that leverages structured pruning to obtain highly representative low-rank adapters from an over-parameterized initialization. Unlike prior approaches that impose a fixed low-rank budget, PrunedLoRA dynamically prunes less important components during fine-tuning and prevents their reactivation, enabling flexible and adaptive rank allocation. For structured pruning, we minimize the pruning error with respect to the overall loss, which yields fine-grained pruning and recovery updates in a gradient-based pruning strategy with a grounded interpretation. We provide the first theoretical analysis of the robustness of structured pruning, proving that under weight perturbation, gradient-based pruning is more robust than activation-based pruning with respect to the overall loss. Empirically, PrunedLoRA consistently outperforms LoRA and its variants across supervised fine-tuning tasks in mathematical reasoning, code generation, and natural language understanding, and it also demonstrates advantages over existing structured pruning methods across diverse sparsity levels.
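To make the mechanism concrete, here is a minimal, hypothetical sketch of the two ideas the abstract describes: fine-tune an over-parameterized LoRA pair (B, A), score each rank-1 component by a first-order (gradient-based) estimate of its effect on the overall loss, and permanently mask the least important components so they cannot reactivate. The toy least-squares objective, schedule, and all names are our assumptions, not the paper's released code.

```python
# Hedged sketch of gradient-based structured pruning for LoRA (toy setup;
# not the paper's implementation). Each rank-1 component B[:, i] A[i, :]
# is scored by the first-order Taylor term |g_B . B_i + g_A . A_i|, an
# estimate of the loss change if that component were removed.
import torch

torch.manual_seed(0)
d, r_init, r_target = 32, 16, 4
A = (0.1 * torch.randn(r_init, d)).requires_grad_()
B = (0.1 * torch.randn(d, r_init)).requires_grad_()
mask = torch.ones(r_init)              # 0 = pruned; never reactivated
target = torch.randn(d, d)             # toy regression target for Delta W
opt = torch.optim.SGD([A, B], lr=0.05)

def loss_fn():
    delta_w = (B * mask) @ A           # masked low-rank update B diag(mask) A
    return ((delta_w - target) ** 2).mean()

for step in range(200):
    opt.zero_grad()
    loss_fn().backward()
    opt.step()
    if step % 10 == 9 and int(mask.sum()) > r_target:
        with torch.no_grad():
            # first-order importance of each surviving rank-1 component
            scores = ((B.grad * B).sum(0) + (A.grad * A).sum(1)).abs()
            scores[mask == 0] = float("inf")  # pruned ranks stay pruned
            mask[scores.argmin()] = 0.0       # mask also zeroes their gradients

print(f"active ranks: {int(mask.sum())}, loss: {loss_fn().item():.4f}")
```

Because the mask is applied in the forward pass, pruned components receive zero gradient and can never drift back in, which is one simple way to realize the "prevents their reactivation" property the abstract mentions.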
Related papers
- On the Limits of Layer Pruning for Generative Reasoning in LLMs [0.5437050212139086]
Layer pruning can compress large language models (LLMs) while retaining strong performance on classification benchmarks with little or no fine-tuning. We find that tasks requiring multi-step reasoning are particularly sensitive to depth reduction. Under realistic post-training constraints, we evaluate a simple mitigation strategy based on supervised fine-tuning.
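Structurally, layer pruning is just dropping a contiguous block of transformer layers and running the rest; a toy illustration (our setup, not the paper's):

```python
# Toy layer pruning: run a 12-layer stack with layers 4-7 removed.
import torch
import torch.nn as nn

layers = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    for _ in range(12)
)

def forward(x, active):
    for i in active:                       # apply only the surviving layers
        x = layers[i](x)
    return x

x = torch.randn(2, 8, 64)                  # (batch, seq, d_model)
full = forward(x, range(12))
pruned = forward(x, [*range(4), *range(8, 12)])  # drop layers 4-7
print(full.shape, pruned.shape)
```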
arXiv Detail & Related papers (2026-02-02T11:57:22Z)
- Decomposing and Composing: Towards Efficient Vision-Language Continual Learning via Rank-1 Expert Pool in a Single LoRA [50.97792275353563]
We introduce a novel framework that restructures a single Low-Rank Adaptation (LoRA) module as a decomposable Rank-1 Expert Pool. Our method learns to dynamically compose a sparse, task-specific update by selecting from this expert pool, guided by the semantics of the [Guided] token.
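The decomposition this summary relies on is plain linear algebra: a rank-r LoRA update B @ A is a sum of r rank-1 outer products, so a sparse gate can select a task-specific subset. A sketch with a stand-in gate (the paper's token-guided routing is not reproduced here):

```python
# A single LoRA read as a pool of rank-1 experts; the gate is illustrative.
import torch

torch.manual_seed(0)
d, r, k = 64, 16, 4
B, A = torch.randn(d, r), torch.randn(r, d)

gate_logits = torch.randn(r)             # would come from task semantics
topk = gate_logits.topk(k).indices       # sparse selection of k experts
delta_w = B[:, topk] @ A[topk, :]        # compose the selected experts

# equivalently: sum of outer products b_i a_i^T over the chosen experts
alt = sum(torch.outer(B[:, i], A[i, :]) for i in topk)
print(torch.allclose(delta_w, alt, atol=1e-5))
```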
arXiv Detail & Related papers (2026-01-30T10:54:51Z)
- From Local to Global: Revisiting Structured Pruning Paradigms for Large Language Models [27.774067682004745]
GISP (Global Iterative Structured Pruning) removes attention heads and channels using first-order, loss-based importance weights aggregated at the structure level with block-wise normalization. An iterative schedule, rather than one-shot pruning, stabilizes accuracy at higher sparsity and mitigates perplexity collapse without requiring intermediate fine-tuning. Because importance is defined by a model-level loss, GISP naturally supports task-specific objectives; we instantiate perplexity for language modeling and a margin-based objective for decision-style tasks.
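A hedged sketch of the scoring step as the summary describes it: per-weight first-order terms w * dL/dw are aggregated per structure and normalized within a block. The toy layer, the choice of output channels as the structure, and the normalization are our assumptions:

```python
# First-order, loss-based structured importance (illustrative toy).
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(32, 16)
x, y = torch.randn(8, 32), torch.randn(8, 16)
loss = ((layer(x) - y) ** 2).mean()
loss.backward()

# aggregate |w * g| over each output channel (a prunable structure)
per_channel = (layer.weight * layer.weight.grad).abs().sum(dim=1)
normalized = per_channel / per_channel.sum()   # block-wise normalization
prune_order = normalized.argsort()             # prune lowest-importance first
print(prune_order[:4])
```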
arXiv Detail & Related papers (2025-10-20T19:04:09Z)
- NIRVANA: Structured pruning reimagined for large language models compression [50.651730342011014]
We introduce NIRVANA, a novel pruning method designed to balance immediate zero-shot accuracy preservation with robust fine-tuning. To further address the unique challenges posed by structured pruning, NIRVANA incorporates an adaptive sparsity allocation mechanism across layers and modules. Experiments conducted on Llama3, Qwen, and T5 models demonstrate that NIRVANA outperforms existing structured pruning methods under equivalent sparsity constraints.
arXiv Detail & Related papers (2025-09-17T17:59:00Z)
- Beyond Low-Rank Tuning: Model Prior-Guided Rank Allocation for Effective Transfer in Low-Data and Large-Gap Regimes [9.4848188271008]
Low-Rank Adaptation (LoRA) has proven effective in reducing computational costs while maintaining performance comparable to fully fine-tuned foundation models. Current adaptive LoRA methods attempt to overcome this limitation by dynamically expanding or selectively allocating ranks. We introduce Stable Rank-Guided Low-Rank Adaptation (SR-LoRA), a novel framework that utilizes the stable rank of pre-trained weight matrices as a natural prior for layer-wise rank allocation.
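Stable rank, the prior this summary refers to, has a standard definition: srank(W) = ||W||_F^2 / ||W||_2^2, the squared Frobenius norm over the squared spectral norm. How SR-LoRA maps it to an integer LoRA rank is not stated here; the rounding below is our illustrative choice:

```python
# Stable rank of a weight matrix as a layer-wise rank prior (sketch).
import torch

def stable_rank(w: torch.Tensor) -> float:
    sv = torch.linalg.svdvals(w)                 # singular values, descending
    return float((sv ** 2).sum() / sv[0] ** 2)   # ||W||_F^2 / sigma_max^2

torch.manual_seed(0)
w = torch.randn(256, 256)
r = max(1, round(stable_rank(w)))                # hypothetical rank choice
print(f"stable rank {stable_rank(w):.1f} -> LoRA rank {r}")
```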
arXiv Detail & Related papers (2025-06-30T23:54:23Z)
- LoRA-MGPO: Mitigating Double Descent in Low-Rank Adaptation via Momentum-Guided Perturbation Optimization [16.360816770124874]
We introduce LoRA-MGPO, a framework that incorporates Momentum-Guided Perturbation Optimization (MGPO). MGPO stabilizes training dynamics by guiding perturbations with momentum vectors derived from the gradient's state. Experiments show that LoRA-MGPO consistently achieves superior performance over LoRA and other PEFT methods.
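One plausible reading of "momentum-guided perturbation" is a SAM-style step whose ascent direction comes from an EMA-of-gradients buffer rather than the raw gradient. The sketch below implements that reading on a toy quadratic; it should not be taken as the paper's exact update, and rho/beta are illustrative:

```python
# SAM-like step with a momentum-guided perturbation direction (sketch).
import torch

torch.manual_seed(0)
w = torch.randn(10, requires_grad=True)
m = torch.zeros(10)                       # momentum buffer
rho, beta, lr = 0.05, 0.9, 0.1
loss_fn = lambda v: ((v - 1.0) ** 2).sum()

for _ in range(50):
    g = torch.autograd.grad(loss_fn(w), w)[0]
    m = beta * m + (1 - beta) * g         # EMA of gradients
    eps = rho * m / (m.norm() + 1e-12)    # momentum-guided perturbation
    g_adv = torch.autograd.grad(loss_fn(w + eps), w)[0]  # grad at perturbed point
    with torch.no_grad():
        w -= lr * g_adv
print(loss_fn(w).item())
```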
arXiv Detail & Related papers (2025-02-20T13:14:41Z)
- GeLoRA: Geometric Adaptive Ranks For Efficient LoRA Fine-tuning [2.7446241148152253]
Fine-tuning large language models (LLMs) is computationally intensive because it requires updating all parameters. Low-Rank Adaptation (LoRA) improves efficiency by modifying only a subset of weights but introduces a trade-off between expressivity and computational cost. We propose Geometric Low-Rank Adaptation (GeLoRA), a novel framework that computes the intrinsic dimensionality of hidden state representations to adaptively select LoRA ranks.
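The summary says GeLoRA selects ranks from the intrinsic dimensionality of hidden states. Whether it uses this particular estimator is not stated here, but TwoNN (nearest-neighbor distance ratios) is a standard way to estimate intrinsic dimension:

```python
# TwoNN intrinsic-dimension estimate (maximum-likelihood form).
import torch

def twonn_dim(x: torch.Tensor) -> float:
    d = torch.cdist(x, x)                       # pairwise distances
    d.fill_diagonal_(float("inf"))              # exclude self-distances
    r = d.topk(2, largest=False).values         # two nearest neighbors
    mu = r[:, 1] / r[:, 0]                      # distance ratios, >= 1
    return float(len(x) / torch.log(mu).sum())  # MLE under the TwoNN model

torch.manual_seed(0)
hidden = torch.randn(500, 3) @ torch.randn(3, 64)  # ~3-dim data in 64-dim space
print(f"estimated intrinsic dim: {twonn_dim(hidden):.2f}")
```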
arXiv Detail & Related papers (2024-12-12T13:04:54Z)
- Flat-LoRA: Low-Rank Adaptation over a Flat Loss Landscape [52.98187034726091]
We introduce Flat-LoRA, which aims to identify a low-rank adaptation situated in a flat region of the full parameter space. We show that Flat-LoRA improves both in-domain and out-of-domain generalization.
arXiv Detail & Related papers (2024-09-22T11:24:10Z)
- Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning [55.5715496559514]
LoRA Slow Cascade Learning (LoRASC) is an innovative technique designed to enhance LoRA's expressiveness and generalization capabilities.
Our approach augments expressiveness through a cascaded learning strategy that enables a mixture-of-low-rank adaptation, thereby increasing the model's ability to capture complex patterns.
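To see why a cascade of low-rank stages captures more than a single one, here is a toy illustration under our own assumptions (SVD residual fitting, not the method's actual training procedure): each stage fits the residual left by the previously merged stages, so the composed update exceeds any single rank-r approximation:

```python
# Cascaded low-rank stages fitting successive residuals (toy sketch).
import torch

torch.manual_seed(0)
d, r, stages = 64, 4, 5
target = torch.randn(d, d)
w = torch.zeros(d, d)

for s in range(stages):
    u, sv, vh = torch.linalg.svd(target - w)    # best rank-r fit to residual
    delta = u[:, :r] @ torch.diag(sv[:r]) @ vh[:r, :]
    w = w + delta                               # merge this stage's update
    err = (target - w).norm() / target.norm()
    print(f"stage {s}: relative error {err:.3f}")
```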
arXiv Detail & Related papers (2024-07-01T17:28:59Z)
- Sparse Low-rank Adaptation of Pre-trained Language Models [79.74094517030035]
We introduce sparse low-rank adaptation (SoRA) that enables dynamic adjustments to the intrinsic rank during the adaptation process.
Our approach strengthens the representation power of LoRA by initializing it with a higher rank, while efficiently taming a temporarily increased number of parameters.
Our experimental results demonstrate that SoRA can outperform other baselines even with 70% retained parameters and 70% training time.
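A minimal sketch of the mechanism this summary points at: start from a higher rank, insert a gate g between the factors (delta_w = B diag(g) A), and apply an L1 proximal (soft-thresholding) step to g so that ranks can be switched off during adaptation. The hyperparameters and toy objective are illustrative assumptions:

```python
# Gated LoRA with a proximal L1 step on the gate (illustrative sketch).
import torch

torch.manual_seed(0)
d, r = 32, 16
A = (0.1 * torch.randn(r, d)).requires_grad_()
B = (0.1 * torch.randn(d, r)).requires_grad_()
g = torch.ones(r, requires_grad=True)
target = torch.randn(d, d)
lr, lam = 0.05, 0.02

for _ in range(300):
    loss = (((B * g) @ A - target) ** 2).mean()   # B diag(g) A vs. target
    gA, gB, gg = torch.autograd.grad(loss, (A, B, g))
    with torch.no_grad():
        A -= lr * gA
        B -= lr * gB
        g -= lr * gg
        # soft-thresholding: proximal operator of the L1 penalty on g
        g.copy_(g.sign() * (g.abs() - lr * lam).clamp(min=0))

print("active ranks:", int((g != 0).sum()))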
arXiv Detail & Related papers (2023-11-20T11:56:25Z)
- Layer-adaptive sparsity for the Magnitude-based Pruning [88.37510230946478]
We propose a novel importance score for global pruning, coined layer-adaptive magnitude-based pruning (LAMP) score.
LAMP consistently outperforms popular existing schemes for layerwise sparsity selection.
arXiv Detail & Related papers (2020-10-15T09:14:02Z)
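The LAMP score itself has a simple closed form: with a layer's weights sorted ascending by magnitude, score(u) = W[u]^2 / sum of W[v]^2 over all v with |W[v]| >= |W[u]|, i.e., each squared weight is normalized by the surviving squared mass at or above it. A small implementation:

```python
# LAMP scores for one layer's weight tensor.
import torch

def lamp_scores(w: torch.Tensor) -> torch.Tensor:
    flat = w.flatten() ** 2
    order = flat.argsort()                      # ascending magnitude
    sorted_sq = flat[order]
    # suffix sums: for each weight, the squared mass at or above it
    suffix = sorted_sq.flip(0).cumsum(0).flip(0)
    scores = torch.empty_like(flat)
    scores[order] = sorted_sq / suffix
    return scores.view_as(w)

torch.manual_seed(0)
w = torch.randn(4, 4)
print(lamp_scores(w))
```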