Kronecker-LoRA: hybrid Kronecker-LoRA adapters for scalable, sustainable fine-tuning
- URL: http://arxiv.org/abs/2508.01961v1
- Date: Mon, 04 Aug 2025 00:02:15 GMT
- Title: Kronecker-LoRA: hybrid Kronecker-LoRA adapters for scalable, sustainable fine-tuning
- Authors: Yixin Shen
- Abstract summary: We introduce Kron-LoRA, a two-stage adapter that first factorizes each frozen linear update as a Kronecker product. Kron-LoRA retains the expressivity of the update while using up to 4× fewer parameters than a standard rank-8 LoRA adapter.
- Score: 0.5629386140722666
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Fine-tuning massive pre-trained language models across many tasks demands adapters that are both parameter-efficient and highly expressive. We introduce \textbf{Kron-LoRA}, a two-stage adapter that first factorizes each frozen linear update as a Kronecker product \[ \Delta W = A \otimes B \] and then compresses \[ B \in \mathbb{R}^{d_{B2}\times d_{B1}} \] via a rank-\(r\) LoRA decomposition \(B \approx B_{1}B_{2}\). By leveraging \[ \mathrm{rank}(A \otimes B) \;=\; \mathrm{rank}(A)\,\mathrm{rank}(B), \] Kron-LoRA retains the expressivity of the update while using up to $4\!\times\!$ fewer parameters than a standard rank-8 LoRA adapter. Its compact adapter matrices also quantize to 8- or 4-bit with less accuracy degradation than LoRA, enabling further memory and storage savings for on-device deployment. We benchmark on DistilBERT and Mistral-7B across five tasks (PIQA, HellaSwag, WinoGrande, ARC-Easy, ARC-Challenge) over multiple epochs of adapter-only tuning: on DistilBERT, an 840K-parameter Kron-LoRA matches LoRA-16's performance, and on Mistral-7B, a 5.7M-parameter Kron-LoRA rivals LoRA-8 with modest memory savings and only a 3-8\% speed overhead. In sequential fine-tuning from ARC-Challenge to ARC-Easy, Kron-LoRA retains 55.18\% accuracy versus 53.17\% for LoRA-8, despite using only one-quarter of the adapter parameters, underscoring its competitive cross-task transfer performance. By uniting Kronecker structure, low-rank compression, and quantization-friendliness, and by providing a transparent trade-off analysis, Kron-LoRA offers a scalable, sustainable, and continual-learning-ready solution for multi-task adaptation of large language models.
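To make the two-stage construction concrete, below is a minimal PyTorch sketch of a Kron-LoRA-style adapter wrapped around a frozen linear layer. It is written from the abstract's description, not from the authors' code: the class name `KronLoRALinear`, the block-split arguments `a_out`/`a_in`, the initialization, and the `alpha/rank` scaling are assumptions borrowed from standard LoRA practice. The forward pass applies the Kronecker factors via reshapes so that the full \(d_{\text{out}} \times d_{\text{in}}\) update matrix is never materialized.

```python
# Minimal sketch of a Kron-LoRA-style adapter layer, written from the abstract's
# description; NOT the authors' implementation. The block split (a_out, a_in),
# the initialization, and the alpha/rank scaling are assumptions taken from
# common LoRA practice.
import torch
import torch.nn as nn


class KronLoRALinear(nn.Module):
    """Frozen linear layer plus a trainable update dW = A kron (B1 @ B2)."""

    def __init__(self, base: nn.Linear, a_out: int, a_in: int,
                 rank: int = 8, alpha: float = 16.0):
        super().__init__()
        d_out, d_in = base.out_features, base.in_features
        assert d_out % a_out == 0 and d_in % a_in == 0, "dims must split into Kronecker blocks"
        b_out, b_in = d_out // a_out, d_in // a_in

        self.base = base
        for p in self.base.parameters():   # the pre-trained weight stays frozen
            p.requires_grad_(False)

        # Kronecker factor A and the rank-r LoRA factors of B (B ~= B1 @ B2).
        self.A = nn.Parameter(torch.randn(a_out, a_in) * 0.02)
        self.B1 = nn.Parameter(torch.randn(b_out, rank) * 0.02)
        self.B2 = nn.Parameter(torch.zeros(rank, b_in))   # zero init => adapter starts as a no-op
        self.scale = alpha / rank
        self.a_in, self.b_in = a_in, b_in

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.base(x)
        # Apply (A kron B) to x via reshapes instead of materializing the full
        # d_out x d_in update: reshape x into (a_in, b_in) blocks, multiply by
        # B = B1 @ B2 on the right and by A on the left, then flatten back.
        *lead, _ = x.shape
        X = x.reshape(-1, self.a_in, self.b_in)        # (N, a_in, b_in)
        Z = X @ self.B2.T @ self.B1.T                  # (N, a_in, b_out)
        Z = torch.matmul(self.A, Z)                    # (N, a_out, b_out)
        delta = Z.reshape(*lead, -1)                   # (..., d_out)
        return y + self.scale * delta
```

Under this sketch a layer carries \(a_{\text{out}} a_{\text{in}} + r\,(b_{\text{out}} + b_{\text{in}})\) adapter parameters versus \(r\,(d_{\text{out}} + d_{\text{in}})\) for a plain rank-\(r\) LoRA; the concrete block splits behind the paper's reported 840K and 5.7M adapter sizes are not specified here. A hypothetical usage would be `adapter = KronLoRALinear(nn.Linear(768, 768), a_out=16, a_in=16, rank=8)`.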
Related papers
- Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights [75.83625828306839]
Drag-and-Drop LLMs (DnD) eliminates per-task training by mapping a handful of unlabeled task prompts directly to LoRA weight updates. A lightweight text encoder distills each prompt batch into condition embeddings, which are then transformed by a cascaded hyper-convolutional decoder into the full set of LoRA matrices.
arXiv Detail & Related papers (2025-06-19T15:38:21Z) - Dynamic Low-Rank Sparse Adaptation for Large Language Models [54.1231638555233]
Low-rank Sparse Adaptation (LoSA) is a novel method that seamlessly integrates low-rank adaptation into LLM sparsity. LoSA dynamically sparsifies the LoRA outcomes based on the corresponding sparse weights during fine-tuning. LoSA can efficiently boost the efficacy of sparse LLMs within a few hours, without introducing any additional inferential burden.
arXiv Detail & Related papers (2025-02-20T18:37:32Z) - NLoRA: Nyström-Initiated Low-Rank Adaptation for Large Language Models [12.431575579432458]
We introduce StructuredLoRA (SLoRA), which investigates adding a small intermediate matrix between the low-rank matrices A and B. Second, we propose NyströmLoRA (NLoRA), which leverages Nyström-based initialization for SLoRA to improve its effectiveness and efficiency. Finally, we propose IntermediateTune (IntTune), which explores fine-tuning exclusively on the intermediate matrix of NLoRA to further boost LLM efficiency (a minimal sketch of this intermediate-matrix idea appears after this list).
arXiv Detail & Related papers (2025-02-20T12:01:11Z) - RoRA: Efficient Fine-Tuning of LLM with Reliability Optimization for Rank Adaptation [59.34193580856381]
Low-Rank Adaptation (LoRA) is widely used and effective for fine-tuning large language models. We propose RoRA (Rank-adaptive Reliability Optimization), a simple yet effective method for optimizing LoRA's scaling factor. RoRA ensures improved performance as rank size increases and excels in the more challenging task of accuracy recovery when fine-tuning pruned models.
arXiv Detail & Related papers (2025-01-08T07:13:52Z) - LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization [78.93425154518705]
Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method for LLMs that reduces memory requirements. This paper introduces LoRA-RITE, a novel adaptive matrix preconditioning method for LoRA optimization.
arXiv Detail & Related papers (2024-10-27T22:57:12Z) - LoRA-Pro: Are Low-Rank Adapters Properly Optimized? [121.0693322732454]
Low-rank adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning of foundation models. Despite its computational efficiency, LoRA still yields inferior performance compared to full fine-tuning. We introduce LoRA-Pro, a method that enhances LoRA's performance by strategically adjusting the gradients of low-rank matrices.
arXiv Detail & Related papers (2024-07-25T17:57:12Z) - LoRA-GA: Low-Rank Adaptation with Gradient Approximation [5.685201910521295]
Fine-tuning large-scale pretrained models is prohibitively expensive in terms of computational and memory costs.
LoRA offers a cost-effective alternative by fine-tuning an auxiliary low-rank model that has significantly fewer parameters.
However, LoRA converges at a considerably slower rate compared to full fine-tuning, leading to increased overall compute and often worse test performance.
arXiv Detail & Related papers (2024-07-06T08:37:21Z) - LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters [11.23006032094776]
We introduce LoRA-XS, a novel low-rank adaptation method that considerably reduces the trainable parameters while showing superior or competitive performance.
LoRA-XS achieves a remarkable reduction of trainable parameters by over 100x in 7B models compared to LoRA.
arXiv Detail & Related papers (2024-05-27T19:07:13Z) - S-LoRA: Serving Thousands of Concurrent LoRA Adapters [59.490751234925206]
Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, is often employed to adapt a base model to a multitude of tasks.
We present S-LoRA, a system designed for the scalable serving of many LoRA adapters.
arXiv Detail & Related papers (2023-11-06T17:26:17Z)
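As noted in the NLoRA entry above, a minimal sketch of the intermediate-matrix idea behind SLoRA/NLoRA/IntTune follows; it is a reading of that summary, not the authors' code. The class name, the zero initialization of the \(r \times r\) matrix `M`, and the choice to freeze `A` and `B` for IntTune-style tuning are assumptions, and the Nyström-based initialization used by NLoRA is omitted.

```python
# Minimal sketch of LoRA with a small intermediate matrix (SLoRA-style), where
# IntTune-style training updates only M; NOT the authors' implementation.
import torch
import torch.nn as nn


class IntermediateLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, train_only_m: bool = True):
        super().__init__()
        d_out, d_in = base.out_features, base.in_features
        self.base = base
        for p in self.base.parameters():                 # frozen pre-trained weight
            p.requires_grad_(False)

        # Standard LoRA factors A, B plus an r x r intermediate matrix M between them.
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.02, requires_grad=not train_only_m)
        self.M = nn.Parameter(torch.zeros(rank, rank))   # zero init => adapter starts as a no-op
        self.B = nn.Parameter(torch.randn(d_out, rank) * 0.02, requires_grad=not train_only_m)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # dW = B @ M @ A, applied right-to-left to avoid forming the d_out x d_in matrix.
        return self.base(x) + x @ self.A.T @ self.M.T @ self.B.T
```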