HaLoRA: Hardware-aware Low-Rank Adaptation for Large Language Models Based on Hybrid Compute-in-Memory Architecture
- URL: http://arxiv.org/abs/2502.19747v2
- Date: Tue, 04 Mar 2025 02:42:56 GMT
- Title: HaLoRA: Hardware-aware Low-Rank Adaptation for Large Language Models Based on Hybrid Compute-in-Memory Architecture
- Authors: Taiqiang Wu, Chenchen Ding, Wenyong Zhou, Yuxin Cheng, Xincheng Feng, Shuqi Wang, Chufan Shi, Zhengwu Liu, Ngai Wong,
- Abstract summary: Low-rank adaptation (LoRA) is a parameter-efficient finetuning method to adapt large language models (LLMs) for downstream tasks.<n>To address performance degradation from RRAM's inherent noise, we design a novel Hardware-aware Low-rank Adaption (HaLoRA) method.<n> Experiments finetuning LLaMA 3.2 1B and 3B demonstrate HaLoRA's effectiveness across multiple reasoning tasks, achieving up to 22.7 improvement in average score.
- Score: 9.451914483640605
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Low-rank adaptation (LoRA) is a predominant parameter-efficient finetuning method to adapt large language models (LLMs) for downstream tasks. In this paper, we first propose to deploy the LoRA-finetuned LLMs on the hybrid compute-in-memory (CIM) architecture (i.e., pretrained weights onto RRAM and LoRA onto SRAM). To address performance degradation from RRAM's inherent noise, we design a novel Hardware-aware Low-rank Adaption (HaLoRA) method, aiming to train a LoRA branch that is both robust and accurate by aligning the training objectives under both ideal and noisy conditions. Experiments finetuning LLaMA 3.2 1B and 3B demonstrate HaLoRA's effectiveness across multiple reasoning tasks, achieving up to 22.7 improvement in average score while maintaining robustness at various noise levels.
Related papers
- Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation [85.89510825889168]
We introduce LoRA-Pre, a novel low-rank system for efficient pre-training.<n>LoRA-Pre decomposing the momentum matrix into a compact low-rank subspace within the online linear learner.<n>We empirically validate LoRA-Pre's efficacy by pre-training models from the Llama architecture family.
arXiv Detail & Related papers (2026-02-27T18:57:06Z) - Rethinking LoRA for Privacy-Preserving Federated Learning in Large Models [14.755143405057929]
Fine-tuning large vision models (LVMs) and large language models (LLMs) under differentially private learning (DPFL) is hindered by a fundamental privacy-utility trade-off.<n>Low-Rank Adaptation (LoRA), a promising parameter-efficient fine-tuning (PEFT) method, reduces computational and communication costs by introducing two trainable low-rank matrices while freezing pre-trained weights.<n>We propose LA-LoRA, a novel approach that decouples gradient interactions and aligns update directions across clients to enhance robustness under stringent privacy constraints.
arXiv Detail & Related papers (2026-02-23T15:05:28Z) - Faster Than SVD, Smarter Than SGD: The OPLoRA Alternating Update [50.36542772932594]
Low-Rank Adaptation (LoRA) fine-tunes large models by learning low-rank updates on top of frozen weights.<n>There is still a gap between full training with low-rank projections (SVDLoRA) and LoRA fine-tuning, indicating that LoRA steps can be further improved.
arXiv Detail & Related papers (2025-09-24T10:32:50Z) - Bi-LoRA: Efficient Sharpness-Aware Minimization for Fine-Tuning Large-Scale Models [33.28146211296799]
Sharpness-Aware Minimization (SAM) has proven effective in improving generalization by seeking flat minima.<n>We propose Bi-directional Low-Rank Adaptation (Bi-LoRA), which introduces an auxiliary LoRA module to model SAM's adversarial weight perturbations.<n>Bi-LoRA captures broader sharpness for achieving flatter minima while remaining memory-efficient.
arXiv Detail & Related papers (2025-08-27T04:46:56Z) - Dynamic Low-Rank Sparse Adaptation for Large Language Models [54.1231638555233]
Low-rank Sparse Adaptation (LoSA) is a novel method that seamlessly integrates low-rank adaptation into sparse LLM sparsity.<n>LoSA dynamically sparsifies the LoRA outcomes based on the corresponding sparse weights during fine-tuning.<n>LoSA can efficiently boost the efficacy of sparse LLMs within a few hours, without introducing any additional inferential burden.
arXiv Detail & Related papers (2025-02-20T18:37:32Z) - A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models [22.457766373989365]
Low-Rank Adapters (LoRAs) have been substantially adopted across various fields, including instruction tuning and domain adaptation.<n>To address the limited expressive capacity of LoRA, the Mixture-of-Expert (MoE) has been introduced for incorporating multiple LoRA adapters.<n>We propose a new training strategy for MoE-LoRA, to stabilize and boost its feature learning procedure by multi-space projections.
arXiv Detail & Related papers (2025-02-20T05:58:53Z) - BeamLoRA: Beam-Constraint Low-Rank Adaptation [51.52097743781401]
Low-Rank Adaptation (LoRA) has been widely adopted as one of the most effective parameter-efficient fine-tuning methods.<n>We propose BeamLoRA, which conceptualizes each LoRA module as a beam where each rank naturally corresponds to a potential sub-solution.
arXiv Detail & Related papers (2025-02-19T10:33:22Z) - Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models [23.442612142677504]
Low-Rank Adaption (LoRA) offers a cost-effective fine-tuning solution for large language models.<n>However, the memory footprint of LoRA is largely dominated by the original model parameters.<n>We propose LoRAM, a memory-efficient LoRA training scheme.
arXiv Detail & Related papers (2025-02-19T08:39:15Z) - LoRA-Mini : Adaptation Matrices Decomposition and Selective Training [2.0670689746336]
Low-Rank Adaptation (LoRA) has emerged as a promising solution, enabling parameter-efficient fine-tuning by reducing the number of trainable parameters.
We propose LoRA-Mini, an optimized adaptation of LoRA that improves parameter efficiency by splitting low-rank matrices into four parts.
This approach achieves upto a 20x reduction compared to standard LoRA in the number of trainable parameters while preserving performance levels comparable to standard LoRA.
arXiv Detail & Related papers (2024-11-24T12:21:14Z) - LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization [78.93425154518705]
Low-rank adaption (LoRA) is a widely used parameter-efficient finetuning method for LLM that reduces memory requirements.
This paper introduces LoRA-RITE, a novel adaptive matrix preconditioning method for LoRA optimization.
arXiv Detail & Related papers (2024-10-27T22:57:12Z) - Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs [75.11449420928139]
Fine-tuning Large Language Models (LLMs) has become a crucial technique for adapting pre-trained models to downstream tasks.
Low-Rank Adaptation (LoRA) has emerged as a promising solution, but there exists a gap between the practical performance of low-rank adaptations and its theoretical optimum.
We propose eXtreme Gradient Boosting LoRA, a novel framework that bridges this gap by leveraging the power of ensemble learning.
arXiv Detail & Related papers (2024-10-25T17:07:13Z) - MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning [9.91790333647256]
Low-rank adaptation (LoRA) and its mixture-of-experts (MOE) variants are highly effective parameter-efficient fine-tuning (PEFT) methods.
We propose Mixture of Low-Rank Adaptation (MiLoRA), a novel and efficient LoRA variant.
MiLoRA differs from previous MOE-style LoRA methods by considering each LoRA module as an expert and employing a prompt-aware routing mechanism.
arXiv Detail & Related papers (2024-10-23T17:04:40Z) - MoR: Mixture of Ranks for Low-Rank Adaptation Tuning [18.102354643796826]
Low-Rank Adaptation (LoRA) drives research to align its performance with full fine-tuning.
MoE-style LoRA methods substantially increase parameters and inference latency.
We introduce Mixture of Ranks (MoR), which learns rank-specific information for different tasks based on input.
MoR achieves impressive results, with MoR delivering a 1.31% performance improvement while using only 93.93% of the parameters compared to baseline methods.
arXiv Detail & Related papers (2024-10-17T10:14:52Z) - Flat-LoRA: Low-Rank Adaption over a Flat Loss Landscape [52.98187034726091]
Low-Rank Adaptation (LoRA) is an efficient way to fine-tune models by optimizing only a low-rank matrix.
A solution that appears flat in the LoRA space may exist sharp directions in the full parameter space, potentially harming generalization performance.
We propose Flat-LoRA, an efficient approach that seeks a low-rank adaptation located in a flat region of the full parameter space.
arXiv Detail & Related papers (2024-09-22T11:24:10Z) - LoRA-Pro: Are Low-Rank Adapters Properly Optimized? [121.0693322732454]
Low-rank adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning of foundation models.
Despite its computational efficiency, LoRA still yields inferior performance compared to full fine-tuning.
We introduce LoRA-Pro, a method that enhances LoRA's performance by strategically adjusting the gradients of low-rank matrices.
arXiv Detail & Related papers (2024-07-25T17:57:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.