HaLoRA: Hardware-aware Low-Rank Adaptation for Large Language Models Based on Hybrid Compute-in-Memory Architecture
- URL: http://arxiv.org/abs/2502.19747v2
- Date: Tue, 04 Mar 2025 02:42:56 GMT
- Title: HaLoRA: Hardware-aware Low-Rank Adaptation for Large Language Models Based on Hybrid Compute-in-Memory Architecture
- Authors: Taiqiang Wu, Chenchen Ding, Wenyong Zhou, Yuxin Cheng, Xincheng Feng, Shuqi Wang, Chufan Shi, Zhengwu Liu, Ngai Wong,
- Abstract summary: Low-rank adaptation (LoRA) is a parameter-efficient finetuning method to adapt large language models (LLMs) for downstream tasks.<n>To address performance degradation from RRAM's inherent noise, we design a novel Hardware-aware Low-rank Adaption (HaLoRA) method.<n> Experiments finetuning LLaMA 3.2 1B and 3B demonstrate HaLoRA's effectiveness across multiple reasoning tasks, achieving up to 22.7 improvement in average score.
- Score: 9.451914483640605
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Low-rank adaptation (LoRA) is a predominant parameter-efficient finetuning method to adapt large language models (LLMs) for downstream tasks. In this paper, we first propose to deploy the LoRA-finetuned LLMs on the hybrid compute-in-memory (CIM) architecture (i.e., pretrained weights onto RRAM and LoRA onto SRAM). To address performance degradation from RRAM's inherent noise, we design a novel Hardware-aware Low-rank Adaption (HaLoRA) method, aiming to train a LoRA branch that is both robust and accurate by aligning the training objectives under both ideal and noisy conditions. Experiments finetuning LLaMA 3.2 1B and 3B demonstrate HaLoRA's effectiveness across multiple reasoning tasks, achieving up to 22.7 improvement in average score while maintaining robustness at various noise levels.
Related papers
- Dynamic Low-Rank Sparse Adaptation for Large Language Models [54.1231638555233]
Low-rank Sparse Adaptation (LoSA) is a novel method that seamlessly integrates low-rank adaptation into sparse LLM sparsity.<n>LoSA dynamically sparsifies the LoRA outcomes based on the corresponding sparse weights during fine-tuning.<n>LoSA can efficiently boost the efficacy of sparse LLMs within a few hours, without introducing any additional inferential burden.
arXiv Detail & Related papers (2025-02-20T18:37:32Z) - A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models [22.457766373989365]
Low-Rank Adapters (LoRAs) have been substantially adopted across various fields, including instruction tuning and domain adaptation.<n>To address the limited expressive capacity of LoRA, the Mixture-of-Expert (MoE) has been introduced for incorporating multiple LoRA adapters.<n>We propose a new training strategy for MoE-LoRA, to stabilize and boost its feature learning procedure by multi-space projections.
arXiv Detail & Related papers (2025-02-20T05:58:53Z) - BeamLoRA: Beam-Constraint Low-Rank Adaptation [51.52097743781401]
Low-Rank Adaptation (LoRA) has been widely adopted as one of the most effective parameter-efficient fine-tuning methods.<n>We propose BeamLoRA, which conceptualizes each LoRA module as a beam where each rank naturally corresponds to a potential sub-solution.
arXiv Detail & Related papers (2025-02-19T10:33:22Z) - Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models [23.442612142677504]
Low-Rank Adaption (LoRA) offers a cost-effective fine-tuning solution for large language models.<n>However, the memory footprint of LoRA is largely dominated by the original model parameters.<n>We propose LoRAM, a memory-efficient LoRA training scheme.
arXiv Detail & Related papers (2025-02-19T08:39:15Z) - LoRA-Mini : Adaptation Matrices Decomposition and Selective Training [2.0670689746336]
Low-Rank Adaptation (LoRA) has emerged as a promising solution, enabling parameter-efficient fine-tuning by reducing the number of trainable parameters.
We propose LoRA-Mini, an optimized adaptation of LoRA that improves parameter efficiency by splitting low-rank matrices into four parts.
This approach achieves upto a 20x reduction compared to standard LoRA in the number of trainable parameters while preserving performance levels comparable to standard LoRA.
arXiv Detail & Related papers (2024-11-24T12:21:14Z) - LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization [78.93425154518705]
Low-rank adaption (LoRA) is a widely used parameter-efficient finetuning method for LLM that reduces memory requirements.
This paper introduces LoRA-RITE, a novel adaptive matrix preconditioning method for LoRA optimization.
arXiv Detail & Related papers (2024-10-27T22:57:12Z) - Less is More: Extreme Gradient Boost Rank-1 Adaption for Efficient Finetuning of LLMs [75.11449420928139]
Fine-tuning Large Language Models (LLMs) has become a crucial technique for adapting pre-trained models to downstream tasks.
Low-Rank Adaptation (LoRA) has emerged as a promising solution, but there exists a gap between the practical performance of low-rank adaptations and its theoretical optimum.
We propose eXtreme Gradient Boosting LoRA, a novel framework that bridges this gap by leveraging the power of ensemble learning.
arXiv Detail & Related papers (2024-10-25T17:07:13Z) - MiLoRA: Efficient Mixture of Low-Rank Adaptation for Large Language Models Fine-tuning [9.91790333647256]
Low-rank adaptation (LoRA) and its mixture-of-experts (MOE) variants are highly effective parameter-efficient fine-tuning (PEFT) methods.
We propose Mixture of Low-Rank Adaptation (MiLoRA), a novel and efficient LoRA variant.
MiLoRA differs from previous MOE-style LoRA methods by considering each LoRA module as an expert and employing a prompt-aware routing mechanism.
arXiv Detail & Related papers (2024-10-23T17:04:40Z) - MoR: Mixture of Ranks for Low-Rank Adaptation Tuning [18.102354643796826]
Low-Rank Adaptation (LoRA) drives research to align its performance with full fine-tuning.
MoE-style LoRA methods substantially increase parameters and inference latency.
We introduce Mixture of Ranks (MoR), which learns rank-specific information for different tasks based on input.
MoR achieves impressive results, with MoR delivering a 1.31% performance improvement while using only 93.93% of the parameters compared to baseline methods.
arXiv Detail & Related papers (2024-10-17T10:14:52Z) - Flat-LoRA: Low-Rank Adaption over a Flat Loss Landscape [52.98187034726091]
Low-Rank Adaptation (LoRA) is an efficient way to fine-tune models by optimizing only a low-rank matrix.
A solution that appears flat in the LoRA space may exist sharp directions in the full parameter space, potentially harming generalization performance.
We propose Flat-LoRA, an efficient approach that seeks a low-rank adaptation located in a flat region of the full parameter space.
arXiv Detail & Related papers (2024-09-22T11:24:10Z) - LoRA-Pro: Are Low-Rank Adapters Properly Optimized? [121.0693322732454]
Low-rank adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning of foundation models.
Despite its computational efficiency, LoRA still yields inferior performance compared to full fine-tuning.
We introduce LoRA-Pro, a method that enhances LoRA's performance by strategically adjusting the gradients of low-rank matrices.
arXiv Detail & Related papers (2024-07-25T17:57:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.