MASA: Rethinking the Representational Bottleneck in LoRA with Multi-A Shared Adaptation
- URL: http://arxiv.org/abs/2510.06005v1
- Date: Tue, 07 Oct 2025 15:06:46 GMT
- Title: MASA: Rethinking the Representational Bottleneck in LoRA with Multi-A Shared Adaptation
- Authors: Qin Dong, Yuntian Tang, Heming Jia, Yunhang Shen, Bohan Jia, Wenxuan Huang, Lianyue Zhang, Jiao Xie, Shaohui Lin,
- Abstract summary: Low-Rank Adaptation (LoRA) has emerged as a dominant method in Parameter-Efficient Fine-Tuning for large language models.
- Score: 28.079735905482096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Low-Rank Adaptation (LoRA) has emerged as a dominant method in Parameter-Efficient Fine-Tuning (PEFT) for large language models; it augments each transformer layer with one down-projection $A$ and one up-projection $B$. However, LoRA's reliance on a single down-projection matrix ($A$) creates a representational bottleneck, as this solitary feature extractor is inherently insufficient for capturing the diverse signals required by complex tasks. This motivates an architectural shift toward enriching feature adaptation to improve downstream task performance. We propose MASA (Multi-$A$ Shared Adaptation), an architecture that implements a multi-$A$, single-$B$ structure in which the multi-$A$ expert ensemble is asymmetrically shared across layers to ensure parameter efficiency. In MASA, these specialized experts capture diverse features, which are then integrated by a single, layer-specific $B$-matrix. The effectiveness and versatility of our method are validated through a comprehensive suite of experiments spanning multi-domain generalization, single-domain specialization, and multi-task reasoning. For example, on the MMLU benchmark, MASA achieves an average accuracy of 59.62%, outperforming standard LoRA by 1.08 points (a relative improvement of 1.84%) with a comparable learnable-parameter budget of 0.52%.
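The contrast between vanilla LoRA and the multi-$A$, single-$B$ structure described in the abstract can be sketched in a few lines of NumPy. This is a minimal reading of the abstract, not the paper's implementation: the expert bank `A_bank`, the zero-initialized `B`, and combining expert features by concatenation are our assumptions (the paper may use summation or gating).

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, E = 16, 4, 3  # hidden size, per-expert rank, number of A-experts

# Vanilla LoRA: one down-projection A (r x d) and one up-projection
# B (d x r) per layer; the low-rank update is delta(x) = B (A x).
A = rng.standard_normal((r, d)) * 0.01
B = np.zeros((d, r))  # B is conventionally zero-initialized

def lora_delta(x, A, B):
    """x: (batch, d) -> (batch, d) low-rank update."""
    return x @ A.T @ B.T

# MASA-style sketch (names are ours): a bank of E down-projections
# shared across layers, integrated by a single layer-specific B.
A_bank = rng.standard_normal((E, r, d)) * 0.01  # shared across layers
B_layer = np.zeros((d, E * r))                  # one per layer

def masa_delta(x, A_bank, B_layer):
    # Each expert extracts its own r-dim feature; the layer-specific
    # B integrates the concatenated (batch, E*r) feature map.
    feats = np.concatenate([x @ Ae.T for Ae in A_bank], axis=-1)
    return feats @ B_layer.T

x = rng.standard_normal((2, d))
print(lora_delta(x, A, B).shape)          # (2, 16)
print(masa_delta(x, A_bank, B_layer).shape)  # (2, 16)
```

With zero-initialized up-projections, both updates start at zero, so the adapted model initially matches the frozen backbone; sharing `A_bank` across layers is what keeps the parameter count comparable to single-pair LoRA.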
Related papers
- Less is More: Resource-Efficient Low-Rank Adaptation [15.883867662707743]
EffiLoRA is a lightweight and generalizable approach for language, multimodal, and diffusion models. It consistently outperforms LoRA across diverse modalities, including commonsense reasoning, visual instruction tuning, and image generation.
arXiv Detail & Related papers (2025-11-30T12:52:04Z) - AILoRA: Function-Aware Asymmetric Initialization for Low-Rank Adaptation of Large Language Models [11.663809872664105]
Low-Rank Adaptation (LoRA) has emerged as one of the most widely adopted approaches. LoRA is typically applied to the $W_Q$ and $W_V$ projection matrices of self-attention modules. We introduce AILoRA, a novel parameter-efficient method that incorporates function-aware asymmetric low-rank priors.
arXiv Detail & Related papers (2025-10-09T10:13:16Z) - Multi-Agent Tool-Integrated Policy Optimization [67.12841355267678]
Large language models (LLMs) increasingly rely on multi-turn tool-integrated planning for knowledge-intensive and complex reasoning tasks. Existing implementations typically rely on a single agent, but they suffer from limited context length and noisy tool responses. No existing methods support effective reinforcement learning post-training of tool-integrated multi-agent frameworks.
arXiv Detail & Related papers (2025-10-06T10:44:04Z) - Rethinking Parameter Sharing for LLM Fine-Tuning with Multiple LoRAs [26.212332132619736]
We propose an asymmetric multi-LoRA design with multiple $A$ matrices and a single shared $B$ in multi-task fine-tuning. Our methods achieve more balanced performance across tasks with comparable or superior average accuracy relative to existing multi-LoRA approaches.
arXiv Detail & Related papers (2025-09-29T19:16:14Z) - Align, Don't Divide: Revisiting the LoRA Architecture in Multi-Task Learning [20.31474646915225]
We show that a simplified multi-head architecture with high inter-head similarity outperforms complex multi-adapter and multi-head systems. We propose Align-LoRA, which incorporates an explicit loss to align task representations within the shared adapter space.
arXiv Detail & Related papers (2025-08-07T07:02:55Z) - FedSVD: Adaptive Orthogonalization for Private Federated Learning with LoRA [68.44043212834204]
Low-Rank Adaptation (LoRA) is widely used for efficient fine-tuning of language models in federated learning (FL).
arXiv Detail & Related papers (2025-05-19T07:32:56Z) - AsymLoRA: Harmonizing Data Conflicts and Commonalities in MLLMs [5.018961516699825]
AsymLoRA is a parameter-efficient tuning framework that unifies knowledge modularization and cross-modal coordination. AsymLoRA consistently surpasses both vanilla LoRA, which captures only commonalities, and LoRA-MoE, which focuses solely on conflicts.
arXiv Detail & Related papers (2025-02-27T12:21:02Z) - ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by the Kronecker product to Aggregate Low Rank Experts. Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z) - From Holistic to Localized: Local Enhanced Adapters for Efficient Visual Instruction Fine-Tuning [102.18178065928426]
Efficient Visual Instruction Fine-Tuning (EVIT) seeks to adapt Multimodal Large Language Models (MLLMs) to downstream tasks with minimal computational overhead. We propose Dual Low-Rank Adaptation (Dual-LoRA), a holistic-to-local framework that enhances the adapter's capacity to address data conflict.
arXiv Detail & Related papers (2024-11-19T11:03:09Z) - MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning [74.43869839954168]
We propose MTL-LoRA, which retains the advantages of low-rank adaptation while significantly enhancing MTL capabilities. MTL-LoRA augments LoRA by incorporating additional task-adaptive parameters that differentiate task-specific information and capture shared knowledge. This approach enables pre-trained models to jointly adapt to different target domains with a limited number of trainable parameters.
arXiv Detail & Related papers (2024-10-12T08:32:26Z) - Multimodal Instruction Tuning with Conditional Mixture of LoRA [51.58020580970644]
This paper introduces a novel approach that integrates multimodal instruction tuning with Low-Rank Adaptation (LoRA). It innovates upon LoRA by dynamically constructing low-rank adaptation matrices tailored to the unique demands of each input instance. Experimental results on various multimodal evaluation datasets indicate that MixLoRA outperforms conventional LoRA at the same or even higher ranks.
arXiv Detail & Related papers (2024-02-24T20:15:31Z) - MultiLoRA: Democratizing LoRA for Better Multi-Task Learning [20.750808913757396]
LoRA achieves remarkable resource efficiency and comparable performance when adapting LLMs for specific tasks.
LoRA is dominated by a small number of top singular vectors while fine-tuning decomposes into a set of less important unitary transforms.
We propose MultiLoRA for better multi-task adaptation by reducing the dominance of top singular vectors observed in LoRA.
arXiv Detail & Related papers (2023-11-20T02:59:18Z) - Multi-task Highly Adaptive Lasso [1.4680035572775534]
We propose a novel, fully nonparametric approach for multi-task learning, the Multi-task Highly Adaptive Lasso (MT-HAL).
MT-HAL simultaneously learns features, samples and task associations important for the common model, while imposing a shared sparse structure among similar tasks.
We show that MT-HAL outperforms sparsity-based MTL competitors across a wide range of simulation studies.
arXiv Detail & Related papers (2023-01-27T23:46:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.