DeCAF: Decentralized Consensus-And-Factorization for Low-Rank Adaptation of Foundation Models
- URL: http://arxiv.org/abs/2505.21382v1
- Date: Tue, 27 May 2025 16:10:53 GMT
- Title: DeCAF: Decentralized Consensus-And-Factorization for Low-Rank Adaptation of Foundation Models
- Authors: Nastaran Saadati, Zhanhong Jiang, Joshua R. Waite, Shreyan Ganguly, Aditya Balu, Chinmay Hegde, Soumik Sarkar
- Abstract summary: Low-Rank Adaptation (LoRA) has emerged as one of the most effective, computationally tractable fine-tuning approaches for training Vision-Language Models (VLMs) and Large Language Models (LLMs). This work improves the convergence rate of decentralized LoRA (DLoRA) to match the rate of decentralized SGD by ensuring gradient smoothness. We also introduce DeCAF, a novel algorithm integrating DLoRA with truncated singular value decomposition (TSVD)-based matrix factorization to resolve consensus interference.
- Score: 22.45637113673959
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Low-Rank Adaptation (LoRA) has emerged as one of the most effective, computationally tractable fine-tuning approaches for training Vision-Language Models (VLMs) and Large Language Models (LLMs). LoRA accomplishes this by freezing the pre-trained model weights and injecting trainable low-rank matrices, allowing for efficient learning of these foundation models even on edge devices. However, LoRA in decentralized settings remains underexplored, particularly in its theoretical underpinnings, due to the lack of a smoothness guarantee and model consensus interference (defined formally below). This work improves the convergence rate of decentralized LoRA (DLoRA) to match the rate of decentralized SGD by ensuring gradient smoothness. We also introduce DeCAF, a novel algorithm integrating DLoRA with truncated singular value decomposition (TSVD)-based matrix factorization to resolve consensus interference. Theoretical analysis shows that TSVD's approximation error is bounded and that consensus differences between DLoRA and DeCAF vanish as rank increases, yielding DeCAF's matching convergence rate. Extensive experiments across vision/language tasks demonstrate that our algorithms outperform local training and rival federated learning under both IID and non-IID data distributions.
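A minimal sketch of the idea the abstract describes, under stated assumptions. The abstract does not define consensus interference here, so this example assumes it means that averaging LoRA factors separately differs from averaging their products. The function name `tsvd_factorize`, the dimensions, and the random data are all hypothetical; this is an illustration of truncated SVD refactorization, not the authors' implementation.

```python
import numpy as np

def tsvd_factorize(delta_w, rank):
    """Return rank-`rank` factors (B, A) of delta_w via truncated SVD."""
    u, s, vt = np.linalg.svd(delta_w, full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])
    b = u[:, :rank] * sqrt_s           # shape (d_out, rank)
    a = sqrt_s[:, None] * vt[:rank]    # shape (rank, d_in)
    return b, a

# Hypothetical setup: three agents, each holding its own LoRA factors.
rng = np.random.default_rng(0)
d_out, d_in, rank, n_agents = 16, 12, 4, 3
bs = [rng.normal(size=(d_out, rank)) for _ in range(n_agents)]
as_ = [rng.normal(size=(rank, d_in)) for _ in range(n_agents)]

# Assumed form of the interference: mean(B_i A_i) != mean(B_i) mean(A_i).
mean_of_products = sum(b @ a for b, a in zip(bs, as_)) / n_agents
product_of_means = (sum(bs) / n_agents) @ (sum(as_) / n_agents)
naive_gap = np.linalg.norm(mean_of_products - product_of_means)

# Refactor the averaged product back into rank-`rank` factors.
b_new, a_new = tsvd_factorize(mean_of_products, rank)
tsvd_gap = np.linalg.norm(mean_of_products - b_new @ a_new)

# With enough rank the truncation error disappears.
b_full, a_full = tsvd_factorize(mean_of_products, min(d_out, d_in))
full_gap = np.linalg.norm(mean_of_products - b_full @ a_full)

print(f"naive factor averaging gap: {naive_gap:.4f}")
print(f"TSVD refactorization gap:   {tsvd_gap:.4f}")
print(f"full-rank TSVD gap:         {full_gap:.2e}")
```

Because truncated SVD gives the best rank-constrained approximation in the Frobenius norm, the refactorized gap cannot exceed the gap from averaging the factors separately, and it shrinks toward zero as the rank grows.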
Related papers
- Rethinking LoRA for Privacy-Preserving Federated Learning in Large Models [14.755143405057929]
Fine-tuning large vision models (LVMs) and large language models (LLMs) under differentially private federated learning (DPFL) is hindered by a fundamental privacy-utility trade-off. Low-Rank Adaptation (LoRA), a promising parameter-efficient fine-tuning (PEFT) method, reduces computational and communication costs by introducing two trainable low-rank matrices while freezing pre-trained weights. We propose LA-LoRA, a novel approach that decouples gradient interactions and aligns update directions across clients to enhance robustness under stringent privacy constraints.
arXiv Detail & Related papers (2026-02-23T15:05:28Z) - ODELoRA: Training Low-Rank Adaptation by Solving Ordinary Differential Equations [54.886931928255564]
Low-rank adaptation (LoRA) has emerged as a widely adopted parameter-efficient fine-tuning method in deep transfer learning. We propose a novel continuous-time optimization dynamic for LoRA factor matrices in the form of an ordinary differential equation (ODE). We show that ODELoRA achieves stable feature learning, a property that is crucial for training deep neural networks at different scales of problem dimensionality.
arXiv Detail & Related papers (2026-02-07T10:19:36Z) - Stabilizing Decentralized Federated Fine-Tuning via Topology-Aware Alternating LoRA [20.00589625873043]
TAD-LoRA is a serverless variant of federated learning. We show that TAD-LoRA is competitive in strongly connected topologies and delivers clear gains under moderately and weakly connected topologies.
arXiv Detail & Related papers (2026-01-31T01:57:53Z) - Decomposing and Composing: Towards Efficient Vision-Language Continual Learning via Rank-1 Expert Pool in a Single LoRA [50.97792275353563]
We introduce a novel framework that restructures a single Low-Rank Adaptation (LoRA) module as a decomposable Rank-1 Expert Pool. Our method learns to dynamically compose a sparse, task-specific update by selecting from this expert pool, guided by the semantics of the [Guided] token.
arXiv Detail & Related papers (2026-01-30T10:54:51Z) - Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration [56.074760766965085]
PRISM achieves a dynamics-aware framework that arbitrates data based on its degree of cognitive conflict with the model's existing knowledge. Our findings suggest that disentangling data based on internal optimization regimes is crucial for scalable and robust agent alignment.
arXiv Detail & Related papers (2026-01-12T05:43:20Z) - ADF-LoRA: Alternating Low-Rank Aggregation for Decentralized Federated Fine-Tuning [20.00589625873043]
We introduce ADF-LoRA, which synchronizes the update of only one low-rank matrix per round and mixes both matrices to maintain more consistent parameter states under decentralized propagation. Experiments show that ADF-LoRA achieves faster and smoother convergence and delivers the highest average accuracy across tasks, outperforming existing LoRA variants in decentralized FL by a consistent margin.
arXiv Detail & Related papers (2025-11-23T05:09:32Z) - Convergence Analysis of Aggregation-Broadcast in LoRA-enabled Federated Learning [4.947778455281166]
Federated Learning (FL) enables collaborative model training across decentralized data sources. Low-Rank Adaptation (LoRA) has been introduced into FL as an efficient fine-tuning method. How to aggregate LoRA-updated local models on the server remains a critical and understudied problem.
arXiv Detail & Related papers (2025-08-02T12:54:17Z) - FedHL: Federated Learning for Heterogeneous Low-Rank Adaptation via Unbiased Aggregation [6.5370850242187855]
Federated Learning (FL) facilitates the fine-tuning of Foundation Models (FMs) using distributed data sources. Low-Rank Adaptation (LoRA) is gaining popularity due to its low communication costs and strong performance. Existing methods lack formal convergence guarantees due to parameter truncation and biased gradient updates.
arXiv Detail & Related papers (2025-05-24T04:12:12Z) - Decentralized Low-Rank Fine-Tuning of Large Language Models [14.75695352321115]
We propose Dec-LoRA, a decentralized fine-tuning algorithm for Large Language Models (LLMs) based on Low-Rank Adaptation (LoRA). Through experiments on BERT and LLaMA, we demonstrate that Dec-LoRA achieves comparable performance to centralized LoRA under various conditions. These findings highlight the potential of Dec-LoRA for scalable fine-tuning in decentralized environments.
arXiv Detail & Related papers (2025-01-26T01:56:25Z) - SD-LoRA: Scalable Decoupled Low-Rank Adaptation for Class Incremental Learning [73.93639228235622]
Continual Learning with foundation models has emerged as a promising paradigm to exploit abundant knowledge acquired during pre-training for tackling sequential tasks. Existing prompt-based and Low-Rank Adaptation-based (LoRA-based) methods often require expanding a prompt/LoRA pool or retaining samples of previous tasks. We propose Scalable Decoupled LoRA (SD-LoRA) for class incremental learning, which continually separates the learning of the magnitude and direction of LoRA components without rehearsal.
arXiv Detail & Related papers (2025-01-22T20:00:41Z) - Randomized Asymmetric Chain of LoRA: The First Meaningful Theoretical Framework for Low-Rank Adaptation [58.288682735160585]
Low-Rank Adaptation (LoRA) is a popular technique for finetuning models.
LoRA often underperforms when compared to full-parameter fine-tuning.
We present a framework that rigorously analyzes the adaptation rates of LoRA methods.
arXiv Detail & Related papers (2024-10-10T18:51:53Z) - LoRA-Ensemble: Efficient Uncertainty Modelling for Self-Attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient ensembling method for self-attention networks. The method not only outperforms state-of-the-art implicit techniques like BatchEnsemble, but even matches or exceeds the accuracy of an Explicit Ensemble.
arXiv Detail & Related papers (2024-05-23T11:10:32Z) - Stragglers-Aware Low-Latency Synchronous Federated Learning via Layer-Wise Model Updates [71.81037644563217]
Synchronous federated learning (FL) is a popular paradigm for collaborative edge learning.
As some of the devices may have limited computational resources and varying availability, FL latency is highly sensitive to stragglers.
We propose straggler-aware layer-wise federated learning (SALF) that leverages the optimization procedure of NNs via backpropagation to update the global model in a layer-wise fashion.
arXiv Detail & Related papers (2024-03-27T09:14:36Z) - Improving LoRA in Privacy-preserving Federated Learning [44.47315926976059]
Low-rank adaptation (LoRA) is one of the most popular task-specific parameter-efficient fine-tuning (PEFT) methods on pre-trained language models.
This paper proposes an efficient and effective version of LoRA, Federated Freeze A LoRA (FFA-LoRA), to alleviate these challenges.
arXiv Detail & Related papers (2024-03-18T23:20:08Z) - Sparse Low-rank Adaptation of Pre-trained Language Models [79.74094517030035]
We introduce sparse low-rank adaptation (SoRA) that enables dynamic adjustments to the intrinsic rank during the adaptation process.
Our approach strengthens the representation power of LoRA by initializing it with a higher rank, while efficiently taming a temporarily increased number of parameters.
Our experimental results demonstrate that SoRA can outperform other baselines even with 70% retained parameters and 70% training time.
arXiv Detail & Related papers (2023-11-20T11:56:25Z) - Over-the-Air Federated Learning and Optimization [52.5188988624998]
We focus on federated learning (FL) via over-the-air computation (AirComp).
We describe the convergence of AirComp-based FedAvg (AirFedAvg) algorithms under both convex and non-convex settings.
For different types of local updates that can be transmitted by edge devices (i.e., model, gradient, model difference), we reveal that transmitting in AirFedAvg may cause an aggregation error.
In addition, we consider more practical signal processing schemes to improve the communication efficiency and extend the convergence analysis to different forms of model aggregation error caused by these signal processing schemes.
arXiv Detail & Related papers (2023-10-16T05:49:28Z) - FedAgg: Adaptive Federated Learning with Aggregated Gradients [1.5653612447564105]
We propose an adaptive FEDerated learning algorithm called FedAgg to alleviate the divergence between the local and average model parameters and obtain a fast model convergence rate.
We show that our framework is superior to existing state-of-the-art FL strategies for enhancing model performance and accelerating convergence rate under IID and Non-IID datasets.
arXiv Detail & Related papers (2023-03-28T08:07:28Z) - Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.