Towards Federated Low-Rank Adaptation with Rank-Heterogeneous Communication
- URL: http://arxiv.org/abs/2406.17477v1
- Date: Tue, 25 Jun 2024 11:49:33 GMT
- Title: Towards Federated Low-Rank Adaptation with Rank-Heterogeneous Communication
- Authors: Yuji Byun, Jaeho Lee
- Abstract summary: We find that the empirical performance of low-rank adaptation (LoRA) is highly unstable with respect to rank-heterogeneity.
The root cause of this instability is the zero-padding-based aggregation strategy adopted in conventional federated LoRA frameworks.
We propose a new replication-based padding strategy, which allows us to better leverage the information from clients with high-quality datasets.
- Score: 12.515874333424929
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Low-rank adaptation (LoRA) is an attractive alternative to adapting full weights for the federated fine-tuning of large pretrained models, as it can significantly reduce the memory and communication burden. In principle, federated LoRA can provide an effective means of allocating different resources to each client by tuning the rank for each client, which can be useful in achieving a better communication-performance tradeoff. We find, however, that the empirical performance of LoRA is highly unstable with respect to such rank-heterogeneity, severely limiting its applicability to scenarios where it is desirable or even required to allocate nonuniform communication bandwidth to each client due to constrained total bandwidth. Our investigation reveals that the root cause of this instability is the zero-padding-based aggregation strategy adopted in conventional federated LoRA frameworks, which causes the information from high-rank clients to get diluted during the aggregation process. To address this issue, we propose a new replication-based padding strategy, which allows us to better leverage the information from clients with high-quality datasets. This method ensures that valuable information from high-rank clients is retained during the aggregation process, accelerating the convergence speed and enhancing the overall prediction quality of the global model.
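The dilution effect described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it handles a single LoRA factor whose rank runs along axis 0, and it assumes one plausible reading of "replication-based padding", namely that the missing rows of a low-rank client are filled with the weighted mean of those rows from the clients that do hold them. The function names `aggregate_zero_padding` and `aggregate_replication_padding` are hypothetical.

```python
import numpy as np


def aggregate_zero_padding(factors, weights):
    """Weighted average after padding each (r_k, n) factor with zero rows."""
    r_max = max(f.shape[0] for f in factors)
    n = factors[0].shape[1]
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    padded = np.zeros((len(factors), r_max, n))
    for k, f in enumerate(factors):
        padded[k, : f.shape[0]] = f  # rows beyond r_k stay zero
    return np.tensordot(w, padded, axes=1)


def aggregate_replication_padding(factors, weights):
    """Weighted average after filling missing rows with replicated donor rows.

    Assumption (not taken from the paper): the donor for each rank index is
    the weighted mean over the clients that actually hold that row. Weights
    are assumed positive, so every row has at least one donor.
    """
    r_max = max(f.shape[0] for f in factors)
    n = factors[0].shape[1]
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    num = np.zeros((r_max, n))
    den = np.zeros((r_max, 1))
    for f, wk in zip(factors, w):
        num[: f.shape[0]] += wk * f
        den[: f.shape[0]] += wk
    donor = num / den
    filled = np.stack([donor.copy() for _ in factors])
    for k, f in enumerate(factors):
        filled[k, : f.shape[0]] = f  # keep each client's own rows
    return np.tensordot(w, filled, axes=1)


# Toy example: a rank-1 client and a rank-2 client with equal weights.
low = np.array([[2.0, 2.0]])
high = np.array([[4.0, 4.0], [6.0, 6.0]])
print(aggregate_zero_padding([low, high], [1, 1]))         # second row is halved to 3
print(aggregate_replication_padding([low, high], [1, 1]))  # second row stays at 6
```

In the toy example, zero-padding shrinks the row that only the high-rank client holds, while the replication variant preserves it. A real federated LoRA setup would apply the same treatment to both LoRA factors along their rank dimension.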
Related papers
- Federated Dynamical Low-Rank Training with Global Loss Convergence Guarantees [1.9183348587701112]
A global low-rank basis of network weights enables client training on a small coefficient matrix.
A consistent global low-rank basis allows us to incorporate a variance correction scheme and prove global loss descent and convergence.
We show a reduction of client compute and communication costs by up to an order of magnitude with minimal impacts on global accuracy.
arXiv Detail & Related papers (2024-06-25T18:51:08Z)
- An Aggregation-Free Federated Learning for Tackling Data Heterogeneity [50.44021981013037]
Federated Learning (FL) relies on the effectiveness of utilizing knowledge from distributed datasets.
Traditional FL methods adopt an aggregate-then-adapt framework, where clients update local models based on a global model aggregated by the server from the previous training round.
We introduce FedAF, a novel aggregation-free FL algorithm.
arXiv Detail & Related papers (2024-04-29T05:55:23Z)
- Federated Natural Policy Gradient Methods for Multi-task Reinforcement Learning [49.65958529941962]
Federated reinforcement learning (RL) enables collaborative decision making of multiple distributed agents without sharing local data trajectories.
In this work, we consider a multi-task setting, in which each agent has its own private reward function corresponding to different tasks.
We learn a globally optimal policy that maximizes the sum of the discounted total rewards of all the agents in a decentralized manner.
arXiv Detail & Related papers (2023-11-01T00:15:18Z)
- Federated Learning for Semantic Parsing: Task Formulation, Evaluation Setup, New Algorithms [29.636944156801327]
Multiple clients collaboratively train one global model without sharing their semantic parsing data.
Lorar adjusts each client's contribution to the global model update based on its training loss reduction during each round.
Clients with smaller datasets enjoy larger performance gains.
arXiv Detail & Related papers (2023-05-26T19:25:49Z)
- Dynamic Regularized Sharpness Aware Minimization in Federated Learning: Approaching Global Consistency and Smooth Landscape [59.841889495864386]
In federated learning (FL), a cluster of local clients is chaired under the coordination of a global server.
Clients are prone to overfitting to their own optima, which deviate sharply from the global objective.
FedSMOO adopts a dynamic regularizer to guide the local optima towards the global objective.
Our theoretical analysis indicates that FedSMOO achieves a fast $\mathcal{O}(1/T)$ convergence rate with a low generalization bound.
arXiv Detail & Related papers (2023-05-19T10:47:44Z)
- Re-Weighted Softmax Cross-Entropy to Control Forgetting in Federated Learning [14.196701066823499]
In Federated Learning, a global model is learned by aggregating model updates computed at a set of independent client nodes.
We show that individual client models experience catastrophic forgetting with respect to data from other clients.
We propose an efficient approach that modifies the cross-entropy objective on a per-client basis by re-weighting the softmax logits prior to computing the loss.
arXiv Detail & Related papers (2023-04-11T14:51:55Z)
- Compressed Regression over Adaptive Networks [58.79251288443156]
We derive the performance achievable by a network of distributed agents that solve, adaptively and in the presence of communication constraints, a regression problem.
We devise an optimized allocation strategy where the parameters necessary for the optimization can be learned online by the agents.
arXiv Detail & Related papers (2023-04-07T13:41:08Z)
- FedADMM: A Federated Primal-Dual Algorithm Allowing Partial Participation [3.7677951749356686]
Federated learning follows a client-server broadcast model and is particularly appealing for its ability to accommodate heterogeneity in client compute and storage.
Our contribution is to offer a new federated learning algorithm, FedADMM, for solving non-smooth composite problems.
arXiv Detail & Related papers (2022-03-28T21:20:43Z)
- Communication-Efficient Federated Learning with Accelerated Client Gradient [46.81082897703729]
Federated learning often suffers from slow and unstable convergence due to the heterogeneous characteristics of participating client datasets.
We propose a simple but effective federated learning framework, which improves the consistency across clients and facilitates the convergence of the server model.
We provide the theoretical convergence rate of our algorithm and demonstrate remarkable performance gains in terms of accuracy and communication efficiency.
arXiv Detail & Related papers (2022-01-10T05:31:07Z)
- Faster Non-Convex Federated Learning via Global and Local Momentum [57.52663209739171]
FedGLOMO is the first (first-order) FL algorithm.
Our algorithm is provably optimal even with communication between the clients and the server.
arXiv Detail & Related papers (2020-12-07T21:05:31Z)