FedRDMA: Communication-Efficient Cross-Silo Federated LLM via Chunked RDMA Transmission
- URL: http://arxiv.org/abs/2403.00881v1
- Date: Fri, 1 Mar 2024 09:14:10 GMT
- Title: FedRDMA: Communication-Efficient Cross-Silo Federated LLM via Chunked RDMA Transmission
- Authors: Zeling Zhang, Dongqi Cai, Yiran Zhang, Mengwei Xu, Shangguang Wang, Ao Zhou
- Abstract summary: FedRDMA is a communication-efficient cross-silo FL system that integrates RDMA into the FL communication protocol.
We show that FedRDMA can achieve up to 3.8$\times$ speedup in communication efficiency compared to traditional TCP/IP-based FL systems.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Communication overhead is a significant bottleneck in federated learning
(FL), which has been exacerbated by the increasing size of AI models. In this
paper, we propose FedRDMA, a communication-efficient cross-silo FL system that
integrates RDMA into the FL communication protocol. To overcome the limitations
of RDMA in wide-area networks (WANs), FedRDMA divides the updated model into
chunks and designs a series of optimization techniques to improve the
efficiency and robustness of RDMA-based communication. We implement FedRDMA
atop an industrial federated learning framework and evaluate it on a
real-world cross-silo FL scenario. The experimental results show that FedRDMA can
achieve up to 3.8$\times$ speedup in communication efficiency compared to
traditional TCP/IP-based FL systems.
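The chunking idea in the abstract can be illustrated with a minimal Python sketch: a serialized model update is split into fixed-size chunks that each carry a small header, and the receiver reassembles them in order and rejects corrupted ones. This is not FedRDMA's implementation. The 16-byte header layout (chunk index, chunk count, payload length, CRC32), the chunk size, and the function names `split_into_chunks` and `reassemble` are hypothetical, and the actual RDMA transfer of each chunk is omitted.

```python
import random
import struct
import zlib
from typing import Dict, Iterable, List

# Hypothetical 16-byte header: chunk index, total chunk count, payload length, CRC32.
HEADER = struct.Struct("!IIII")

def split_into_chunks(payload: bytes, chunk_size: int) -> List[bytes]:
    """Split a serialized model update into fixed-size chunks with headers."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = max(1, -(-len(payload) // chunk_size))  # ceiling division; at least one chunk
    chunks = []
    for idx in range(total):
        body = payload[idx * chunk_size:(idx + 1) * chunk_size]
        header = HEADER.pack(idx, total, len(body), zlib.crc32(body))
        chunks.append(header + body)
    return chunks

def reassemble(chunks: Iterable[bytes]) -> bytes:
    """Rebuild the payload from chunks received in any order; reject corrupted ones."""
    received: Dict[int, bytes] = {}
    total = None
    for chunk in chunks:
        idx, count, length, crc = HEADER.unpack_from(chunk)
        body = chunk[HEADER.size:HEADER.size + length]
        if len(body) != length or zlib.crc32(body) != crc:
            raise ValueError(f"chunk {idx} is truncated or corrupted")
        total = count
        received[idx] = body
    if total is None or sorted(received) != list(range(total)):
        raise ValueError("missing or unexpected chunks")
    return b"".join(received[i] for i in range(total))

if __name__ == "__main__":
    update = bytes(range(256)) * 40  # 10,240-byte stand-in for a serialized update
    chunks = split_into_chunks(update, chunk_size=4096)
    random.shuffle(chunks)  # simulate out-of-order arrival
    assert reassemble(chunks) == update
    print(f"{len(chunks)} chunks reassembled correctly")
```

One plausible motivation for chunking, assumed here rather than taken from the paper, is that a failed transfer over a WAN link then only requires resending one chunk instead of the whole update; the chunk size trades that benefit against per-chunk header and setup overhead.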
Related papers
- Communication-Efficient Federated Learning by Quantized Variance Reduction for Heterogeneous Wireless Edge Networks
Federated learning (FL) has been recognized as a viable solution for local-privacy-aware collaborative model training in wireless edge networks.
Most existing communication-efficient FL algorithms fail to reduce the significant inter-device variance.
We propose a novel communication-efficient FL algorithm, named FedQVR, which relies on a sophisticated variance-reduced scheme.
(2025-01-20T04:26:21Z)
- Hyperdimensional Computing Empowered Federated Foundation Model over Wireless Networks for Metaverse
We propose an integrated federated split learning and hyperdimensional computing framework for emerging foundation models.
This novel approach reduces communication costs, computation load, and privacy risks, making it suitable for resource-constrained edge devices in the Metaverse.
(2024-08-26T17:03:14Z)
- WDMoE: Wireless Distributed Large Language Models with Mixture of Experts
We propose a wireless distributed Large Language Model (LLM) paradigm based on Mixture of Experts (MoE).
We decompose the MoE layer in LLMs by deploying the gating network and the preceding neural network layer at the base station (BS), while distributing the expert networks across mobile devices.
We design an expert selection policy by taking into account both the performance of the model and the end-to-end latency.
(2024-05-06T02:55:50Z)
- Robust and Communication-Efficient Federated Domain Adaptation via Random Features
Federated domain adaptation (FDA) has emerged as a powerful approach to address domain shift, which can prevent federated models from generalizing well when deployed on new devices.
RF-TCA is an enhancement to the standard Transfer Component Analysis (TCA) approach that significantly accelerates computation without compromising theoretical and empirical performance.
We present extensive experiments to showcase the superior performance and robustness (to network condition) of FedRF-TCA.
(2023-11-08T13:46:58Z)
- Multiagent Reinforcement Learning with an Attention Mechanism for Improving Energy Efficiency in LoRa Networks
As the network scale increases, the energy efficiency of LoRa networks decreases sharply due to severe packet collisions.
We propose a transmission parameter allocation algorithm based on multiagent reinforcement learning (MALoRa).
Simulation results demonstrate that MALoRa significantly improves the system energy efficiency (EE) compared with baseline algorithms.
(2023-09-16T11:37:23Z)
- FLCC: Efficient Distributed Federated Learning on IoMT over CSMA/CA
Federated Learning (FL) has emerged as a promising approach for privacy preservation.
This article investigates the performance of FL on an application that might be used to improve a remote healthcare system over ad hoc networks.
We present two metrics to evaluate the network performance: 1) probability of successful transmission while minimizing the interference, and 2) performance of distributed FL model in terms of accuracy and loss.
(2023-03-29T16:36:42Z)
- CFLIT: Coexisting Federated Learning and Information Transfer
We study the coexistence of over-the-air FL and traditional information transfer (IT) in a mobile edge network.
We propose a coexisting federated learning and information transfer (CFLIT) communication framework, where the FL and IT devices share the wireless spectrum in an OFDM system.
(2022-07-26T13:17:28Z)
- SlimFL: Federated Learning with Superposition Coding over Slimmable Neural Networks
Federated learning (FL) is a key enabler for efficient communication and computing leveraging devices' distributed computing capabilities.
This paper proposes a novel learning framework by integrating FL and width-adjustable slimmable neural networks (SNNs).
We propose a communication and energy-efficient SNN-based FL (named SlimFL) that jointly utilizes superposition coding (SC) for global model aggregation and superposition training (ST) for updating local models.
(2022-03-26T15:06:13Z)
- Joint Superposition Coding and Training for Federated Learning over Multi-Width Neural Networks
This paper aims to integrate two synergetic technologies, federated learning (FL) and width-adjustable slimmable neural networks (SNNs).
FL preserves data privacy by exchanging the locally trained models of mobile devices. Combining FL and SNNs is, however, non-trivial, particularly under wireless connections with time-varying channel conditions.
We propose a communication and energy-efficient SNN-based FL (named SlimFL) that jointly utilizes superposition coding (SC) for global model aggregation and superposition training (ST) for updating local models.
(2021-12-05T11:17:17Z)
- Federated Learning over Wireless IoT Networks with Optimized Communication and Resources
Federated learning (FL), as a paradigm of collaborative learning techniques, has attracted increasing research attention.
It is of interest to investigate fast responding and accurate FL schemes over wireless systems.
We show that the proposed communication-efficient federated learning framework converges at a strong linear rate.
(2021-10-22T13:25:57Z)
- EdgeML: Towards Network-Accelerated Federated Learning over Wireless Edge
Federated learning (FL) is a distributed machine learning technology for next-generation AI systems.
This paper aims to accelerate FL convergence over wireless edge by optimizing the multi-hop federated networking performance.
(2021-10-14T14:06:57Z)
- CDMA: A Practical Cross-Device Federated Learning Algorithm for General Minimax Problems
Minimax problems arise in a wide range of important applications including robust adversarial learning and Generative Adversarial Network (GAN) training.
We develop the first practical algorithm named CDMA for general minimax problems in the cross-device FL setting.
CDMA is based on a Start-Immediately-With-Enough-Responses mechanism, in which the server first signals a subset of clients to perform local computation and then starts to aggregate the local results reported by clients once it receives responses from enough clients in each round.
(2021-05-29T05:18:02Z)
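The Start-Immediately-With-Enough-Responses mechanism described for CDMA can be sketched in Python as a single server round: signal a random subset of clients, then aggregate as soon as a fixed number of them have replied, without waiting for stragglers. This is a simplified illustration, not CDMA's algorithm: the function `run_round`, its parameters `num_signaled` and `enough`, and the clients (callables returning a scalar stand-in for a local result) are hypothetical, and aggregation is a plain average rather than CDMA's minimax update.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Sequence

def run_round(clients: Sequence[Callable[[], float]], num_signaled: int,
              enough: int, rng: random.Random) -> float:
    """One server round: signal `num_signaled` clients, aggregate the first `enough` replies."""
    if not 0 < enough <= num_signaled <= len(clients):
        raise ValueError("require 0 < enough <= num_signaled <= len(clients)")
    selected = rng.sample(list(clients), num_signaled)  # signal a random subset
    pool = ThreadPoolExecutor(max_workers=num_signaled)
    futures = [pool.submit(client) for client in selected]  # local computation starts
    results: List[float] = []
    for fut in as_completed(futures):  # yields futures in completion order
        results.append(fut.result())
        if len(results) == enough:  # enough responses: stop waiting
            break
    # Do not block on stragglers; their late results are simply discarded.
    pool.shutdown(wait=False, cancel_futures=True)
    return sum(results) / len(results)  # hypothetical aggregation: plain average

if __name__ == "__main__":
    def make_client(value: float, delay: float) -> Callable[[], float]:
        def client() -> float:
            time.sleep(delay)  # simulated local computation time
            return value
        return client

    clients = [make_client(1.0, 0.0), make_client(3.0, 0.0), make_client(100.0, 0.5)]
    print(run_round(clients, num_signaled=3, enough=2, rng=random.Random(0)))
```

With `enough=2`, the two fast clients normally finish first, so the slow client's value does not enter the average; the straggler's thread still runs to completion in the background, but its result is ignored.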