Communication Efficient Federated Learning for Multilingual Neural
Machine Translation with Adapter
- URL: http://arxiv.org/abs/2305.12449v1
- Date: Sun, 21 May 2023 12:48:38 GMT
- Title: Communication Efficient Federated Learning for Multilingual Neural
Machine Translation with Adapter
- Authors: Yi Liu, Xiaohan Bi, Lei Li, Sishuo Chen, Wenkai Yang, Xu Sun
- Abstract summary: Federated Multilingual Neural Machine Translation (Fed-MNMT) has emerged as a promising paradigm for institutions with limited language resources.
This approach allows multiple institutions to act as clients and train a unified model through model synchronization, rather than collecting sensitive data for centralized training.
However, as pre-trained language models (PLMs) continue to increase in size, the communication cost for transmitting parameters during synchronization has become a training speed bottleneck.
We propose a communication-efficient Fed-MNMT framework that addresses this issue by keeping PLMs frozen and only transferring lightweight adapter modules between clients.
- Score: 21.512817959760007
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Federated Multilingual Neural Machine Translation (Fed-MNMT) has emerged as a
promising paradigm for institutions with limited language resources. This
approach allows multiple institutions to act as clients and train a unified
model through model synchronization, rather than collecting sensitive data for
centralized training. This significantly reduces the cost of corpus collection
and preserves data privacy. However, as pre-trained language models (PLMs)
continue to increase in size, the communication cost for transmitting
parameters during synchronization has become a training speed bottleneck. In
this paper, we propose a communication-efficient Fed-MNMT framework that
addresses this issue by keeping PLMs frozen and only transferring lightweight
adapter modules between clients. Since different language pairs exhibit
substantial discrepancies in data distributions, adapter parameters of clients
may conflict with each other. To tackle this, we explore various clustering
strategies to group parameters for integration and mitigate the negative
effects of conflicting parameters. Experimental results demonstrate that our
framework reduces communication cost by over 98% while achieving similar or
even better performance compared to competitive baselines. Further analysis
reveals that clustering strategies effectively solve the problem of linguistic
discrepancy and pruning adapter modules further improves communication
efficiency.
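The adapter-only synchronization with clustering described in the abstract can be illustrated roughly as follows. This is a minimal sketch, not the authors' implementation: the function names, the cluster labels ("romance", "sino"), the toy adapter shapes, and the parameter counts are all hypothetical, and the abstract does not specify which clustering strategies the paper uses. The sketch assumes plain data-size-weighted averaging (FedAvg-style) inside each cluster, with the frozen PLM never transmitted.

```python
from collections import defaultdict


def average_states(states, weights):
    """Weighted average of parameter dicts (name -> list of floats)."""
    total = float(sum(weights))
    return {
        name: [
            sum(w * s[name][i] for s, w in zip(states, weights)) / total
            for i in range(len(states[0][name]))
        ]
        for name in states[0]
    }


def clustered_adapter_aggregate(client_adapters, client_sizes, client_cluster):
    """Average adapter parameters only among clients in the same cluster.

    client_adapters: client id -> adapter parameter dict
    client_sizes:    client id -> number of local training examples
    client_cluster:  client id -> cluster label (hypothetical grouping)
    Returns: cluster label -> aggregated adapter parameters.
    """
    groups = defaultdict(list)
    for cid, label in client_cluster.items():
        groups[label].append(cid)
    return {
        label: average_states(
            [client_adapters[c] for c in cids],
            [client_sizes[c] for c in cids],
        )
        for label, cids in groups.items()
    }


def communication_saving(adapter_params, full_model_params):
    """Fraction of per-round traffic avoided by sending adapters only."""
    return 1.0 - adapter_params / full_model_params


# Toy example: three clients, each holding one language pair (assumed values).
adapters = {
    "en-fr": {"down": [1.0, 2.0], "up": [0.0]},
    "en-es": {"down": [3.0, 4.0], "up": [2.0]},
    "en-zh": {"down": [10.0, 10.0], "up": [5.0]},
}
sizes = {"en-fr": 100, "en-es": 300, "en-zh": 50}
clusters = {"en-fr": "romance", "en-es": "romance", "en-zh": "sino"}

merged = clustered_adapter_aggregate(adapters, sizes, clusters)

# Illustrative parameter counts only; these are not figures from the paper.
saving = communication_saving(8_000_000, 600_000_000)
```

Keeping aggregation inside a cluster means the "en-zh" adapter is not pulled toward the "en-fr" and "en-es" updates, which is one way to read the abstract's point about conflicting parameters across language pairs.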
Related papers
- Modality Alignment Meets Federated Broadcasting [9.752555511824593]
Federated learning (FL) has emerged as a powerful approach to safeguard data privacy by training models across distributed edge devices without centralizing local data.
This paper introduces a novel FL framework leveraging modality alignment, where a text encoder resides on the server, and image encoders operate on local devices.
(2024-11-24T13:30:03Z)
- FedsLLM: Federated Split Learning for Large Language Models over Communication Networks [30.47242577997792]
This paper combines low-rank adaptation technology (LoRA) with the splitfed learning framework to propose the federated split learning for large language models (FedsLLM) framework.
The proposed algorithm reduces delays by an average of 47.63% compared to unoptimized scenarios.
(2024-07-12T13:23:54Z)
- SpaFL: Communication-Efficient Federated Learning with Sparse Models and Low computational Overhead [75.87007729801304]
SpaFL, a communication-efficient FL framework, is proposed to optimize sparse model structures with low computational overhead.
Experiments show that SpaFL improves accuracy while requiring much less communication and computing resources compared to sparse baselines.
(2024-06-01T13:10:35Z)
- Communication-Efficient Federated Learning through Adaptive Weight Clustering and Server-Side Distillation [10.541541376305245]
Federated Learning (FL) is a promising technique for the collaborative training of deep neural networks across multiple devices.
FL is hindered by excessive communication costs due to repeated server-client communication during training.
We propose FedCompress, a novel approach that combines dynamic weight clustering and server-side knowledge distillation.
(2024-01-25T14:49:15Z)
- Only Send What You Need: Learning to Communicate Efficiently in Federated Multilingual Machine Translation [19.28500206536013]
Federated learning (FL) is a promising approach for solving multilingual tasks.
We propose a meta-learning-based adaptive parameter selection methodology, MetaSend, that improves the communication efficiency of model transmissions.
We demonstrate that MetaSend obtains substantial improvements over baselines in translation quality in the presence of a limited communication budget.
(2024-01-15T04:04:26Z)
- FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method.
We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate.
We show that our client-specified auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients.
(2023-09-18T12:35:05Z)
- Communication and Storage Efficient Federated Split Learning [19.369076939064904]
Federated Split Learning preserves the parallel model training principle of FL.
The server has to maintain separate models for every client, resulting in significant computation and storage requirements.
This paper proposes a communication and storage efficient federated and split learning strategy.
(2023-02-11T04:44:29Z)
- DisPFL: Towards Communication-Efficient Personalized Federated Learning via Decentralized Sparse Training [84.81043932706375]
We propose a novel personalized federated learning framework in a decentralized (peer-to-peer) communication protocol named Dis-PFL.
Dis-PFL employs personalized sparse masks to customize sparse local models on the edge.
We demonstrate that our method can easily adapt to heterogeneous local clients with varying computation complexities.
(2022-06-01T02:20:57Z)
- Parameter-Efficient Neural Reranking for Cross-Lingual and Multilingual Retrieval [66.69799641522133]
State-of-the-art neural (re)rankers are notoriously data hungry.
Current approaches typically transfer rankers trained on English data to other languages and cross-lingual setups by means of multilingual encoders.
We show that two parameter-efficient approaches to cross-lingual transfer, namely Sparse Fine-Tuning Masks (SFTMs) and Adapters, allow for a more lightweight and more effective zero-shot transfer.
(2022-04-05T15:44:27Z)
- Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for Multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
(2021-09-09T03:48:35Z)
- Dynamic Attention-based Communication-Efficient Federated Learning [85.18941440826309]
Federated learning (FL) offers a solution to train a global machine learning model.
FL suffers performance degradation when client data distribution is non-IID.
We propose a new adaptive training algorithm, AdaFL, to combat this degradation.
(2021-08-12T14:18:05Z)