Related papers: Communication Efficient Federated Learning for Multilingual Neural Machine Translation with Adapter

Communication Efficient Federated Learning for Multilingual Neural Machine Translation with Adapter

URL: http://arxiv.org/abs/2305.12449v1
Date: Sun, 21 May 2023 12:48:38 GMT
Title: Communication Efficient Federated Learning for Multilingual Neural Machine Translation with Adapter
Authors: Yi Liu, Xiaohan Bi, Lei Li, Sishuo Chen, Wenkai Yang, Xu Sun
Abstract summary: Federated Multilingual Neural Machine Translation (Fed-MNMT) has emerged as a promising paradigm for institutions with limited language resources. This approach allows multiple institutions to act as clients and train a unified model through model synchronization, rather than collecting sensitive data for centralized training. However, as pre-trained language models (PLMs) continue to increase in size, the communication cost for transmitting parameters during synchronization has become a training speed bottleneck. We propose a communication-efficient Fed-MNMT framework that addresses this issue by keeping PLMs frozen and only transferring lightweight adapter modules between clients.
Score: 21.512817959760007
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Federated Multilingual Neural Machine Translation (Fed-MNMT) has emerged as a promising paradigm for institutions with limited language resources. This approach allows multiple institutions to act as clients and train a unified model through model synchronization, rather than collecting sensitive data for centralized training. This significantly reduces the cost of corpus collection and preserves data privacy. However, as pre-trained language models (PLMs) continue to increase in size, the communication cost for transmitting parameters during synchronization has become a training speed bottleneck. In this paper, we propose a communication-efficient Fed-MNMT framework that addresses this issue by keeping PLMs frozen and only transferring lightweight adapter modules between clients. Since different language pairs exhibit substantial discrepancies in data distributions, adapter parameters of clients may conflict with each other. To tackle this, we explore various clustering strategies to group parameters for integration and mitigate the negative effects of conflicting parameters. Experimental results demonstrate that our framework reduces communication cost by over 98% while achieving similar or even better performance compared to competitive baselines. Further analysis reveals that clustering strategies effectively solve the problem of linguistic discrepancy and pruning adapter modules further improves communication efficiency.

Related papers

Federated Multimodal Learning with Dual Adapters and Selective Pruning for Communication and Computational Efficiency [6.0860246234554545]
Federated Learning (FL) enables collaborative learning across distributed clients while preserving data privacy. We propose a novel framework designed to tackle these challenges by introducing a dual-adapter approach.
arXiv Detail & Related papers (2025-03-10T17:21:33Z)
Modality Alignment Meets Federated Broadcasting [9.752555511824593]
Federated learning (FL) has emerged as a powerful approach to safeguard data privacy by training models across distributed edge devices without centralizing local data. This paper introduces a novel FL framework leveraging modality alignment, where a text encoder resides on the server, and image encoders operate on local devices.
arXiv Detail & Related papers (2024-11-24T13:30:03Z)
FedsLLM: Federated Split Learning for Large Language Models over Communication Networks [30.47242577997792]
This paper combines low-rank adaptation technology (LoRA) with the splitfed learning framework to propose the federated split learning for large language models (FedsLLM) framework. The proposed algorithm reduces delays by an average of 47.63% compared to unoptimized scenarios.
arXiv Detail & Related papers (2024-07-12T13:23:54Z)
SpaFL: Communication-Efficient Federated Learning with Sparse Models and Low computational Overhead [75.87007729801304]
SpaFL: a communication-efficient FL framework is proposed to optimize sparse model structures with low computational overhead. Experiments show that SpaFL improves accuracy while requiring much less communication and computing resources compared to sparse baselines.
arXiv Detail & Related papers (2024-06-01T13:10:35Z)
Communication-Efficient Federated Learning through Adaptive Weight Clustering and Server-Side Distillation [10.541541376305245]
Federated Learning (FL) is a promising technique for the collaborative training of deep neural networks across multiple devices. FL is hindered by excessive communication costs due to repeated server-client communication during training. We propose FedCompress, a novel approach that combines dynamic weight clustering and server-side knowledge distillation.
arXiv Detail & Related papers (2024-01-25T14:49:15Z)
Only Send What You Need: Learning to Communicate Efficiently in Federated Multilingual Machine Translation [19.28500206536013]
Federated learning (FL) is a promising approach for solving multilingual tasks. We propose a meta-learning-based adaptive parameter selection methodology, MetaSend, that improves the communication efficiency of model transmissions. We demonstrate that MetaSend obtains substantial improvements over baselines in translation quality in the presence of a limited communication budget.
arXiv Detail & Related papers (2024-01-15T04:04:26Z)
FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method. We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate. We show that our client-specified auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients.
arXiv Detail & Related papers (2023-09-18T12:35:05Z)
Communication and Storage Efficient Federated Split Learning [19.369076939064904]
Federated Split Learning preserves the parallel model training principle of FL. Server has to maintain separate models for every client, resulting in a significant computation and storage requirement. This paper proposes a communication and storage efficient federated and split learning strategy.
arXiv Detail & Related papers (2023-02-11T04:44:29Z)
FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning [87.08902493524556]
Federated learning(FL) has recently attracted increasing attention from academia and industry. We propose FedDM to build the global training objective from multiple local surrogate functions. In detail, we construct synthetic sets of data on each client to locally match the loss landscape from original data.
arXiv Detail & Related papers (2022-07-20T04:55:18Z)
DisPFL: Towards Communication-Efficient Personalized Federated Learning via Decentralized Sparse Training [84.81043932706375]
We propose a novel personalized federated learning framework in a decentralized (peer-to-peer) communication protocol named Dis-PFL. Dis-PFL employs personalized sparse masks to customize sparse local models on the edge. We demonstrate that our method can easily adapt to heterogeneous local clients with varying computation complexities.
arXiv Detail & Related papers (2022-06-01T02:20:57Z)
Parameter-Efficient Neural Reranking for Cross-Lingual and Multilingual Retrieval [66.69799641522133]
State-of-the-art neural (re)rankers are notoriously data hungry. Current approaches typically transfer rankers trained on English data to other languages and cross-lingual setups by means of multilingual encoders. We show that two parameter-efficient approaches to cross-lingual transfer, namely Sparse Fine-Tuning Masks (SFTMs) and Adapters, allow for a more lightweight and more effective zero-shot transfer.
arXiv Detail & Related papers (2022-04-05T15:44:27Z)
Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for Multilingual neural machine translation (MNMT) based on distributionally robust optimization. We show how to practically optimize this objective for large translation corpora using an iterated best response scheme. Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
arXiv Detail & Related papers (2021-09-09T03:48:35Z)
Dynamic Attention-based Communication-Efficient Federated Learning [85.18941440826309]
Federated learning (FL) offers a solution to train a global machine learning model. FL suffers performance degradation when client data distribution is non-IID. We propose a new adaptive training algorithm $textttAdaFL$ to combat this degradation.
arXiv Detail & Related papers (2021-08-12T14:18:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.