Communication-Efficient Federated Learning for Neural Machine
Translation
- URL: http://arxiv.org/abs/2112.06135v1
- Date: Sun, 12 Dec 2021 03:16:03 GMT
- Title: Communication-Efficient Federated Learning for Neural Machine
Translation
- Authors: Tanya Roosta, Peyman Passban, Ankit Chadha
- Abstract summary: Training neural machine translation (NMT) models in federated learning (FL) settings could be inefficient both computationally and communication-wise.
In this paper, we explore how to efficiently build NMT models in an FL setup by proposing a novel solution.
In order to reduce the communication overhead, out of all neural layers we only exchange what we term "Controller" layers.
- Score: 1.5362025549031046
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training neural machine translation (NMT) models in federated learning (FL)
settings could be inefficient both computationally and communication-wise, due
to the large size of translation engines as well as the multiple rounds of
updates required to train clients and a central server. In this paper, we
explore how to efficiently build NMT models in an FL setup by proposing a novel
solution. In order to reduce the communication overhead, out of all neural
layers we only exchange what we term "Controller" layers. Controllers are a
small number of additional neural components connected to our pre-trained
architectures. These new components are placed in between original layers. They
act as liaisons to communicate with the central server and learn minimal
information that is sufficient to update clients.
We evaluated the performance of our models on five datasets from different
domains to translate from German into English. We noted that the models
equipped with Controllers perform on par with those trained in a centralized,
non-FL setting. In addition, we observed a substantial reduction in the
communication traffic of the FL pipeline, which is a direct consequence of
using Controllers. Based on our experiments, Controller-based models are ~6
times less expensive to communicate than their peers that exchange all layers.
This reduction is particularly important given the number of parameters in large
models, and it becomes even more critical when those parameters must be exchanged
over multiple rounds in FL settings.
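The abstract describes the mechanism only in prose. The PyTorch sketch below is a hedged illustration of how such Controllers could be wired in, assuming an adapter-style bottleneck design; the class and helper names (`Controller`, `ControllerWrappedEncoder`, `controller_state`) are hypothetical, and this is not the authors' released code.

```python
# Hedged sketch, not the authors' implementation: a small "Controller"-style
# bottleneck module is inserted after each frozen pre-trained layer, and only
# the Controller parameters are packaged for exchange with the FL server.
import torch
import torch.nn as nn


class Controller(nn.Module):
    """Small bottleneck block placed between original (frozen) layers."""

    def __init__(self, d_model: int, d_bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, d_bottleneck)
        self.up = nn.Linear(d_bottleneck, d_model)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual form: the Controller learns a small correction on top of
        # the frozen layer's output.
        return x + self.up(self.act(self.down(x)))


class ControllerWrappedEncoder(nn.Module):
    """Interleaves Controllers with the layers of a frozen pre-trained stack."""

    def __init__(self, pretrained_layers: nn.ModuleList, d_model: int):
        super().__init__()
        self.layers = pretrained_layers
        for p in self.layers.parameters():
            p.requires_grad = False  # original layers stay local and fixed
        self.controllers = nn.ModuleList(Controller(d_model) for _ in self.layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer, ctrl in zip(self.layers, self.controllers):
            x = ctrl(layer(x))
        return x


def controller_state(model: ControllerWrappedEncoder) -> dict:
    """Only this (small) state dict is sent to the server each round."""
    return {k: v for k, v in model.state_dict().items() if k.startswith("controllers.")}


def load_controller_state(model: ControllerWrappedEncoder, state: dict) -> None:
    """Clients load the aggregated Controller weights returned by the server."""
    model.load_state_dict(state, strict=False)


# Toy usage with stand-in "pre-trained" layers; a real NMT encoder would use
# Transformer blocks instead of plain Linear layers.
layers = nn.ModuleList(nn.Linear(512, 512) for _ in range(6))
model = ControllerWrappedEncoder(layers, d_model=512)
payload = controller_state(model)  # far smaller than model.state_dict()
```

Because only the Controller parameters cross the network, the per-round payload scales with the bottleneck size rather than with the full NMT model, which is the kind of saving behind the roughly six-fold communication-cost reduction reported above (the exact factor depends on how small the Controllers are relative to the pre-trained layers).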
Related papers
- SpaFL: Communication-Efficient Federated Learning with Sparse Models and Low computational Overhead [75.87007729801304]
SpaFL, a communication-efficient FL framework, is proposed to optimize sparse model structures with low computational overhead.
Experiments show that SpaFL improves accuracy while requiring much less communication and computing resources compared to sparse baselines.
arXiv Detail & Related papers (2024-06-01T13:10:35Z)
- Communication Efficient ConFederated Learning: An Event-Triggered SAGA Approach [67.27031215756121]
Federated learning (FL) is a machine learning paradigm that targets model training without gathering the local data over various data sources.
Standard FL, which employs a single server, can only support a limited number of users, leading to degraded learning capability.
In this work, we consider a multi-server FL framework, referred to as Confederated Learning (CFL), in order to accommodate a larger number of users.
arXiv Detail & Related papers (2024-02-28T03:27:10Z)
- FFSplit: Split Feed-Forward Network For Optimizing Accuracy-Efficiency Trade-off in Language Model Inference [57.119047493787185]
This paper shows how to reduce model size by 43.1% and bring a $1.25\sim1.56\times$ wall-clock time speedup on different hardware with negligible accuracy drop.
arXiv Detail & Related papers (2024-01-08T17:29:16Z)
- Toward efficient resource utilization at edge nodes in federated learning [0.6990493129893112]
Federated learning enables edge nodes to collaboratively contribute to constructing a global model without sharing their data.
However, computational resource constraints and network communication can become a severe bottleneck for the larger model sizes typical of deep learning applications.
We propose and evaluate an FL strategy inspired by transfer learning in order to reduce resource utilization on devices.
arXiv Detail & Related papers (2023-09-19T07:04:50Z)
- Communication Efficient Federated Learning for Multilingual Neural Machine Translation with Adapter [21.512817959760007]
Federated Multilingual Neural Machine Translation (Fed-MNMT) has emerged as a promising paradigm for institutions with limited language resources.
This approach allows multiple institutions to act as clients and train a unified model through model synchronization, rather than collecting sensitive data for centralized training.
However, as pre-trained language models (PLMs) continue to increase in size, the communication cost for transmitting parameters during synchronization has become a training speed bottleneck.
We propose a communication-efficient Fed-MNMT framework that addresses this issue by keeping PLMs frozen and only transferring lightweight adapter modules between clients.
arXiv Detail & Related papers (2023-05-21T12:48:38Z)
- Federated Nearest Neighbor Machine Translation [66.8765098651988]
In this paper, we propose a novel federated nearest neighbor (FedNN) machine translation framework.
FedNN leverages one-round memorization-based interaction to share knowledge across different clients.
Experiments show that FedNN significantly reduces computational and communication costs compared with FedAvg.
arXiv Detail & Related papers (2023-02-23T18:04:07Z)
- Training Mixed-Domain Translation Models via Federated Learning [16.71888086947849]
In this work, we leverage federated learning (FL) in order to tackle the problem of training mixed-domain translation models.
With slight modifications in the training process, neural machine translation (NMT) engines can be easily adapted when an FL-based aggregation is applied to fuse different domains.
We propose a novel technique to dynamically control the communication bandwidth by selecting impactful parameters during FL updates (a minimal sketch of this kind of parameter selection appears after this list).
arXiv Detail & Related papers (2022-05-03T15:16:51Z)
- Communication-Efficient Federated Learning with Binary Neural Networks [15.614120327271557]
Federated learning (FL) is a privacy-preserving machine learning setting.
FL involves a frequent exchange of the parameters between all the clients and the server that coordinates the training.
In this paper, we consider training binary neural networks (BNNs) in the FL setting instead of the typical real-valued neural networks (a minimal sketch of the resulting communication saving appears after this list).
arXiv Detail & Related papers (2021-10-05T15:59:49Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
- Wireless Communications for Collaborative Federated Learning [160.82696473996566]
Internet of Things (IoT) devices may not be able to transmit their collected data to a central controller for training machine learning models.
Google's seminal FL algorithm requires all devices to be directly connected with a central controller.
This paper introduces a novel FL framework, called collaborative FL (CFL), which enables edge devices to implement FL with less reliance on a central controller.
arXiv Detail & Related papers (2020-06-03T20:00:02Z)
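The mixed-domain translation entry above mentions selecting impactful parameters during FL updates to control communication bandwidth. The sketch below is a hedged, generic illustration of that idea: magnitude-based top-k selection is an assumed criterion, and `sparsify_update` / `densify_update` are hypothetical helper names rather than anything from the paper.

```python
# Illustrative only: magnitude-based top-k selection of a client update.
# The "impactful parameter" criterion is an assumption here; the cited paper
# may use a different selection rule.
import torch


def sparsify_update(update: torch.Tensor, fraction: float = 0.1):
    """Keep only the largest-magnitude `fraction` of the update's entries."""
    flat = update.flatten()
    k = max(1, int(fraction * flat.numel()))
    _, indices = torch.topk(flat.abs(), k)
    return indices, flat[indices]  # compact payload sent to the server


def densify_update(indices: torch.Tensor, values: torch.Tensor, shape: torch.Size) -> torch.Tensor:
    """Server-side reconstruction of the (approximate) dense update."""
    flat = torch.zeros(shape.numel())
    flat[indices] = values
    return flat.reshape(shape)


# Example: transmit roughly 10% of a 512x512 weight update.
delta = torch.randn(512, 512)
idx, vals = sparsify_update(delta, fraction=0.1)
approx = densify_update(idx, vals, delta.shape)
```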
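The binary-neural-network entry above relies on exchanging binary rather than real-valued parameters to cut FL traffic. The sketch below shows one common binarization scheme (sign bits plus a per-tensor scale) purely to make the payload saving concrete; it is an assumption for illustration, not the cited paper's exact training procedure.

```python
# Generic illustration of the communication saving from binary parameters:
# each weight is sent as one sign bit plus a single per-tensor scale, instead
# of a 32-bit float. This is a common binarization scheme, not necessarily the
# cited paper's method.
import torch


def binarize(weights: torch.Tensor):
    """Return sign bits (bool tensor) and a per-tensor scale."""
    scale = weights.abs().mean()   # one float for the whole tensor
    signs = weights >= 0           # one bit of information per parameter
    return signs, scale


def debinarize(signs: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Reconstruct approximate real-valued weights on the receiving side."""
    return torch.where(signs, scale, -scale)


w = torch.randn(1000, 1000)
signs, scale = binarize(w)        # payload: ~1 bit/weight + one scale
w_hat = debinarize(signs, scale)  # roughly 32x less traffic than float32 weights
```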
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.