Communication-Efficient Federated Learning for Neural Machine
Translation
- URL: http://arxiv.org/abs/2112.06135v1
- Date: Sun, 12 Dec 2021 03:16:03 GMT
- Title: Communication-Efficient Federated Learning for Neural Machine
Translation
- Authors: Tanya Roosta, Peyman Passban, Ankit Chadha
- Abstract summary: Training neural machine translation (NMT) models in federated learning (FL) settings could be inefficient both computationally and communication-wise.
In this paper, we explore how to efficiently build NMT models in an FL setup by proposing a novel solution.
In order to reduce the communication overhead, out of all neural layers we only exchange what we term "Controller" layers.
- Score: 1.5362025549031046
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training neural machine translation (NMT) models in federated learning (FL)
settings could be inefficient both computationally and communication-wise, due
to the large size of translation engines as well as the multiple rounds of
updates required to train clients and a central server. In this paper, we
explore how to efficiently build NMT models in an FL setup by proposing a novel
solution. In order to reduce the communication overhead, out of all neural
layers we only exchange what we term "Controller" layers. Controllers are a
small number of additional neural components connected to our pre-trained
architectures. These new components are placed in between original layers. They
act as liaisons to communicate with the central server and learn minimal
information that is sufficient to update clients.
We evaluated the performance of our models on five datasets from different
domains to translate from German into English. We noted that the models
equipped with Controllers perform on par with those trained in a centralized,
non-FL setting. In addition, we observed a substantial reduction in the
communication traffic of the FL pipeline, which is a direct consequence of
using Controllers. Based on our experiments, Controller-based models are ~6
times less expensive to communicate than their peers that exchange all layers.
This reduction is particularly important given the number of parameters in large
models, and it becomes even more critical when those parameters must be exchanged
over multiple rounds in FL settings.
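The abstract describes the mechanism only in prose. The PyTorch sketch below is a hedged illustration of how such Controllers could be wired in, assuming an adapter-style bottleneck design; the class and helper names (`Controller`, `ControllerWrappedEncoder`, `controller_state`) are hypothetical, and this is not the authors' released code.

```python
# Hedged sketch, not the authors' implementation: a small "Controller"-style
# bottleneck module is inserted after each frozen pre-trained layer, and only
# the Controller parameters are packaged for exchange with the FL server.
import torch
import torch.nn as nn


class Controller(nn.Module):
    """Small bottleneck block placed between original (frozen) layers."""

    def __init__(self, d_model: int, d_bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, d_bottleneck)
        self.up = nn.Linear(d_bottleneck, d_model)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual form: the Controller learns a small correction on top of
        # the frozen layer's output.
        return x + self.up(self.act(self.down(x)))


class ControllerWrappedEncoder(nn.Module):
    """Interleaves Controllers with the layers of a frozen pre-trained stack."""

    def __init__(self, pretrained_layers: nn.ModuleList, d_model: int):
        super().__init__()
        self.layers = pretrained_layers
        for p in self.layers.parameters():
            p.requires_grad = False  # original layers stay local and fixed
        self.controllers = nn.ModuleList(Controller(d_model) for _ in self.layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer, ctrl in zip(self.layers, self.controllers):
            x = ctrl(layer(x))
        return x


def controller_state(model: ControllerWrappedEncoder) -> dict:
    """Only this (small) state dict is sent to the server each round."""
    return {k: v for k, v in model.state_dict().items() if k.startswith("controllers.")}


def load_controller_state(model: ControllerWrappedEncoder, state: dict) -> None:
    """Clients load the aggregated Controller weights returned by the server."""
    model.load_state_dict(state, strict=False)


# Toy usage with stand-in "pre-trained" layers; a real NMT encoder would use
# Transformer blocks instead of plain Linear layers.
layers = nn.ModuleList(nn.Linear(512, 512) for _ in range(6))
model = ControllerWrappedEncoder(layers, d_model=512)
payload = controller_state(model)  # far smaller than model.state_dict()
```

Because only the Controller parameters cross the network, the per-round payload scales with the bottleneck size rather than with the full NMT model, which is the kind of saving behind the roughly six-fold communication-cost reduction reported above (the exact factor depends on how small the Controllers are relative to the pre-trained layers).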
Related papers
- SpaFL: Communication-Efficient Federated Learning with Sparse Models and Low computational Overhead [75.87007729801304]
SpaFL, a communication-efficient FL framework, is proposed to optimize sparse model structures with low computational overhead.
Experiments show that SpaFL improves accuracy while requiring much less communication and computing resources compared to sparse baselines.
arXiv Detail & Related papers (2024-06-01T13:10:35Z)
- Communication Efficient ConFederated Learning: An Event-Triggered SAGA Approach [67.27031215756121]
Federated learning (FL) is a machine learning paradigm that targets model training without gathering the local data over various data sources.
Standard FL, which employs a single server, can only support a limited number of users, leading to degraded learning capability.
In this work, we consider a multi-server FL framework, referred to as Confederated Learning (CFL), in order to accommodate a larger number of users.
arXiv Detail & Related papers (2024-02-28T03:27:10Z)
- FFSplit: Split Feed-Forward Network For Optimizing Accuracy-Efficiency Trade-off in Language Model Inference [57.119047493787185]
This paper shows how to reduce model size by 43.1% and bring a $1.25\sim1.56\times$ wall-clock time speedup on different hardware with negligible accuracy drop.
arXiv Detail & Related papers (2024-01-08T17:29:16Z)
- Toward efficient resource utilization at edge nodes in federated learning [0.6990493129893112]
Federated learning enables edge nodes to collaboratively contribute to constructing a global model without sharing their data.
However, computational resource constraints and network communication can become a severe bottleneck for the larger model sizes typical of deep learning applications.
We propose and evaluate an FL strategy inspired by transfer learning in order to reduce resource utilization on devices.
arXiv Detail & Related papers (2023-09-19T07:04:50Z)
- Communication Efficient Federated Learning for Multilingual Neural Machine Translation with Adapter [21.512817959760007]
Federated Multilingual Neural Machine Translation (Fed-MNMT) has emerged as a promising paradigm for institutions with limited language resources.
This approach allows multiple institutions to act as clients and train a unified model through model synchronization, rather than collecting sensitive data for centralized training.
However, as pre-trained language models (PLMs) continue to increase in size, the communication cost for transmitting parameters during synchronization has become a training speed bottleneck.
We propose a communication-efficient Fed-MNMT framework that addresses this issue by keeping PLMs frozen and only transferring lightweight adapter modules between clients.
arXiv Detail & Related papers (2023-05-21T12:48:38Z)
- Federated Nearest Neighbor Machine Translation [66.8765098651988]
In this paper, we propose a novel federated nearest neighbor (FedNN) machine translation framework.
FedNN leverages one-round memorization-based interaction to share knowledge across different clients.
Experiments show that FedNN significantly reduces computational and communication costs compared with FedAvg.
arXiv Detail & Related papers (2023-02-23T18:04:07Z)
- Training Mixed-Domain Translation Models via Federated Learning [16.71888086947849]
In this work, we leverage federated learning (FL) in order to tackle the problem of training mixed-domain translation models.
With slight modifications in the training process, neural machine translation (NMT) engines can be easily adapted when an FL-based aggregation is applied to fuse different domains.
We propose a novel technique to dynamically control the communication bandwidth by selecting impactful parameters during FL updates (a minimal sketch of this kind of parameter selection appears after this list).
arXiv Detail & Related papers (2022-05-03T15:16:51Z)
- Communication-Efficient Federated Learning with Binary Neural Networks [15.614120327271557]
Federated learning (FL) is a privacy-preserving machine learning setting.
FL involves a frequent exchange of the parameters between all the clients and the server that coordinates the training.
In this paper, we consider training binary neural networks (BNNs) in the FL setting instead of the typical real-valued neural networks (a minimal sketch of the resulting communication saving appears after this list).
arXiv Detail & Related papers (2021-10-05T15:59:49Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
- Wireless Communications for Collaborative Federated Learning [160.82696473996566]
Internet of Things (IoT) devices may not be able to transmit their collected data to a central controller for training machine learning models.
Google's seminal FL algorithm requires all devices to be directly connected with a central controller.
This paper introduces a novel FL framework, called collaborative FL (CFL), which enables edge devices to implement FL with less reliance on a central controller.
arXiv Detail & Related papers (2020-06-03T20:00:02Z)
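The mixed-domain translation entry above mentions selecting impactful parameters during FL updates to control communication bandwidth. The sketch below is a hedged, generic illustration of that idea: magnitude-based top-k selection is an assumed criterion, and `sparsify_update` / `densify_update` are hypothetical helper names rather than anything from the paper.

```python
# Illustrative only: magnitude-based top-k selection of a client update.
# The "impactful parameter" criterion is an assumption here; the cited paper
# may use a different selection rule.
import torch


def sparsify_update(update: torch.Tensor, fraction: float = 0.1):
    """Keep only the largest-magnitude `fraction` of the update's entries."""
    flat = update.flatten()
    k = max(1, int(fraction * flat.numel()))
    _, indices = torch.topk(flat.abs(), k)
    return indices, flat[indices]  # compact payload sent to the server


def densify_update(indices: torch.Tensor, values: torch.Tensor, shape: torch.Size) -> torch.Tensor:
    """Server-side reconstruction of the (approximate) dense update."""
    flat = torch.zeros(shape.numel())
    flat[indices] = values
    return flat.reshape(shape)


# Example: transmit roughly 10% of a 512x512 weight update.
delta = torch.randn(512, 512)
idx, vals = sparsify_update(delta, fraction=0.1)
approx = densify_update(idx, vals, delta.shape)
```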
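The binary-neural-network entry above relies on exchanging binary rather than real-valued parameters to cut FL traffic. The sketch below shows one common binarization scheme (sign bits plus a per-tensor scale) purely to make the payload saving concrete; it is an assumption for illustration, not the cited paper's exact training procedure.

```python
# Generic illustration of the communication saving from binary parameters:
# each weight is sent as one sign bit plus a single per-tensor scale, instead
# of a 32-bit float. This is a common binarization scheme, not necessarily the
# cited paper's method.
import torch


def binarize(weights: torch.Tensor):
    """Return sign bits (bool tensor) and a per-tensor scale."""
    scale = weights.abs().mean()   # one float for the whole tensor
    signs = weights >= 0           # one bit of information per parameter
    return signs, scale


def debinarize(signs: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Reconstruct approximate real-valued weights on the receiving side."""
    return torch.where(signs, scale, -scale)


w = torch.randn(1000, 1000)
signs, scale = binarize(w)        # payload: ~1 bit/weight + one scale
w_hat = debinarize(signs, scale)  # roughly 32x less traffic than float32 weights
```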
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.