Only Send What You Need: Learning to Communicate Efficiently in
Federated Multilingual Machine Translation
- URL: http://arxiv.org/abs/2401.07456v1
- Date: Mon, 15 Jan 2024 04:04:26 GMT
- Title: Only Send What You Need: Learning to Communicate Efficiently in
Federated Multilingual Machine Translation
- Authors: Yun-Wei Chu, Dong-Jun Han, Christopher G. Brinton
- Abstract summary: Federated learning (FL) is a promising approach for solving multilingual tasks.
We propose a meta-learning-based adaptive parameter selection methodology, MetaSend, that improves the communication efficiency of model transmissions.
We demonstrate that MetaSend obtains substantial improvements over baselines in translation quality in the presence of a limited communication budget.
- Score: 19.28500206536013
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Federated learning (FL) is a promising approach for solving multilingual
tasks, potentially enabling clients with their own language-specific data to
collaboratively construct a high-quality neural machine translation (NMT)
model. However, communication constraints in practical network systems present
challenges for exchanging large-scale NMT engines between FL parties. In this
paper, we propose a meta-learning-based adaptive parameter selection
methodology, MetaSend, that improves the communication efficiency of model
transmissions from clients during FL-based multilingual NMT training. Our
approach learns a dynamic threshold for filtering parameters prior to
transmission without compromising the NMT model quality, based on the tensor
deviations of clients between different FL rounds. Through experiments on two
NMT datasets with different language distributions, we demonstrate that
MetaSend obtains substantial improvements over baselines in translation quality
in the presence of a limited communication budget.
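The abstract sketches the core mechanism: compare each tensor of a client's model against its value from the previous FL round and transmit only the tensors whose deviation crosses a threshold, which MetaSend learns adaptively. Below is a minimal sketch of that filtering step only, assuming a simple relative-norm deviation and a fixed scalar threshold; the function names and the deviation metric are illustrative, not the paper's implementation.

```python
# Minimal sketch (not the MetaSend implementation): transmit only tensors whose
# deviation from the previous FL round exceeds a threshold. The relative-norm
# metric and the fixed scalar threshold are simplifying assumptions; in the
# paper the threshold is learned adaptively via meta-learning.
from typing import Dict

import torch


def select_tensors_to_send(
    current: Dict[str, torch.Tensor],
    previous: Dict[str, torch.Tensor],
    threshold: float,
) -> Dict[str, torch.Tensor]:
    """Keep only tensors that changed enough since the last round."""
    payload = {}
    for name, tensor in current.items():
        # Relative change of this tensor between consecutive rounds.
        deviation = torch.norm(tensor - previous[name]) / (torch.norm(previous[name]) + 1e-12)
        if deviation.item() > threshold:
            payload[name] = tensor  # worth spending communication budget on
        # Otherwise the tensor is withheld; the server reuses its stale copy.
    return payload


# Example: state_now and state_prev map parameter names to tensors from
# rounds t and t-1 on one client.
# payload = select_tensors_to_send(state_now, state_prev, threshold=0.01)
```

Only the selection step is illustrated; how the threshold is learned and how the server handles withheld tensors are the paper's contribution and are not reproduced here.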
Related papers
- LANDeRMT: Detecting and Routing Language-Aware Neurons for Selectively Finetuning LLMs to Machine Translation [43.26446958873554]
Recent advancements in large language models (LLMs) have shown promising results in multilingual translation even with limited bilingual supervision.
LANDeRMT is a framework that selectively finetunes LLMs to machine translation with diverse translation training data.
arXiv Detail & Related papers (2024-09-29T02:39:42Z) - Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT).
We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training.
Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data.
arXiv Detail & Related papers (2024-07-04T15:14:17Z) - SpaFL: Communication-Efficient Federated Learning with Sparse Models and Low computational Overhead [75.87007729801304]
SpaFL, a communication-efficient FL framework, is proposed to optimize sparse model structures with low computational overhead.
Experiments show that SpaFL improves accuracy while requiring much less communication and computing resources compared to sparse baselines.
arXiv Detail & Related papers (2024-06-01T13:10:35Z) - Communication Efficient Federated Learning for Multilingual Neural
Machine Translation with Adapter [21.512817959760007]
Federated Multilingual Neural Machine Translation (Fed-MNMT) has emerged as a promising paradigm for institutions with limited language resources.
This approach allows multiple institutions to act as clients and train a unified model through model synchronization, rather than collecting sensitive data for centralized training.
However, as pre-trained language models (PLMs) continue to increase in size, the communication cost for transmitting parameters during synchronization has become a training speed bottleneck.
We propose a communication-efficient Fed-MNMT framework that addresses this issue by keeping PLMs frozen and only transferring lightweight adapter modules between clients; a minimal sketch of this adapter-only exchange appears after this list.
arXiv Detail & Related papers (2023-05-21T12:48:38Z) - Unified Model Learning for Various Neural Machine Translation [63.320005222549646]
Existing neural machine translation (NMT) studies mainly focus on developing dataset-specific models.
We propose a "versatile" model, i.e., Unified Model Learning for NMT (UMLNMT), that works with data from different tasks.
Our UMLNMT results in substantial improvements over dataset-specific models with significantly reduced model deployment costs.
arXiv Detail & Related papers (2023-05-04T12:21:52Z) - Training Mixed-Domain Translation Models via Federated Learning [16.71888086947849]
In this work, we leverage federated learning (FL) in order to tackle the problem of training mixed-domain translation models.
With slight modifications in the training process, neural machine translation (NMT) engines can be easily adapted when an FL-based aggregation is applied to fuse different domains.
We propose a novel technique to dynamically control the communication bandwidth by selecting impactful parameters during FL updates.
arXiv Detail & Related papers (2022-05-03T15:16:51Z) - Parameter-Efficient Neural Reranking for Cross-Lingual and Multilingual
Retrieval [66.69799641522133]
State-of-the-art neural (re)rankers are notoriously data hungry.
Current approaches typically transfer rankers trained on English data to other languages and cross-lingual setups by means of multilingual encoders.
We show that two parameter-efficient approaches to cross-lingual transfer, namely Sparse Fine-Tuning Masks (SFTMs) and Adapters, allow for a more lightweight and more effective zero-shot transfer.
arXiv Detail & Related papers (2022-04-05T15:44:27Z) - Towards Reinforcement Learning for Pivot-based Neural Machine
Translation with Non-autoregressive Transformer [49.897891031932545]
Pivot-based neural machine translation (NMT) is commonly used in low-resource setups.
We present an end-to-end pivot-based integrated model, enabling training on source-target data.
arXiv Detail & Related papers (2021-09-27T14:49:35Z) - Multi-task Learning for Multilingual Neural Machine Translation [32.81785430242313]
We propose a multi-task learning framework that jointly trains the model with the translation task on bitext data and two denoising tasks on the monolingual data.
We show that the proposed approach can effectively improve the translation quality for both high-resource and low-resource languages.
arXiv Detail & Related papers (2020-10-06T06:54:12Z) - Learning Source Phrase Representations for Neural Machine Translation [65.94387047871648]
We propose an attentive phrase representation generation mechanism which is able to generate phrase representations from corresponding token representations.
In our experiments, we obtain significant improvements on the WMT 14 English-German and English-French tasks on top of the strong Transformer baseline.
arXiv Detail & Related papers (2020-06-25T13:43:11Z) - Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining [37.2106265998237]
We propose an effective learning procedure named Meta Fine-Tuning (MFT).
MFT serves as a meta-learner to solve a group of similar NLP tasks for neural language models.
We implement MFT upon BERT to solve several multi-domain text mining tasks.
arXiv Detail & Related papers (2020-03-29T11:27:10Z)
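As referenced in the adapter-based Fed-MNMT entry above, the following is a hypothetical sketch of exchanging only adapter weights while the frozen PLM backbone stays on the client. The bottleneck adapter layout, the "adapter" naming convention, and plain FedAvg aggregation are assumptions for illustration, not the cited paper's exact design.

```python
# Hypothetical sketch of adapter-only synchronization in federated MNMT.
# The bottleneck adapter, the "adapter" naming convention, and plain FedAvg
# aggregation are illustrative assumptions, not the cited paper's exact design.
from typing import Dict, List

import torch
from torch import nn


class Adapter(nn.Module):
    """Small bottleneck module inserted into a frozen PLM layer."""

    def __init__(self, hidden: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(torch.relu(self.down(x)))  # residual adapter


def client_payload(model: nn.Module) -> Dict[str, torch.Tensor]:
    """Only adapter parameters leave the client; the frozen PLM backbone never does."""
    return {
        name: param.detach().clone()
        for name, param in model.named_parameters()
        if "adapter" in name
    }


def aggregate(payloads: List[Dict[str, torch.Tensor]]) -> Dict[str, torch.Tensor]:
    """FedAvg over adapter tensors only -- a small fraction of the full PLM size."""
    return {
        key: torch.stack([p[key] for p in payloads]).mean(dim=0)
        for key in payloads[0]
    }
```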
This list is automatically generated from the titles and abstracts of the papers in this site.