Training Multilingual Machine Translation by Alternately Freezing
Language-Specific Encoders-Decoders
- URL: http://arxiv.org/abs/2006.01594v1
- Date: Fri, 29 May 2020 19:00:59 GMT
- Title: Training Multilingual Machine Translation by Alternately Freezing
Language-Specific Encoders-Decoders
- Authors: Carlos Escolano, Marta R. Costa-jussà, José A. R. Fonollosa and
Mikel Artetxe
- Abstract summary: We propose a multilingual machine translation system that can be incrementally extended to new languages without retraining the existing system.
We simultaneously train $N$ languages in all translation directions by alternately freezing encoder or decoder modules.
Experimental results on multilingual machine translation show that this modular architecture can be trained successfully, improving on the initial languages while falling slightly behind when adding new languages or performing zero-shot translation.
- Score: 20.063065730835874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a modular architecture of language-specific encoder-decoders
that constitutes a multilingual machine translation system which can be
incrementally extended to new languages without retraining the existing system.
Unlike previous work, we simultaneously train $N$ languages in all translation
directions by alternately freezing encoder or decoder modules, which indirectly
forces the system to learn a common intermediate representation for all
languages. Experimental results on multilingual machine translation show that
this modular architecture can be trained successfully, improving on the initial
languages while falling slightly behind when adding new languages or performing
zero-shot translation. An additional comparison of sentence-representation
quality on the task of natural language inference shows that the
alternating-freezing training is also beneficial in this respect.
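To make the alternating-freezing procedure concrete, below is a minimal PyTorch-style sketch of the idea described in the abstract (a toy illustration under assumptions, not the authors' released code). Language-specific encoders and decoders are trained jointly on all translation directions, and on alternating steps either all encoders or all decoders are frozen, so the trainable side must adapt to the representation produced by the frozen side. The dimensions, the shared embedding and output projection, and the random batches are placeholders.

```python
# Hedged sketch of alternating freezing for language-specific encoder-decoders.
# Not the authors' code: sizes, data and the shared embedding/projection are toys.
import torch
import torch.nn as nn

LANGS = ["en", "de", "fr"]              # the N initial languages
D_MODEL, VOCAB, MAX_LEN = 64, 1000, 16

# One encoder and one decoder per language (language-specific modules).
encoders = nn.ModuleDict({l: nn.TransformerEncoder(
    nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True), num_layers=2)
    for l in LANGS})
decoders = nn.ModuleDict({l: nn.TransformerDecoder(
    nn.TransformerDecoderLayer(D_MODEL, nhead=4, batch_first=True), num_layers=2)
    for l in LANGS})
embed = nn.Embedding(VOCAB, D_MODEL)    # toy shared embedding (placeholder)
proj = nn.Linear(D_MODEL, VOCAB)        # toy shared output projection (placeholder)

params = (list(encoders.parameters()) + list(decoders.parameters())
          + list(embed.parameters()) + list(proj.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-4)
criterion = nn.CrossEntropyLoss()

def set_frozen(module_dict, frozen):
    """Freeze or unfreeze every parameter of all modules in the dict."""
    for p in module_dict.parameters():
        p.requires_grad = not frozen

directions = [(s, t) for s in LANGS for t in LANGS if s != t]

for step in range(10):                          # toy training loop
    freeze_encoders = step % 2 == 0             # alternate which side is frozen
    set_frozen(encoders, freeze_encoders)
    set_frozen(decoders, not freeze_encoders)

    src_lang, tgt_lang = directions[step % len(directions)]
    src = torch.randint(0, VOCAB, (8, MAX_LEN))  # placeholder parallel batch
    tgt = torch.randint(0, VOCAB, (8, MAX_LEN))

    memory = encoders[src_lang](embed(src))        # language-specific encoder
    out = decoders[tgt_lang](embed(tgt), memory)   # language-specific decoder
    loss = criterion(proj(out).reshape(-1, VOCAB), tgt.reshape(-1))

    optimizer.zero_grad()
    loss.backward()      # only the unfrozen side receives gradients
    optimizer.step()
```

The same mechanism is what allows incremental extension in the abstract's claim: a new language's encoder and decoder can later be trained against the existing modules while those are kept frozen, so the original system is never retrained.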
Related papers
- MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling [70.34758460372629]
We introduce a new paradigm that encodes the same information with segments of consistent size across diverse languages.
MYTE produces shorter encodings for all 99 analyzed languages.
This, in turn, improves multilingual LM performance and diminishes the perplexity gap across diverse languages.
arXiv Detail & Related papers (2024-03-15T21:21:11Z) - Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z) - Decomposed Prompting for Machine Translation Between Related Languages
using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel approach of few-shot prompting that decomposes the translation process into a sequence of word chunk translations.
We show that DecoMT outperforms the strong few-shot prompting BLOOM model with an average improvement of 8 chrF++ points across the examined languages.
arXiv Detail & Related papers (2023-05-22T14:52:47Z) - Informative Language Representation Learning for Massively Multilingual
Neural Machine Translation [47.19129812325682]
In a multilingual neural machine translation model, an artificial language token is usually used to guide translation into the desired target language.
Recent studies show that prepending language tokens sometimes fails to steer multilingual neural machine translation models toward the right translation directions.
We propose two methods, language embedding embodiment and language-aware multi-head attention, to learn informative language representations that channel translation into the right directions.
arXiv Detail & Related papers (2022-09-04T04:27:17Z) - Bitext Mining Using Distilled Sentence Representations for Low-Resource
Languages [12.00637655338665]
We study very low-resource languages and handle 50 African languages, many of which are not covered by any other model.
For these languages, we train sentence encoders, mine bitexts, and validate the bitexts by training NMT systems.
arXiv Detail & Related papers (2022-05-25T10:53:24Z) - Continual Learning in Multilingual NMT via Language-Specific Embeddings [92.91823064720232]
The approach consists of replacing the shared vocabulary with a small language-specific vocabulary and fine-tuning the new embeddings on the new language's parallel data.
Because the parameters of the original model are not modified, its performance on the initial languages does not degrade (see the sketch after this list).
arXiv Detail & Related papers (2021-10-20T10:38:57Z) - Breaking Down Multilingual Machine Translation [74.24795388967907]
We show that multilingual training is beneficial to encoders in general, while it only benefits decoders for low-resource languages (LRLs).
Our many-to-one models for high-resource languages and one-to-many models for LRLs outperform the best results reported by Aharoni et al.
arXiv Detail & Related papers (2021-10-15T14:57:12Z) - Zero-Shot Cross-lingual Semantic Parsing [56.95036511882921]
We study cross-lingual semantic parsing as a zero-shot problem without parallel data for 7 test languages.
We propose a multi-task encoder-decoder model to transfer parsing knowledge to additional languages using only English-Logical form paired data.
Our system frames zero-shot parsing as a latent-space alignment problem and finds that pre-trained models can be improved to generate logical forms with minimal cross-lingual transfer penalty.
arXiv Detail & Related papers (2021-04-15T16:08:43Z) - Towards Continual Learning for Multilingual Machine Translation via
Vocabulary Substitution [16.939016405962526]
We propose a straightforward vocabulary adaptation scheme to extend the language capacity of multilingual machine translation models.
Our approach is suitable for large-scale datasets, applies to distant languages with unseen scripts, and incurs only minor degradation in translation performance for the original language pairs.
arXiv Detail & Related papers (2021-03-11T17:10:21Z) - Improving Zero-shot Neural Machine Translation on Language-specific
Encoders-Decoders [19.44855809470709]
Recently, universal neural machine translation (NMT) with a shared encoder-decoder has achieved good performance on zero-shot translation.
Unlike universal NMT, jointly trained language-specific encoders-decoders aim to achieve universal representation across non-shared modules.
We study zero-shot translation using language-specific encoders-decoders.
arXiv Detail & Related papers (2021-02-12T15:36:33Z) - Multilingual Machine Translation: Closing the Gap between Shared and
Language-specific Encoder-Decoders [20.063065730835874]
State-of-the-art multilingual machine translation relies on a universal encoder-decoder.
We propose an alternative approach that is based on language-specific encoder-decoders.
arXiv Detail & Related papers (2020-04-14T15:02:24Z)
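As a rough illustration of the embedding-substitution idea in the continual-learning entry above (a hypothetical sketch, not that paper's code), the following freezes every parameter of an existing model and trains only a freshly initialized, language-specific embedding table on the new language's data; the model, vocabulary sizes, and batches are toy placeholders.

```python
# Hedged sketch: adapt a frozen NMT model to a new language by training only a
# new language-specific embedding table; all original parameters stay fixed,
# so performance on the initial languages cannot degrade. Everything is a toy.
import torch
import torch.nn as nn

D_MODEL, OLD_VOCAB, NEW_VOCAB = 64, 1000, 500

class ToyNMT(nn.Module):
    """Drastically simplified stand-in for a pretrained translation model."""
    def __init__(self):
        super().__init__()
        self.src_embed = nn.Embedding(OLD_VOCAB, D_MODEL)  # original shared vocab
        self.body = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True),
            num_layers=2)
        self.out = nn.Linear(D_MODEL, OLD_VOCAB)

model = ToyNMT()
for p in model.parameters():            # freeze the original model entirely
    p.requires_grad = False

# Small language-specific vocabulary replaces the original shared embedding.
new_embed = nn.Embedding(NEW_VOCAB, D_MODEL)
optimizer = torch.optim.Adam(new_embed.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for step in range(5):                                    # toy fine-tuning loop
    src = torch.randint(0, NEW_VOCAB, (8, 16))           # new-language source
    tgt = torch.randint(0, OLD_VOCAB, (8, 16))           # target in an original language
    hidden = model.body(new_embed(src))                   # frozen body, new embeddings
    loss = criterion(model.out(hidden).reshape(-1, OLD_VOCAB), tgt.reshape(-1))
    optimizer.zero_grad()
    loss.backward()                                       # gradients reach only new_embed
    optimizer.step()
```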
This list is automatically generated from the titles and abstracts of the papers on this site.