Training Multilingual Machine Translation by Alternately Freezing
Language-Specific Encoders-Decoders
- URL: http://arxiv.org/abs/2006.01594v1
- Date: Fri, 29 May 2020 19:00:59 GMT
- Title: Training Multilingual Machine Translation by Alternately Freezing
Language-Specific Encoders-Decoders
- Authors: Carlos Escolano, Marta R. Costa-jussà, José A. R. Fonollosa and
Mikel Artetxe
- Abstract summary: We propose a multilingual machine translation system that can be incrementally extended to new languages without retraining the existing system.
We simultaneously train $N$ languages in all translation directions by alternately freezing encoder or decoder modules.
Experimental results on multilingual machine translation show that this modular architecture can be trained successfully, improving on the initial languages while falling slightly behind when adding new languages or performing zero-shot translation.
- Score: 20.063065730835874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a modular architecture of language-specific encoder-decoders
that constitutes a multilingual machine translation system which can be
incrementally extended to new languages without retraining the existing system.
Unlike previous work, we simultaneously train $N$ languages in all translation
directions by alternately freezing encoder or decoder modules, which indirectly
forces the system to learn a common intermediate representation for all
languages. Experimental results on multilingual machine translation show that
this modular architecture can be trained successfully, improving on the initial
languages while falling slightly behind when adding new languages or performing
zero-shot translation. An additional comparison of sentence-representation
quality on the task of natural language inference shows that the
alternating-freezing training is also beneficial in this respect.
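To make the alternating-freezing procedure concrete, below is a minimal PyTorch-style sketch of the idea described in the abstract (a toy illustration under assumptions, not the authors' released code). Language-specific encoders and decoders are trained jointly on all translation directions, and on alternating steps either all encoders or all decoders are frozen, so the trainable side must adapt to the representation produced by the frozen side. The dimensions, the shared embedding and output projection, and the random batches are placeholders.

```python
# Hedged sketch of alternating freezing for language-specific encoder-decoders.
# Not the authors' code: sizes, data and the shared embedding/projection are toys.
import torch
import torch.nn as nn

LANGS = ["en", "de", "fr"]              # the N initial languages
D_MODEL, VOCAB, MAX_LEN = 64, 1000, 16

# One encoder and one decoder per language (language-specific modules).
encoders = nn.ModuleDict({l: nn.TransformerEncoder(
    nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True), num_layers=2)
    for l in LANGS})
decoders = nn.ModuleDict({l: nn.TransformerDecoder(
    nn.TransformerDecoderLayer(D_MODEL, nhead=4, batch_first=True), num_layers=2)
    for l in LANGS})
embed = nn.Embedding(VOCAB, D_MODEL)    # toy shared embedding (placeholder)
proj = nn.Linear(D_MODEL, VOCAB)        # toy shared output projection (placeholder)

params = (list(encoders.parameters()) + list(decoders.parameters())
          + list(embed.parameters()) + list(proj.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-4)
criterion = nn.CrossEntropyLoss()

def set_frozen(module_dict, frozen):
    """Freeze or unfreeze every parameter of all modules in the dict."""
    for p in module_dict.parameters():
        p.requires_grad = not frozen

directions = [(s, t) for s in LANGS for t in LANGS if s != t]

for step in range(10):                          # toy training loop
    freeze_encoders = step % 2 == 0             # alternate which side is frozen
    set_frozen(encoders, freeze_encoders)
    set_frozen(decoders, not freeze_encoders)

    src_lang, tgt_lang = directions[step % len(directions)]
    src = torch.randint(0, VOCAB, (8, MAX_LEN))  # placeholder parallel batch
    tgt = torch.randint(0, VOCAB, (8, MAX_LEN))

    memory = encoders[src_lang](embed(src))        # language-specific encoder
    out = decoders[tgt_lang](embed(tgt), memory)   # language-specific decoder
    loss = criterion(proj(out).reshape(-1, VOCAB), tgt.reshape(-1))

    optimizer.zero_grad()
    loss.backward()      # only the unfrozen side receives gradients
    optimizer.step()
```

The same mechanism is what allows incremental extension in the abstract's claim: a new language's encoder and decoder can later be trained against the existing modules while those are kept frozen, so the original system is never retrained.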
Related papers
- MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling [70.34758460372629]
We introduce a new paradigm that encodes the same information with segments of consistent size across diverse languages.
MYTE produces shorter encodings for all 99 analyzed languages.
This, in turn, improves multilingual LM performance and diminishes the perplexity gap across diverse languages.
arXiv Detail & Related papers (2024-03-15T21:21:11Z) - Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z) - Decomposed Prompting for Machine Translation Between Related Languages
using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel approach of few-shot prompting that decomposes the translation process into a sequence of word chunk translations.
We show that DecoMT outperforms the strong few-shot prompting BLOOM model with an average improvement of 8 chrF++ points across the examined languages.
arXiv Detail & Related papers (2023-05-22T14:52:47Z) - Informative Language Representation Learning for Massively Multilingual
Neural Machine Translation [47.19129812325682]
In a multilingual neural machine translation model, an artificial language token is usually used to guide translation into the desired target language.
Recent studies show that prepending language tokens sometimes fails to steer multilingual neural machine translation models toward the right translation directions.
We propose two methods, language embedding embodiment and language-aware multi-head attention, to learn informative language representations that channel translation into the right directions.
arXiv Detail & Related papers (2022-09-04T04:27:17Z) - Bitext Mining Using Distilled Sentence Representations for Low-Resource
Languages [12.00637655338665]
We study very low-resource languages and handle 50 African languages, many of which are not covered by any other model.
For these languages, we train sentence encoders, mine bitexts, and validate the bitexts by training NMT systems.
arXiv Detail & Related papers (2022-05-25T10:53:24Z) - Continual Learning in Multilingual NMT via Language-Specific Embeddings [92.91823064720232]
The approach consists of replacing the shared vocabulary with a small language-specific vocabulary and fine-tuning the new embeddings on the new language's parallel data.
Because the parameters of the original model are not modified, its performance on the initial languages does not degrade (see the sketch after this list).
arXiv Detail & Related papers (2021-10-20T10:38:57Z) - Breaking Down Multilingual Machine Translation [74.24795388967907]
We show that multilingual training is beneficial to encoders in general, while it only benefits decoders for low-resource languages (LRLs).
Our many-to-one models for high-resource languages and one-to-many models for LRLs outperform the best results reported by Aharoni et al.
arXiv Detail & Related papers (2021-10-15T14:57:12Z) - Zero-Shot Cross-lingual Semantic Parsing [56.95036511882921]
We study cross-lingual semantic parsing as a zero-shot problem without parallel data for 7 test languages.
We propose a multi-task encoder-decoder model to transfer parsing knowledge to additional languages using only English-Logical form paired data.
Our system frames zero-shot parsing as a latent-space alignment problem and finds that pre-trained models can be improved to generate logical forms with minimal cross-lingual transfer penalty.
arXiv Detail & Related papers (2021-04-15T16:08:43Z) - Towards Continual Learning for Multilingual Machine Translation via
Vocabulary Substitution [16.939016405962526]
We propose a straightforward vocabulary adaptation scheme to extend the language capacity of multilingual machine translation models.
Our approach is suitable for large-scale datasets, applies to distant languages with unseen scripts, and incurs only minor degradation in translation performance for the original language pairs.
arXiv Detail & Related papers (2021-03-11T17:10:21Z) - Improving Zero-shot Neural Machine Translation on Language-specific
Encoders-Decoders [19.44855809470709]
Recently, universal neural machine translation (NMT) with a shared encoder-decoder has achieved good performance on zero-shot translation.
Unlike universal NMT, jointly trained language-specific encoders-decoders aim to achieve universal representation across non-shared modules.
We study zero-shot translation using language-specific encoders-decoders.
arXiv Detail & Related papers (2021-02-12T15:36:33Z) - Multilingual Machine Translation: Closing the Gap between Shared and
Language-specific Encoder-Decoders [20.063065730835874]
State-of-the-art multilingual machine translation relies on a universal encoder-decoder.
We propose an alternative approach that is based on language-specific encoder-decoders.
arXiv Detail & Related papers (2020-04-14T15:02:24Z)
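As a rough illustration of the embedding-substitution idea in the continual-learning entry above (a hypothetical sketch, not that paper's code), the following freezes every parameter of an existing model and trains only a freshly initialized, language-specific embedding table on the new language's data; the model, vocabulary sizes, and batches are toy placeholders.

```python
# Hedged sketch: adapt a frozen NMT model to a new language by training only a
# new language-specific embedding table; all original parameters stay fixed,
# so performance on the initial languages cannot degrade. Everything is a toy.
import torch
import torch.nn as nn

D_MODEL, OLD_VOCAB, NEW_VOCAB = 64, 1000, 500

class ToyNMT(nn.Module):
    """Drastically simplified stand-in for a pretrained translation model."""
    def __init__(self):
        super().__init__()
        self.src_embed = nn.Embedding(OLD_VOCAB, D_MODEL)  # original shared vocab
        self.body = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True),
            num_layers=2)
        self.out = nn.Linear(D_MODEL, OLD_VOCAB)

model = ToyNMT()
for p in model.parameters():            # freeze the original model entirely
    p.requires_grad = False

# Small language-specific vocabulary replaces the original shared embedding.
new_embed = nn.Embedding(NEW_VOCAB, D_MODEL)
optimizer = torch.optim.Adam(new_embed.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for step in range(5):                                    # toy fine-tuning loop
    src = torch.randint(0, NEW_VOCAB, (8, 16))           # new-language source
    tgt = torch.randint(0, OLD_VOCAB, (8, 16))           # target in an original language
    hidden = model.body(new_embed(src))                   # frozen body, new embeddings
    loss = criterion(model.out(hidden).reshape(-1, OLD_VOCAB), tgt.reshape(-1))
    optimizer.zero_grad()
    loss.backward()                                       # gradients reach only new_embed
    optimizer.step()
```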
This list is automatically generated from the titles and abstracts of the papers on this site.