Reference Language based Unsupervised Neural Machine Translation
- URL: http://arxiv.org/abs/2004.02127v2
- Date: Fri, 9 Oct 2020 15:48:59 GMT
- Title: Reference Language based Unsupervised Neural Machine Translation
- Authors: Zuchao Li, Hai Zhao, Rui Wang, Masao Utiyama, Eiichiro Sumita
- Abstract summary: Unsupervised neural machine translation (UNMT) almost completely relieves the parallel corpus curse.
We propose a new reference language-based framework for UNMT, RUNMT, in which the reference language only shares a parallel corpus with the source.
Experimental results show that our methods improve the quality of UNMT over that of a strong baseline that uses only one auxiliary language.
- Score: 108.64894168968067
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exploiting a common language as an auxiliary for better translation has a long tradition in machine translation: a well-used pivot language lets supervised machine translation be trained even in the absence of a source-to-target parallel corpus. The rise of unsupervised neural machine translation (UNMT) almost completely relieves the parallel corpus curse, though UNMT still suffers from unsatisfactory performance due to the vagueness of the clues available for its core back-translation training. Further enriching the idea of pivot translation by extending the use of parallel corpora beyond the source-target paradigm, we propose a new reference language-based framework for UNMT, RUNMT, in which the reference language shares a parallel corpus only with the source. This corpus still provides a signal clear enough to help the reconstruction training of UNMT through a proposed reference agreement mechanism. Experimental results show that our methods improve the quality of UNMT over a strong baseline that uses only one auxiliary language, demonstrating the usefulness of the proposed reference language-based UNMT and providing a good starting point for the community.
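The abstract does not detail the reference agreement mechanism; as a rough, hypothetical illustration of how such a signal could be combined with back-translation reconstruction, here is a minimal Python sketch. The names (`reference_agreement_step`, `jaccard`, the toy model) and the additive weighting are assumptions for illustration, not the paper's implementation.

```python
from typing import Callable, List

Sentence = List[str]

def jaccard(a: Sentence, b: Sentence) -> float:
    """Crude sentence-similarity proxy standing in for a real training loss."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / max(1, len(sa | sb))

def reference_agreement_step(
    src: Sentence,
    ref: Sentence,  # reference-language sentence parallel to src
    translate: Callable[[Sentence, str, str], Sentence],
    lambda_agree: float = 1.0,
) -> float:
    """One greatly simplified UNMT training signal with reference agreement.

    Back-translation reconstructs src from a synthetic target hypothesis;
    the agreement term additionally asks that translating the source and
    translating its reference-language counterpart yield similar target
    hypotheses, since both routes should land on the same meaning.
    """
    # Back-translation: src -> target hypothesis -> reconstructed src.
    tgt_from_src = translate(src, "src", "tgt")
    recon_src = translate(tgt_from_src, "tgt", "src")
    recon_loss = 1.0 - jaccard(src, recon_src)

    # Reference agreement: src -> tgt and ref -> tgt should agree.
    tgt_from_ref = translate(ref, "ref", "tgt")
    agree_loss = 1.0 - jaccard(tgt_from_src, tgt_from_ref)

    return recon_loss + lambda_agree * agree_loss

def toy_translate(sent: Sentence, src_lang: str, tgt_lang: str) -> Sentence:
    """Stand-in 'model' that copies tokens; a real system would translate."""
    return list(sent)

print(reference_agreement_step(["hello", "world"], ["hallo", "welt"], toy_translate))
```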
Related papers
- BiVert: Bidirectional Vocabulary Evaluation using Relations for Machine Translation [4.651581292181871]
We propose a bidirectional semantic-based evaluation method designed to assess the sense distance of the translation from the source text.
This approach employs the comprehensive multilingual encyclopedic dictionary BabelNet.
Factual analysis shows a strong correlation between the average evaluation scores generated by our method and human assessments across various machine translation systems for the English-German language pair.
arXiv Detail & Related papers (2024-03-06T08:02:21Z)
- Cross-lingual neural fuzzy matching for exploiting target-language monolingual corpora in computer-aided translation [0.0]
In this paper, we introduce a novel neural approach aimed at exploiting in-domain target-language (TL) monolingual corpora.
Our approach relies on cross-lingual sentence embeddings to retrieve translation proposals from TL monolingual corpora, and on a neural model to estimate their post-editing effort.
The paper presents an automatic evaluation of these techniques on four language pairs, showing that our approach can successfully exploit monolingual texts in a TM-based CAT environment.
arXiv Detail & Related papers (2024-01-16T14:00:28Z)
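To make the embedding-based retrieval step concrete, below is a minimal sketch assuming a generic cross-lingual sentence encoder (LASER- or LaBSE-style); the `embed` stub here is a toy hashing trick, not the authors' model, and the post-editing-effort estimator is omitted.

```python
import numpy as np

def embed(sentence: str, dim: int = 64) -> np.ndarray:
    """Stand-in for a cross-lingual sentence encoder. A real encoder maps
    source- and target-language sentences into one shared vector space;
    this toy just hashes tokens into a normalized bag-of-words vector."""
    vec = np.zeros(dim)
    for tok in sentence.lower().split():
        vec[hash(tok) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def retrieve_proposals(src_sentence: str, tl_corpus: list, k: int = 2):
    """Return the k target-language sentences whose embeddings have the
    highest cosine similarity to the source-sentence embedding."""
    query = embed(src_sentence)
    matrix = np.stack([embed(s) for s in tl_corpus])  # shape (n, dim)
    scores = matrix @ query  # cosine similarity; vectors are unit-norm
    top = np.argsort(-scores)[:k]
    return [(tl_corpus[int(i)], float(scores[i])) for i in top]

corpus = ["The contract ends in May.", "Delivery is due next week.",
          "The agreement expires in May."]
for sent, score in retrieve_proposals("El contrato termina en mayo.", corpus):
    print(f"{score:.2f}  {sent}")
```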
- Extending Multilingual Machine Translation through Imitation Learning [60.15671816513614]
Imit-MNMT treats the task as an imitation learning process that mimics the behavior of an expert.
We show that our approach significantly improves the translation performance between the new and the original languages.
We also demonstrate that our approach is capable of solving copy and off-target problems.
arXiv Detail & Related papers (2023-11-14T21:04:03Z)
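The abstract does not specify Imit-MNMT's objective; one common, knowledge-distillation-style reading of "mimicking an expert" (an assumption here, not the paper's stated method) is to minimize the cross-entropy between the student's next-token distribution and the expert's:

```python
import numpy as np

def imitation_loss(student_logits: np.ndarray, expert_probs: np.ndarray) -> float:
    """Cross-entropy H(expert, student) over the target vocabulary: the
    student is trained so its next-token distribution matches the
    expert's, one generic formulation of imitation as distillation."""
    # Numerically stable log-softmax over the student's logits.
    shifted = student_logits - student_logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return float(-(expert_probs * log_probs).sum(axis=-1).mean())

rng = np.random.default_rng(0)
student = rng.normal(size=(5, 100))      # 5 decoding positions, vocab of 100
expert = rng.dirichlet(np.ones(100), 5)  # expert's soft target distributions
print(f"imitation loss: {imitation_loss(student, expert):.3f}")
```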
- Boosting Unsupervised Machine Translation with Pseudo-Parallel Data [2.900810893770134]
We propose a training strategy that relies on pseudo-parallel sentence pairs mined from monolingual corpora and synthetic sentence pairs back-translated from monolingual corpora.
We reach an improvement of up to 14.5 BLEU points (English to Ukrainian) over a baseline trained on back-translated data only.
arXiv Detail & Related papers (2023-10-22T10:57:12Z)
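The abstract mentions mining pseudo-parallel pairs from monolingual corpora; a widely used recipe for this (assumed here, not necessarily the authors' exact method) is margin-based scoring over cross-lingual sentence embeddings, in the style of Artetxe & Schwenk (2019):

```python
import numpy as np

def mine_pseudo_parallel(src_emb: np.ndarray, tgt_emb: np.ndarray,
                         threshold: float = 1.05, k: int = 4):
    """Margin-based mining: score each source/target pair by its cosine
    similarity divided by the average similarity of the k nearest
    neighbours on both sides, and keep the best target per source if
    its margin score clears the threshold."""
    sim = src_emb @ tgt_emb.T  # cosine similarities; rows are unit-norm
    knn_src = np.sort(sim, axis=1)[:, -k:].mean(axis=1)  # per source row
    knn_tgt = np.sort(sim, axis=0)[-k:, :].mean(axis=0)  # per target column
    margin = sim / ((knn_src[:, None] + knn_tgt[None, :]) / 2.0)
    pairs = []
    for i in range(sim.shape[0]):
        j = int(margin[i].argmax())
        if margin[i, j] >= threshold:
            pairs.append((i, j, float(margin[i, j])))
    return pairs

def unit(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(1)
src, tgt = unit(rng.normal(size=(8, 32))), unit(rng.normal(size=(10, 32)))
print(mine_pseudo_parallel(src, tgt))  # list of (src_idx, tgt_idx, score)
```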
- Self-supervised and Supervised Joint Training for Resource-rich Machine Translation [30.502625878505732]
Self-supervised pre-training of text representations has been successfully applied to low-resource Neural Machine Translation (NMT).
We propose a joint training approach, $F$-XEnDec, to combine self-supervised and supervised learning to optimize NMT models.
arXiv Detail & Related papers (2021-06-08T02:35:40Z)
- Self-Training for Unsupervised Neural Machine Translation in Unbalanced Training Data Scenarios [61.88012735215636]
Unsupervised neural machine translation (UNMT) that relies solely on massive monolingual corpora has achieved remarkable results in several translation tasks.
In real-world scenarios, massive monolingual corpora do not exist for some extremely low-resource languages such as Estonian.
We propose UNMT self-training mechanisms to train a robust UNMT system and improve its performance.
arXiv Detail & Related papers (2020-04-09T12:07:17Z)
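As a schematic reading of the self-training mechanism (not the paper's exact procedure), the loop below pseudo-labels monolingual source text with the current model, keeps only confident pairs, and retrains on them; all function names are placeholders:

```python
from typing import Callable, List, Tuple

def self_train(
    monolingual_src: List[str],
    translate: Callable[[str], str],          # current UNMT model, src -> tgt
    confidence: Callable[[str, str], float],  # scores a (src, hyp) pair
    train_step: Callable[[List[Tuple[str, str]]], None],
    rounds: int = 3,
    min_conf: float = 0.5,
) -> None:
    """Iteratively pseudo-label monolingual source text with the current
    model, keep only confident pairs, and retrain on them."""
    for r in range(rounds):
        pseudo = []
        for src in monolingual_src:
            hyp = translate(src)
            if confidence(src, hyp) >= min_conf:
                pseudo.append((src, hyp))
        print(f"round {r}: kept {len(pseudo)}/{len(monolingual_src)} pairs")
        train_step(pseudo)  # fine-tune on the model's own confident output

# Toy stand-ins so the sketch executes end to end.
self_train(
    ["tere", "maailm"],
    translate=lambda s: s.upper(),
    confidence=lambda s, h: 1.0,
    train_step=lambda pairs: None,
)
```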
- Explicit Reordering for Neural Machine Translation [50.70683739103066]
In Transformer-based neural machine translation (NMT), the positional encoding mechanism helps the self-attention networks to learn the source representation with order dependency.
We propose a novel reordering method to explicitly model reordering information for Transformer-based NMT.
The empirical results on the WMT14 English-to-German, WAT ASPEC Japanese-to-English, and WMT17 Chinese-to-English translation tasks show the effectiveness of the proposed approach.
arXiv Detail & Related papers (2020-04-08T05:28:46Z)
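For context, the positional encoding the summary refers to is, in the standard Transformer, the fixed sinusoidal scheme of Vaswani et al. (2017), sketched below; the paper's reordering method itself is not reproduced here:

```python
import numpy as np

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Standard Transformer positional encoding:
      PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
      PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    Added to token embeddings so self-attention can see word order."""
    pos = np.arange(max_len)[:, None]      # (max_len, 1)
    i = np.arange(0, d_model, 2)[None, :]  # (1, d_model // 2)
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

print(sinusoidal_positional_encoding(4, 8).round(2))
```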
- Cross-lingual Supervision Improves Unsupervised Neural Machine Translation [97.84871088440102]
We introduce a multilingual unsupervised NMT framework to leverage weakly supervised signals from high-resource language pairs to zero-resource translation directions.
Our method significantly improves translation quality by more than 3 BLEU points on six benchmark unsupervised translation directions.
arXiv Detail & Related papers (2020-04-07T05:46:49Z)
- Learning Contextualized Sentence Representations for Document-Level Neural Machine Translation [59.191079800436114]
Document-level machine translation incorporates inter-sentential dependencies into the translation of a source sentence.
We propose a new framework to model cross-sentence dependencies by training neural machine translation (NMT) to predict both the target translation and surrounding sentences of a source sentence.
arXiv Detail & Related papers (2020-03-30T03:38:01Z)
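One plausible reading of "predict both the target translation and surrounding sentences" is a weighted multi-task loss; the combination and weighting below are assumptions for illustration, not the paper's exact objective:

```python
def doc_level_loss(trans_loss: float, prev_loss: float, next_loss: float,
                   alpha: float = 0.5) -> float:
    """Combine the main translation loss with auxiliary losses for
    predicting the previous and next sentences, pushing the encoder
    to carry cross-sentence context."""
    return trans_loss + alpha * (prev_loss + next_loss)

# Example: the auxiliary prediction terms contribute at half weight.
print(doc_level_loss(trans_loss=2.1, prev_loss=3.0, next_loss=2.8))  # ~5.0
```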
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.