Denoising-based UNMT is more robust to word-order divergence than
MASS-based UNMT
- URL: http://arxiv.org/abs/2303.01191v1
- Date: Thu, 2 Mar 2023 12:11:58 GMT
- Title: Denoising-based UNMT is more robust to word-order divergence than
MASS-based UNMT
- Authors: Tamali Banerjee, Rudra Murthy V, and Pushpak Bhattacharyya
- Abstract summary: We investigate whether UNMT approaches with self-supervised pre-training are robust to word-order divergence between language pairs.
We compare two models pre-trained with the same self-supervised pre-training objective.
We observe that the DAE-based UNMT approach consistently outperforms MASS in terms of translation accuracies.
- Score: 27.85834441076481
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We aim to investigate whether UNMT approaches with self-supervised
pre-training are robust to word-order divergence between language pairs. We
achieve this by comparing two models pre-trained with the same self-supervised
pre-training objective. The first model is trained on language pairs with
different word-orders, and the second model is trained on the same language
pairs with source language re-ordered to match the word-order of the target
language. Ideally, UNMT approaches which are robust to word-order divergence
should exhibit no visible performance difference between the two
configurations. In this paper, we investigate two such self-supervised
pre-training based UNMT approaches, namely Masked Sequence-to-Sequence
Pre-Training (MASS), which does not have shuffling noise, and Denoising
AutoEncoder (DAE), which has shuffling noise.
We experiment with five English$\rightarrow$Indic language pairs (i.e.,
en-hi, en-bn, en-gu, en-kn, and en-ta), where the word-order of the source language
is SVO (Subject-Verb-Object) and the word-order of the target languages is SOV
(Subject-Object-Verb). We observe that for these language pairs, the DAE-based
UNMT approach consistently outperforms MASS in terms of translation accuracies.
Moreover, bridging the word-order gap using reordering improves the translation
accuracy of MASS-based UNMT models, while it cannot improve the translation
accuracy of DAE-based UNMT models. This observation indicates that DAE-based
UNMT is more robust to word-order divergence than MASS-based UNMT.
The word-shuffling noise in the DAE approach could be the reason for its
robustness to word-order divergence.
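As a rough illustration of the two noising schemes contrasted above, the sketch below applies DAE-style noise (local word shuffling plus random token masking) and MASS-style noise (masking one contiguous span) to a token list. This is a minimal, hypothetical sketch: the function names, shuffle window, and masking ratios are illustrative assumptions, not the paper's settings or implementation.

```python
import random

def dae_noise(tokens, max_shift=3, mask_prob=0.1, mask_token="<mask>"):
    """DAE-style noising (illustrative sketch): locally shuffle words within a
    small window and randomly mask a fraction of tokens. The word order of the
    input is deliberately perturbed."""
    # Local shuffle: each token may move at most `max_shift` positions.
    keys = [i + random.uniform(0, max_shift) for i in range(len(tokens))]
    shuffled = [tok for _, tok in sorted(zip(keys, tokens))]
    # Randomly blank out some tokens.
    return [mask_token if random.random() < mask_prob else tok for tok in shuffled]

def mass_noise(tokens, span_ratio=0.5, mask_token="<mask>"):
    """MASS-style noising (illustrative sketch): mask a single contiguous span
    (here ~50% of the sentence) for the decoder to reconstruct. The order of
    the visible tokens is left untouched."""
    span_len = max(1, int(len(tokens) * span_ratio))
    start = random.randrange(len(tokens) - span_len + 1)
    return [mask_token if start <= i < start + span_len else tok
            for i, tok in enumerate(tokens)]

if __name__ == "__main__":
    sent = "the cat sat on the mat".split()
    print(dae_noise(sent))   # e.g. ['the', 'sat', 'cat', '<mask>', 'the', 'mat']
    print(mass_noise(sent))  # e.g. ['the', 'cat', '<mask>', '<mask>', '<mask>', 'mat']
```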
Related papers
- Towards Effective Disambiguation for Machine Translation with Large
Language Models [65.80775710657672]
We study the capabilities of large language models to translate "ambiguous sentences".
Experiments show that our methods can match or outperform state-of-the-art systems such as DeepL and NLLB in four out of five language directions.
arXiv Detail & Related papers (2023-09-20T22:22:52Z)
- VECO 2.0: Cross-lingual Language Model Pre-training with Multi-granularity Contrastive Learning [56.47303426167584]
We propose a cross-lingual pre-trained model, VECO 2.0, based on contrastive learning with multi-granularity alignments.
Specifically, the sequence-to-sequence alignment is induced to maximize the similarity of the parallel pairs and minimize the non-parallel pairs.
Token-to-token alignment is integrated to bridge the gap between synonymous tokens, excavated via the thesaurus dictionary, and the other unpaired tokens in a bilingual instance.
arXiv Detail & Related papers (2023-04-17T12:23:41Z)
- Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation [48.50842995206353]
We study the impact of the jointly pretrained decoder, which is the main difference between Seq2Seq pretraining and previous encoder-based pretraining approaches for NMT.
We propose simple and effective strategies, named in-domain pretraining and input adaptation, to remedy the domain and objective discrepancies.
arXiv Detail & Related papers (2022-03-16T07:36:28Z)
- Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation [49.916963624249355]
A UNMT model is trained on pseudo parallel data with a translated source, but translates natural source sentences at inference time.
The source discrepancy between training and inference hinders the translation performance of UNMT models.
We propose an online self-training approach, which simultaneously uses the pseudo parallel data {natural source, translated target} to mimic the inference scenario.
arXiv Detail & Related papers (2022-03-16T04:50:27Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
- Crosslingual Embeddings are Essential in UNMT for Distant Languages: An English to IndoAryan Case Study [28.409618457653135]
We show that initializing the embedding layer of UNMT models with cross-lingual embeddings leads to significant improvements in BLEU score over existing approaches.
We experimented using Masked Sequence to Sequence (MASS) and Denoising Autoencoder (DAE) UNMT approaches for three distant language pairs.
arXiv Detail & Related papers (2021-06-09T11:31:27Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Uncertainty-Aware Semantic Augmentation for Neural Machine Translation [37.555675157198145]
We propose uncertainty-aware semantic augmentation, which explicitly captures the universal semantic information among multiple semantically-equivalent source sentences.
Our approach significantly outperforms the strong baselines and the existing methods.
arXiv Detail & Related papers (2020-10-09T07:48:09Z)