Crosslingual Embeddings are Essential in UNMT for Distant Languages: An
English to IndoAryan Case Study
- URL: http://arxiv.org/abs/2106.04995v1
- Date: Wed, 9 Jun 2021 11:31:27 GMT
- Title: Crosslingual Embeddings are Essential in UNMT for Distant Languages: An
English to IndoAryan Case Study
- Authors: Tamali Banerjee, Rudra Murthy V, Pushpak Bhattacharyya
- Abstract summary: We show that initializing the embedding layer of UNMT models with cross-lingual embeddings yields significant improvements in BLEU score over existing approaches.
We experimented with Masked Sequence to Sequence (MASS) and Denoising Autoencoder (DAE) UNMT approaches for three distant language pairs.
- Score: 28.409618457653135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in Unsupervised Neural Machine Translation (UNMT) have
minimized the gap between supervised and unsupervised machine translation
performance for closely related language pairs. However, the situation is very
different for distant language pairs. The lack of lexical overlap and the low
syntactic similarity between languages such as English and the Indo-Aryan
languages lead to poor translation quality in existing UNMT systems. In this
paper, we show that initializing the embedding layer of UNMT models with
cross-lingual embeddings yields significant improvements in BLEU score over
existing approaches that initialize embeddings randomly. Further, static
embeddings (freezing the embedding layer weights) lead to larger gains than
non-static embeddings (updating the embedding layer weights during training). We
experimented with
Masked Sequence to Sequence (MASS) and Denoising Autoencoder (DAE) UNMT
approaches for three distant language pairs. The proposed cross-lingual
embedding initialization yields a BLEU score improvement of as much as ten times
over the baseline for English-Hindi, English-Bengali, and English-Gujarati. Our
analysis highlights the importance of cross-lingual embeddings, compares the
approaches, and discusses the scope for further improvement in these systems.
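The method described in the abstract amounts to loading pre-trained cross-lingual word embeddings into the embedding layer of the UNMT model and, in the best-performing "static" setting, freezing that layer. The sketch below is a minimal PyTorch illustration under stated assumptions, not the authors' released code: the fastText-style text format, the vocabulary dictionary, the 512-dimensional vectors, and the file name en-hi.aligned.vec are all hypothetical.

```python
# Minimal sketch (assumption-laden, not the paper's released code) of initializing
# a UNMT model's embedding layer with pre-trained cross-lingual embeddings and
# freezing it, i.e. the "static" setting reported to work best in the abstract.
import torch
import torch.nn as nn


def load_crosslingual_embeddings(path, vocab, dim=512):
    """Read word vectors already mapped into a shared cross-lingual space
    (fastText text format assumed) and build a matrix aligned to `vocab`,
    a dict mapping token -> row index. Unseen tokens keep a small random init."""
    matrix = torch.randn(len(vocab), dim) * 0.01
    with open(path, encoding="utf-8") as f:
        next(f)  # skip the "<num_words> <dim>" header line
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if word in vocab and len(values) == dim:
                matrix[vocab[word]] = torch.tensor([float(v) for v in values])
    return matrix


def build_embedding_layer(matrix, static=True):
    """Wrap the matrix in nn.Embedding; freeze=True gives the 'static' variant,
    freeze=False the 'non-static' variant that keeps updating during training."""
    return nn.Embedding.from_pretrained(matrix, freeze=static)


# Hypothetical usage with a shared English-Hindi vocabulary:
# vocab = {"<pad>": 0, "the": 1, "घर": 2, ...}
# weights = load_crosslingual_embeddings("en-hi.aligned.vec", vocab, dim=512)
# embedding = build_embedding_layer(weights, static=True)  # plug into encoder/decoder
```

In a MASS- or DAE-style UNMT system the same matrix would typically be shared by encoder and decoder; the abstract's finding is that keeping it frozen (freeze=True) gives larger BLEU gains than letting it update.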
Related papers
- LANDeRMT: Detecting and Routing Language-Aware Neurons for Selectively Finetuning LLMs to Machine Translation [43.26446958873554]
Recent advancements in large language models (LLMs) have shown promising results in multilingual translation even with limited bilingual supervision.
LANDeRMT is a framework that selectively finetunes LLMs to Machine Translation with diverse translation training data.
arXiv Detail & Related papers (2024-09-29T02:39:42Z) - Probing the Emergence of Cross-lingual Alignment during LLM Training [10.053333786023089]
Multilingual Large Language Models (LLMs) achieve remarkable levels of zero-shot cross-lingual transfer performance.
We study how such cross-lingual alignment emerges during pre-training of LLMs.
We observe a high correlation between neuron overlap and downstream performance.
arXiv Detail & Related papers (2024-06-19T05:31:59Z) - Extending Multilingual Machine Translation through Imitation Learning [60.15671816513614]
Imit-MNMT treats the task as an imitation learning process, which mimics the behavior of an expert.
We show that our approach significantly improves the translation performance between the new and the original languages.
We also demonstrate that our approach is capable of solving copy and off-target problems.
arXiv Detail & Related papers (2023-11-14T21:04:03Z) - Towards Effective Disambiguation for Machine Translation with Large
Language Models [65.80775710657672]
We study the capabilities of large language models to translate "ambiguous sentences".
Experiments show that our methods can match or outperform state-of-the-art systems such as DeepL and NLLB in four out of five language directions.
arXiv Detail & Related papers (2023-09-20T22:22:52Z) - VECO 2.0: Cross-lingual Language Model Pre-training with
Multi-granularity Contrastive Learning [56.47303426167584]
We propose a cross-lingual pre-trained model VECO2.0 based on contrastive learning with multi-granularity alignments.
Specifically, the sequence-to-sequence alignment is induced to maximize the similarity of the parallel pairs and minimize the non-parallel pairs.
Token-to-token alignment is integrated to bridge the gap between synonymous tokens, excavated via a thesaurus dictionary, and the other unpaired tokens in a bilingual instance.
arXiv Detail & Related papers (2023-04-17T12:23:41Z) - Denoising-based UNMT is more robust to word-order divergence than
MASS-based UNMT [27.85834441076481]
We investigate whether UNMT approaches with self-supervised pre-training are robust to word-order divergence between language pairs.
We compare two models pre-trained with the same self-supervised pre-training objective.
We observe that the DAE-based UNMT approach consistently outperforms MASS in terms of translation accuracy (a toy contrast of the two pre-training corruptions is sketched after this list).
arXiv Detail & Related papers (2023-03-02T12:11:58Z) - Revamping Multilingual Agreement Bidirectionally via Switched
Back-translation for Multilingual Neural Machine Translation [107.83158521848372]
Multilingual agreement (MA) has shown its importance for multilingual neural machine translation (MNMT).
We present Bidirectional Multilingual Agreement via Switched Back-translation (BMA-SBT).
It is a novel and universal multilingual agreement framework for fine-tuning pre-trained MNMT models.
arXiv Detail & Related papers (2022-09-28T09:14:58Z) - Non-Autoregressive Neural Machine Translation: A Call for Clarity [3.1447111126465]
We take a step back and revisit several techniques that have been proposed for improving non-autoregressive translation models.
We provide novel insights for establishing strong baselines using length prediction or CTC-based architecture variants.
We contribute standardized BLEU, chrF++, and TER scores using sacreBLEU on four translation tasks.
arXiv Detail & Related papers (2022-05-21T12:15:22Z) - Improving Multilingual Translation by Representation and Gradient
Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z) - Modelling Latent Translations for Cross-Lingual Transfer [47.61502999819699]
We propose a new technique that integrates both steps of the traditional pipeline (translation and classification) into a single model.
We evaluate our novel latent translation-based model on a series of multilingual NLU tasks.
We report gains for both zero-shot and few-shot learning setups, up to 2.7 accuracy points on average.
arXiv Detail & Related papers (2021-07-23T17:11:27Z) - Improving the Lexical Ability of Pretrained Language Models for
Unsupervised Neural Machine Translation [127.81351683335143]
Cross-lingual pretraining requires models to align the lexical- and high-level representations of the two languages.
Previous research has shown that weak UNMT performance stems from representations that are not sufficiently aligned.
In this paper, we enhance the bilingual masked language model pretraining with lexical-level information by using type-level cross-lingual subword embeddings.
arXiv Detail & Related papers (2021-03-18T21:17:58Z)
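For context on the two pre-training objectives named in the main abstract and contrasted in the "Denoising-based UNMT is more robust to word-order divergence than MASS-based UNMT" entry above, the toy sketch below illustrates how the two corruptions differ. It is not taken from any of the papers listed here; the mask token, span fraction, drop probability, and shuffle window are illustrative assumptions.

```python
# Toy contrast of the two self-supervised corruptions used in UNMT pre-training:
# MASS masks one contiguous span and reconstructs only that span, while a
# Denoising Autoencoder (DAE) drops and locally shuffles words and reconstructs
# the full sentence. Hyperparameters below are illustrative assumptions.
import random

MASK = "<mask>"


def mass_example(tokens, span_frac=0.5):
    """MASS-style corruption: the source hides a contiguous span,
    the target is the hidden span only."""
    span_len = max(1, int(len(tokens) * span_frac))
    start = random.randint(0, len(tokens) - span_len)
    source = tokens[:start] + [MASK] * span_len + tokens[start + span_len:]
    target = tokens[start:start + span_len]
    return source, target


def dae_example(tokens, drop_prob=0.1, shuffle_window=3):
    """DAE-style corruption: drop words and shuffle them within a small window,
    the target is the original sentence."""
    kept = [t for t in tokens if random.random() > drop_prob] or tokens[:1]
    order = sorted(range(len(kept)),
                   key=lambda i: i + random.uniform(0, shuffle_window))
    source = [kept[i] for i in order]
    return source, tokens


# sentence = "unsupervised translation of distant languages is hard".split()
# print(mass_example(sentence))  # source with masked span, target = hidden span
# print(dae_example(sentence))   # noisy source, target = full original sentence
```

Since the DAE noise already perturbs local word order during pre-training, it is plausibly the reason that entry reports DAE-based UNMT being more robust to word-order divergence than MASS-based UNMT.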