Exploiting Curriculum Learning in Unsupervised Neural Machine
Translation
- URL: http://arxiv.org/abs/2109.11177v1
- Date: Thu, 23 Sep 2021 07:18:06 GMT
- Title: Exploiting Curriculum Learning in Unsupervised Neural Machine
Translation
- Authors: Jinliang Lu and Jiajun Zhang
- Abstract summary: We propose a curriculum learning method to gradually utilize pseudo bi-texts based on their quality from multiple granularities.
Experimental results on WMT 14 En-Fr, WMT 16 En-De, WMT 16 En-Ro, and LDC En-Zh translation tasks demonstrate that the proposed method achieves consistent improvements with faster convergence speed.
- Score: 28.75229367700697
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Back-translation (BT) has become one of the de facto components in
unsupervised neural machine translation (UNMT), and it is what explicitly gives
the UNMT model its translation ability. However, all the pseudo bi-texts generated by BT are
treated equally as clean data during optimization without considering the
quality diversity, leading to slow convergence and limited translation
performance. To address this problem, we propose a curriculum learning method
to gradually utilize pseudo bi-texts based on their quality from multiple
granularities. Specifically, we first apply cross-lingual word embedding to
calculate the potential translation difficulty (quality) for the monolingual
sentences. Then, the sentences are fed into UNMT from easy to hard batch by
batch. Furthermore, since the quality of the sentences/tokens within a particular
batch is also diverse, we further use the model itself to compute fine-grained
quality scores, which serve as learning factors that balance the contributions of
different parts when computing the loss and encourage the UNMT
model to focus on pseudo data with higher quality. Experimental results on WMT
14 En-Fr, WMT 16 En-De, WMT 16 En-Ro, and LDC En-Zh translation tasks
demonstrate that the proposed method achieves consistent improvements with
faster convergence speed.
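The fine-grained weighting step can be made concrete with a short sketch. The PyTorch snippet below is a hedged approximation of the idea rather than the authors' released implementation: the model's own (detached) probabilities on the pseudo target act as token- and sentence-level learning factors that scale the loss, so higher-quality pseudo data contributes more. The function name quality_weighted_nll and the padding convention are illustrative assumptions.

```python
# Minimal sketch, assuming a standard PyTorch seq2seq training loop; this is
# an illustration of quality-weighted loss computation, not the paper's code.
import torch
import torch.nn.functional as F

def quality_weighted_nll(logits, targets, pad_id=0):
    """logits: (batch, seq, vocab) over the pseudo target; targets: (batch, seq) token ids."""
    log_probs = F.log_softmax(logits, dim=-1)
    tok_logp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # (batch, seq)
    mask = (targets != pad_id).float()

    # Token-level quality: the model's own probability of each pseudo target token,
    # detached so it acts as a fixed learning factor rather than a gradient path.
    tok_quality = tok_logp.detach().exp() * mask

    # Sentence-level quality: length-normalized average token quality.
    lengths = mask.sum(dim=1).clamp(min=1.0)
    sent_quality = tok_quality.sum(dim=1) / lengths

    # Weighted negative log-likelihood: higher-quality tokens and sentences
    # contribute more to the loss; low-quality pseudo data is down-weighted.
    weighted_tok_loss = -(tok_quality * tok_logp).sum(dim=1) / lengths
    return (sent_quality * weighted_tok_loss).mean()
```

The coarse-grained half of the curriculum, ordering monolingual sentences from easy to hard with cross-lingual word embeddings, would then determine the order in which batches reach this loss.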
Related papers
- A Data Selection Approach for Enhancing Low Resource Machine Translation Using Cross-Lingual Sentence Representations [0.4499833362998489]
This study focuses on the case of English-Marathi language pairs, where existing datasets are notably noisy.
To mitigate the impact of data quality issues, we propose a data filtering approach based on cross-lingual sentence representations.
Results demonstrate a significant improvement in translation quality over the baseline after filtering with IndicSBERT.
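A minimal sketch of this kind of similarity-based filtering is shown below, assuming a sentence-transformers cross-lingual encoder (LaBSE as a stand-in for IndicSBERT) and an arbitrary similarity threshold; it is an illustration, not the paper's exact pipeline.

```python
# Hedged sketch: keep only sentence pairs whose cross-lingual embeddings are
# similar enough. Model id and threshold are placeholder assumptions.
from sentence_transformers import SentenceTransformer

def filter_parallel(pairs, model_name="sentence-transformers/LaBSE", threshold=0.7):
    """pairs: list of (source_sentence, target_sentence) tuples, e.g. English-Marathi."""
    model = SentenceTransformer(model_name)
    src = model.encode([s for s, _ in pairs], normalize_embeddings=True)
    tgt = model.encode([t for _, t in pairs], normalize_embeddings=True)
    sims = (src * tgt).sum(axis=1)  # cosine similarity of L2-normalized vectors
    return [pair for pair, sim in zip(pairs, sims) if sim >= threshold]
```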
arXiv Detail & Related papers (2024-09-04T13:49:45Z) - TasTe: Teaching Large Language Models to Translate through Self-Reflection [82.83958470745381]
Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks.
We propose the TasTe framework, which stands for translating through self-reflection.
The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.
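A hedged sketch of a draft-then-reflect-then-refine loop in this spirit follows; the prompts and the generate() callable are illustrative assumptions, not the TasTe framework's actual prompts or interface.

```python
# Illustrative self-reflection loop for LLM translation; generate() is an
# assumed wrapper around whatever LLM is being used.
def translate_with_self_reflection(generate, source, src_lang="English", tgt_lang="German"):
    draft = generate(f"Translate the following {src_lang} sentence into {tgt_lang}:\n{source}")
    critique = generate(
        f"Source ({src_lang}): {source}\nDraft ({tgt_lang}): {draft}\n"
        "List any adequacy or fluency errors in the draft."
    )
    refined = generate(
        f"Source ({src_lang}): {source}\nDraft ({tgt_lang}): {draft}\n"
        f"Feedback: {critique}\nProduce an improved {tgt_lang} translation."
    )
    return refined
```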
arXiv Detail & Related papers (2024-06-12T17:21:21Z) - Active Learning for Neural Machine Translation [0.0]
We incorporated a technique known as Active Learning with the NMT toolkit Joey NMT to achieve sufficient accuracy and robust predictions for low-resource language translation.
This work uses transformer-based NMT systems: a baseline model (BM), a fully trained model (FTM), an active learning least-confidence-based model (ALLCM), and an active learning margin-sampling-based model (ALMSM), when translating English to Hindi.
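The two acquisition criteria named above can be sketched as follows; this is a generic illustration of least-confidence and margin scoring over a candidate translation, not the paper's Joey NMT integration.

```python
# Hedged sketch of active-learning acquisition scores; the inputs are assumed
# to come from the NMT model's decoding step, and names are illustrative.
import numpy as np

def least_confidence(chosen_token_probs):
    """Probabilities the model assigned to each chosen token; a lower product means less confidence."""
    return 1.0 - float(np.prod(chosen_token_probs))

def mean_margin(top2_probs):
    """top2_probs: sequence of (best, second-best) probabilities per decoding step; small margins signal uncertainty."""
    top2 = np.asarray(top2_probs)
    return float((top2[:, 0] - top2[:, 1]).mean())

# Sentences with the highest least-confidence score (or the smallest mean margin)
# would be sent to human annotators first.
```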
arXiv Detail & Related papers (2022-12-30T17:04:01Z) - Competency-Aware Neural Machine Translation: Can Machine Translation
Know its Own Translation Quality? [61.866103154161884]
Neural machine translation (NMT) is often criticized for failures that occur without the model being aware of them.
We propose a novel competency-aware NMT by extending conventional NMT with a self-estimator.
We show that the proposed method delivers outstanding performance on quality estimation.
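One plausible reading of a self-estimator is a small head over the decoder states that predicts the quality of the model's own translation; the sketch below is an assumption about the general shape of such a component, not the paper's architecture.

```python
# Hedged sketch of a self-estimation head attached to an NMT decoder.
import torch
import torch.nn as nn

class SelfEstimator(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.Tanh(), nn.Linear(hidden_dim, 1)
        )

    def forward(self, decoder_states, mask):
        # decoder_states: (batch, seq, hidden); mask: (batch, seq), 1.0 for real tokens.
        pooled = (decoder_states * mask.unsqueeze(-1)).sum(dim=1) / mask.sum(dim=1, keepdim=True).clamp(min=1.0)
        return torch.sigmoid(self.scorer(pooled)).squeeze(-1)  # per-sentence quality in [0, 1]
```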
arXiv Detail & Related papers (2022-11-25T02:39:41Z) - DivEMT: Neural Machine Translation Post-Editing Effort Across
Typologically Diverse Languages [5.367993194110256]
DivEMT is the first publicly available post-editing study of Neural Machine Translation (NMT) over a typologically diverse set of target languages.
We assess the impact on translation productivity of two state-of-the-art NMT systems, namely: Google Translate and the open-source multilingual model mBART50.
arXiv Detail & Related papers (2022-05-24T17:22:52Z) - Learning to Generalize to More: Continuous Semantic Augmentation for
Neural Machine Translation [50.54059385277964]
We present a novel data augmentation paradigm termed Continuous Semantic Augmentation (CsaNMT).
CsaNMT augments each training instance with an adjacency region that could cover adequate variants of literal expression under the same meaning.
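A toy sketch of augmenting a training instance with nearby points in representation space is given below; the Gaussian neighborhood and fixed radius are simplifying assumptions, not CsaNMT's learned adjacency region.

```python
# Hedged sketch: sample semantic "variants" around a sentence representation.
import torch

def sample_adjacent(sentence_repr, num_samples=4, radius=0.1):
    """sentence_repr: (hidden,) tensor; returns (num_samples, hidden) nearby variants."""
    noise = torch.randn(num_samples, sentence_repr.size(-1)) * radius
    return sentence_repr.unsqueeze(0) + noise
```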
arXiv Detail & Related papers (2022-04-14T08:16:28Z) - DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
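The pre-training objective can be pictured as an entity-denoising task: named entities in a monolingual sentence are corrupted with help from a knowledge base, and the model learns to restore the original. The toy corruption step below is an assumed illustration of building such pairs, not DEEP's actual procedure.

```python
# Hedged sketch: produce (noisy, clean) pairs for denoising pre-training by
# swapping entities using a small knowledge-base dictionary (illustrative only).
import random

def corrupt_entities(sentence, entity_kb):
    """entity_kb: dict mapping entity surface forms to lists of same-type alternatives."""
    noisy = sentence
    for entity, alternatives in entity_kb.items():
        if entity in noisy:
            noisy = noisy.replace(entity, random.choice(alternatives))
    return noisy, sentence  # (model input, reconstruction target)
```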
arXiv Detail & Related papers (2021-11-14T17:28:09Z) - Modelling Latent Translations for Cross-Lingual Transfer [47.61502999819699]
We propose a new technique that integrates both steps of the traditional pipeline (translation and classification) into a single model.
We evaluate our novel latent translation-based model on a series of multilingual NLU tasks.
We report gains for both zero-shot and few-shot learning setups, up to 2.7 accuracy points on average.
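Treating the translation as a latent variable can be sketched as classifying several sampled translations and averaging the predictions; translate_samples and classify are assumed callables, and this simple Monte Carlo average is an illustration rather than the paper's model.

```python
# Hedged sketch: marginalize a classifier over sampled latent translations.
def classify_via_latent_translations(translate_samples, classify, source, num_samples=5):
    """translate_samples(source, n) -> list of n candidate translations;
    classify(text) -> probability of the positive class in the high-resource language."""
    translations = translate_samples(source, num_samples)
    probs = [classify(t) for t in translations]
    return sum(probs) / len(probs)
```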
arXiv Detail & Related papers (2021-07-23T17:11:27Z) - Phrase-level Active Learning for Neural Machine Translation [107.28450614074002]
We propose an active learning setting where we can spend a given budget on translating in-domain data.
We select both full sentences and individual phrases from unlabelled data in the new domain for routing to human translators.
In a German-English translation task, our active learning approach achieves consistent improvements over uncertainty-based sentence selection methods.
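A toy sketch of budgeted selection that ranks candidate sentences and phrases by uncertainty per token follows; the cost model and greedy policy are assumptions, not the paper's selection criteria.

```python
# Hedged sketch: spend a fixed annotation budget on the most uncertain units.
def select_for_annotation(candidates, budget_tokens):
    """candidates: list of (text, n_tokens, uncertainty); text may be a sentence or a phrase."""
    ranked = sorted(candidates, key=lambda c: c[2] / c[1], reverse=True)
    chosen, spent = [], 0
    for text, n_tokens, _ in ranked:
        if spent + n_tokens <= budget_tokens:
            chosen.append(text)
            spent += n_tokens
    return chosen
```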
arXiv Detail & Related papers (2021-06-21T19:20:42Z) - Verdi: Quality Estimation and Error Detection for Bilingual Corpora [23.485380293716272]
Verdi is a novel framework for word-level and sentence-level post-editing effort estimation for bilingual corpora.
We exploit the symmetric nature of bilingual corpora and apply model-level dual learning in the NMT predictor.
Our method beats the winner of the competition and outperforms other baseline methods by a large margin.
arXiv Detail & Related papers (2021-05-31T11:04:13Z) - Unsupervised Bitext Mining and Translation via Self-trained Contextual
Embeddings [51.47607125262885]
We describe an unsupervised method to create pseudo-parallel corpora for machine translation (MT) from unaligned text.
We use multilingual BERT to create source and target sentence embeddings for nearest-neighbor search and adapt the model via self-training.
We validate our technique by extracting parallel sentence pairs on the BUCC 2017 bitext mining task and observe up to a 24.5 point increase (absolute) in F1 scores over previous unsupervised methods.
arXiv Detail & Related papers (2020-10-15T14:04:03Z)
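A simplified sketch of the mining step in the entry above: nearest-neighbor search over cross-lingual sentence embeddings with a ratio-margin filter. The embeddings are assumed to come from multilingual BERT pooling, the margin heuristic is a common choice rather than the paper's exact scoring, and the self-training loop that adapts the encoder is omitted.

```python
# Hedged sketch of pseudo-parallel pair mining from unaligned corpora.
import numpy as np

def mine_bitext(src_vecs, tgt_vecs, margin_threshold=1.05, k=4):
    """src_vecs, tgt_vecs: L2-normalized embedding matrices of shape (n_src, d) and (n_tgt, d)."""
    sims = src_vecs @ tgt_vecs.T  # cosine similarities
    pairs = []
    for i in range(sims.shape[0]):
        j = int(np.argmax(sims[i]))
        topk = np.sort(sims[i])[-k:]  # k nearest target neighbors of source i
        margin = sims[i, j] / (topk.mean() + 1e-9)  # best match vs. its local neighborhood
        if margin >= margin_threshold:
            pairs.append((i, j, float(sims[i, j])))
    return pairs
```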
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.