Iterative Domain-Repaired Back-Translation
- URL: http://arxiv.org/abs/2010.02473v1
- Date: Tue, 6 Oct 2020 04:38:09 GMT
- Title: Iterative Domain-Repaired Back-Translation
- Authors: Hao-Ran Wei, Zhirui Zhang, Boxing Chen, Weihua Luo
- Abstract summary: In this paper, we focus on domain-specific translation with low resources, where in-domain parallel corpora are scarce or nonexistent.
We propose a novel iterative domain-repaired back-translation framework, which introduces the Domain-Repair model to refine translations in synthetic bilingual data.
Experiments on adapting NMT models between specific domains and from the general domain to specific domains demonstrate the effectiveness of our proposed approach.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we focus on domain-specific translation with low
resources, where in-domain parallel corpora are scarce or nonexistent. One
common and effective strategy for this case is to exploit in-domain monolingual
data via back-translation. However, the resulting synthetic parallel data are
very noisy because they are generated by imperfect out-of-domain systems,
which degrades domain adaptation performance. To address this issue,
we propose a novel iterative domain-repaired back-translation framework, which
introduces the Domain-Repair (DR) model to refine translations in synthetic
bilingual data. To this end, we construct training data for the DR model by
round-trip translating the monolingual sentences, and then design a unified
training framework to optimize the paired DR and NMT models jointly.
Experiments on adapting NMT models between specific domains and from the
general domain to specific domains demonstrate the effectiveness of our
proposed approach, achieving average improvements of 15.79 and 4.47 BLEU
points over unadapted models and back-translation, respectively.
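As a rough illustration of the data construction step described in the abstract, the sketch below round-trip translates in-domain target-side monolingual sentences so the DR model can learn to map noisy round-tripped output back to the original sentence. The translate() helpers and model handles are hypothetical, not the authors' code.

```python
# Minimal sketch of the round-trip data construction for training the
# Domain-Repair (DR) model. Model interfaces are illustrative assumptions.

def round_trip(mono_tgt_sents, tgt2src_model, src2tgt_model):
    """Round-trip translate in-domain target-side monolingual sentences.

    Returns (noisy, clean) pairs: the round-tripped output is treated as a
    noisy translation whose reference repair is the original sentence.
    """
    pairs = []
    for y in mono_tgt_sents:
        x_hat = tgt2src_model.translate(y)      # back-translate into source
        y_hat = src2tgt_model.translate(x_hat)  # translate back into target
        pairs.append((y_hat, y))                # DR learns y_hat -> y
    return pairs
```

Trained on such pairs as a target-to-target seq2seq model, the DR model can then be applied to repair the target side of synthetic bilingual data produced by back-translation.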
Related papers
- Domain-Specific Text Generation for Machine Translation [7.803471587734353]
We propose a novel approach to domain adaptation leveraging state-of-the-art pretrained language models (LMs) for domain-specific data augmentation.
We employ mixed fine-tuning to train models that significantly improve translation of in-domain texts.
arXiv Detail & Related papers (2022-08-11T16:22:16Z)
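For the mixed fine-tuning mentioned above, here is a minimal sketch of one common recipe: mixing out-of-domain parallel data with oversampled in-domain (possibly LM-generated) data. The oversampling ratio and data handles are illustrative assumptions, not taken from the paper.

```python
import random

def mixed_finetune_batches(out_domain, in_domain, oversample=9, batch_size=32):
    """Yield batches mixing out-of-domain parallel pairs with oversampled
    in-domain pairs -- a common mixed fine-tuning recipe. The 9x
    oversampling ratio is a hypothetical choice."""
    pool = out_domain + in_domain * oversample  # upsample the scarce domain
    random.shuffle(pool)
    for i in range(0, len(pool), batch_size):
        yield pool[i:i + batch_size]
```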
- Finding the Right Recipe for Low Resource Domain Adaptation in Neural Machine Translation [7.2283509416724465]
General translation models often struggle to generate accurate translations in specialized domains.
We conduct an in-depth empirical exploration of monolingual and parallel data approaches to domain adaptation.
Our work includes three domains: consumer electronics, clinical, and biomedical.
arXiv Detail & Related papers (2022-06-02T16:38:33Z)
- Non-Parametric Domain Adaptation for End-to-End Speech Translation [72.37869362559212]
End-to-End Speech Translation (E2E-ST) has received increasing attention due to its potential for less error propagation, lower latency, and fewer parameters.
We propose a novel non-parametric method that leverages domain-specific text translation corpus to achieve domain adaptation for the E2E-ST system.
arXiv Detail & Related papers (2022-05-23T11:41:02Z)
- Non-Parametric Unsupervised Domain Adaptation for Neural Machine Translation [61.27321597981737]
$k$NN-MT has shown promise by directly combining a pre-trained neural machine translation (NMT) model with domain-specific token-level $k$-nearest-neighbor retrieval.
We propose a novel framework that directly uses in-domain monolingual sentences in the target language to construct an effective datastore for $k$-nearest-neighbor retrieval.
arXiv Detail & Related papers (2021-09-14T11:50:01Z)
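To make the retrieval step concrete, here is a minimal sketch of token-level $k$NN-MT, assuming a datastore of (decoder hidden state, next target token) pairs has already been built by force-decoding in-domain text. The function names and the interpolation weight lam are illustrative assumptions.

```python
import numpy as np

def knn_next_token_probs(h, keys, values, vocab_size, k=8, temp=10.0):
    """Turn the k nearest datastore entries to query state h into a
    probability distribution over the target vocabulary."""
    d = np.sum((keys - h) ** 2, axis=1)          # squared L2 distances
    idx = np.argsort(d)[:k]                      # k nearest neighbors
    w = np.exp(-d[idx] / temp)                   # softmax over neg. distance
    w /= w.sum()
    p = np.zeros(vocab_size)
    for token, weight in zip(values[idx], w):
        p[token] += weight                       # aggregate mass per token
    return p

def interpolate(p_nmt, p_knn, lam=0.5):
    """Final next-token distribution: mix the NMT softmax output with the
    retrieval distribution at each decoding step."""
    return (1 - lam) * p_nmt + lam * p_knn
```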
- Dynamic Data Selection and Weighting for Iterative Back-Translation [116.14378571769045]
We propose a curriculum learning strategy for iterative back-translation models.
We evaluate our models on domain adaptation, low-resource, and high-resource MT settings.
Experimental results demonstrate that our methods achieve improvements of up to 1.8 BLEU points over competitive baselines.
arXiv Detail & Related papers (2020-04-07T19:49:58Z)
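A minimal sketch of what dynamic selection and weighting for iterative back-translation can look like; the scoring function and the curriculum schedule below are hypothetical, not the paper's exact method.

```python
# Score each synthetic pair (e.g., with an in-domain LM or the model's own
# confidence), then at each round keep a growing, weighted top fraction.

def select_and_weight(synthetic_pairs, score_fn, round_idx, start=0.5, step=0.1):
    keep_frac = min(1.0, start + step * round_idx)       # curriculum: widen per round
    scored = sorted(synthetic_pairs, key=score_fn, reverse=True)
    kept = scored[:max(1, int(len(scored) * keep_frac))]
    total = sum(score_fn(p) for p in kept) or 1.0        # guard against zero scores
    return [(p, score_fn(p) / total) for p in kept]      # (pair, training weight)
```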
- Structured Domain Adaptation with Online Relation Regularization for Unsupervised Person Re-ID [62.90727103061876]
Unsupervised domain adaptation (UDA) aims at adapting the model trained on a labeled source-domain dataset to an unlabeled target-domain dataset.
We propose an end-to-end structured domain adaptation framework with an online relation-consistency regularization term.
Our proposed framework is shown to achieve state-of-the-art performance on multiple UDA tasks of person re-ID.
arXiv Detail & Related papers (2020-03-14T14:45:18Z)
- A Simple Baseline to Semi-Supervised Domain Adaptation for Machine Translation [73.3550140511458]
State-of-the-art neural machine translation (NMT) systems are data-hungry and perform poorly on new domains with no supervised data.
We propose a simple but effective approach to the semi-supervised domain adaptation scenario of NMT.
This approach iteratively trains a Transformer-based NMT model via three training objectives: language modeling, back-translation, and supervised translation.
arXiv Detail & Related papers (2020-01-22T16:42:06Z)
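The three-objective iterative training just described can be sketched as follows; the model and loss interfaces are hypothetical placeholders, not the paper's code.

```python
# One training round cycling through the three objectives. Assumes an
# autograd-style model exposing lm_loss, translate_tgt2src, and
# translation_loss -- all illustrative interfaces.

def step(optimizer, loss):
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

def train_round(model, mono_tgt, parallel, optimizer):
    for y_batch in mono_tgt:
        step(optimizer, model.lm_loss(y_batch))                    # 1) language modeling
    for y_batch in mono_tgt:
        x_hat = model.translate_tgt2src(y_batch)                   # synthesize sources
        step(optimizer, model.translation_loss(x_hat, y_batch))    # 2) back-translation
    for x_batch, y_batch in parallel:
        step(optimizer, model.translation_loss(x_batch, y_batch))  # 3) supervised translation
```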