Rethinking Data Augmentation for Low-Resource Neural Machine
Translation: A Multi-Task Learning Approach
- URL: http://arxiv.org/abs/2109.03645v1
- Date: Wed, 8 Sep 2021 13:39:30 GMT
- Title: Rethinking Data Augmentation for Low-Resource Neural Machine
Translation: A Multi-Task Learning Approach
- Authors: Víctor M. Sánchez-Cartagena, Miquel Esplà-Gomis, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez
- Abstract summary: Data augmentation (DA) techniques may be used for generating additional training samples when the available parallel data are scarce.
We present a multi-task DA approach in which we generate new sentence pairs with transformations.
We show consistent improvements over the baseline and over DA methods aiming at extending the support of the empirical data distribution.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the context of neural machine translation, data augmentation (DA)
techniques may be used for generating additional training samples when the
available parallel data are scarce. Many DA approaches aim at expanding the
support of the empirical data distribution by generating new sentence pairs
that contain infrequent words, thus making it closer to the true data
distribution of parallel sentences. In this paper, we propose to follow a
completely different approach and present a multi-task DA approach in which we
generate new sentence pairs with transformations, such as reversing the order
of the target sentence, which produce unfluent target sentences. During
training, these augmented sentences are used as auxiliary tasks in a multi-task
framework with the aim of providing new contexts where the target prefix is not
informative enough to predict the next word. This strengthens the encoder and
forces the decoder to pay more attention to the source representations of the
encoder. Experiments carried out on six low-resource translation tasks show
consistent improvements over the baseline and over DA methods aiming at
extending the support of the empirical data distribution. The systems trained
with our approach rely more on the source tokens, are more robust against
domain shift, and produce fewer hallucinations.
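A minimal sketch of this kind of target-side transformation, assuming a reversed-target auxiliary task marked with an illustrative task token "<rev>" (the tag name, sampling ratio, and function names are assumptions, not the authors' exact implementation):

```python
# Sketch of the multi-task data augmentation idea: keep the source sentence,
# apply a meaning-destroying transformation (here: reversing the target token
# order), and tag the pair so the model treats it as an auxiliary task.
import random

def reverse_target(src_tokens, tgt_tokens):
    """Auxiliary task: same source, target tokens in reverse order."""
    return src_tokens, list(reversed(tgt_tokens))

def build_multitask_corpus(parallel_pairs, aug_ratio=1.0, seed=0):
    """Return the original pairs plus tagged, transformed pairs."""
    rng = random.Random(seed)
    corpus = list(parallel_pairs)  # primary task: ordinary translation
    for src, tgt in parallel_pairs:
        if rng.random() < aug_ratio:
            aug_src, aug_tgt = reverse_target(src, tgt)
            corpus.append((["<rev>"] + aug_src, aug_tgt))
    rng.shuffle(corpus)
    return corpus

if __name__ == "__main__":
    pairs = [(["a", "cat", "sleeps"], ["una", "gata", "duerme"])]
    for src, tgt in build_multitask_corpus(pairs):
        print(" ".join(src), "|||", " ".join(tgt))
```

Because the reversed target is deliberately unfluent, the target prefix alone cannot predict the next token, which is what pushes the decoder to rely on the encoder's source representations.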
Related papers
- Deterministic Reversible Data Augmentation for Neural Machine Translation [36.10695293724949]
We propose Deterministic Reversible Data Augmentation (DRDA), a simple but effective data augmentation method for neural machine translation.
With no extra corpora or model changes required, DRDA outperforms strong baselines on several translation tasks with a clear margin.
DRDA exhibits good robustness in noisy, low-resource, and cross-domain datasets.
arXiv Detail & Related papers (2024-06-04T17:39:23Z)
- Curricular Transfer Learning for Sentence Encoded Tasks [0.0]
This article proposes a sequence of pre-training steps guided by "data hacking" and grammar analysis.
In our experiments, we acquire a considerable improvement from our method compared to other known pre-training approaches for the MultiWoZ task.
arXiv Detail & Related papers (2023-08-03T16:18:19Z)
- Semi-supervised Neural Machine Translation with Consistency Regularization for Low-Resource Languages [3.475371300689165]
This paper presents a simple yet effective method for tackling data scarcity in low-resource languages by augmenting high-quality sentence pairs and training NMT models in a semi-supervised manner.
Specifically, our approach combines the cross-entropy loss for supervised learning with a KL-divergence consistency loss for unsupervised learning over pseudo and augmented target sentences.
Experimental results show that our approach significantly improves NMT baselines, especially on low-resource datasets, with gains of 0.46 to 2.03 BLEU points.
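A hedged sketch of the kind of combined objective described above, written in PyTorch-style code; the weighting factor alpha and all function names are illustrative assumptions rather than the paper's exact formulation:

```python
# Supervised cross-entropy plus a KL consistency term between the model's
# predictions on an original input and on its augmented counterpart.
import torch
import torch.nn.functional as F

def semi_supervised_loss(logits_sup, gold_ids, logits_orig, logits_aug,
                         alpha=1.0, pad_id=0):
    # Supervised term: token-level cross-entropy on gold target sentences.
    ce = F.cross_entropy(logits_sup.view(-1, logits_sup.size(-1)),
                         gold_ids.view(-1), ignore_index=pad_id)
    # Unsupervised term: KL divergence pushing the prediction on the
    # augmented input toward the prediction on the original input.
    log_p_orig = F.log_softmax(logits_orig, dim=-1)
    log_p_aug = F.log_softmax(logits_aug, dim=-1)
    kl = F.kl_div(log_p_aug, log_p_orig, log_target=True,
                  reduction="batchmean")
    return ce + alpha * kl
```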
arXiv Detail & Related papers (2023-04-02T15:24:08Z)
- Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation [50.54059385277964]
We present a novel data augmentation paradigm termed Continuous Semantic Augmentation (CsaNMT)
CsaNMT augments each training instance with an adjacency region that could cover adequate variants of literal expression under the same meaning.
arXiv Detail & Related papers (2022-04-14T08:16:28Z)
- $\textit{latent}$-GLAT: Glancing at Latent Variables for Parallel Text Generation [65.29170569821093]
Parallel text generation has received widespread attention due to its success in generation efficiency.
In this paper, we propose $\textit{latent}$-GLAT, which employs discrete latent variables to capture word categorical information.
Experiment results show that our method outperforms strong baselines without the help of an autoregressive model.
arXiv Detail & Related papers (2022-04-05T07:34:12Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Ranking Creative Language Characteristics in Small Data Scenarios [52.00161818003478]
We adapt the DirectRanker to provide a new deep model for ranking creative language with small data.
Our experiments with sparse training data show that while the performance of standard neural ranking approaches collapses with small datasets, DirectRanker remains effective.
arXiv Detail & Related papers (2020-10-23T18:57:47Z)
- Regularizing Deep Networks with Semantic Data Augmentation [44.53483945155832]
We propose a novel semantic data augmentation algorithm to complement traditional approaches.
The proposed method is inspired by the intriguing property that deep networks are effective in learning linearized features.
We show that the proposed implicit semantic data augmentation (ISDA) algorithm amounts to minimizing a novel robust CE loss.
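For context, the robust loss mentioned above comes from implicitly augmenting each deep feature $\mathbf{a}_i$ with class-conditional Gaussian noise $\mathcal{N}(0, \lambda\Sigma_{y_i})$ and upper-bounding the expected cross-entropy; with per-class softmax weights $\mathbf{w}_c$ and biases $b_c$, the surrogate objective takes roughly the following form (a sketch, not a verbatim reproduction of the paper's equation):

$$\overline{\mathcal{L}}_{\infty}=\frac{1}{N}\sum_{i=1}^{N}-\log\frac{e^{\mathbf{w}_{y_i}^{\top}\mathbf{a}_i+b_{y_i}}}{\sum_{c=1}^{C}e^{\mathbf{w}_c^{\top}\mathbf{a}_i+b_c+\frac{\lambda}{2}(\mathbf{w}_c-\mathbf{w}_{y_i})^{\top}\Sigma_{y_i}(\mathbf{w}_c-\mathbf{w}_{y_i})}}$$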
arXiv Detail & Related papers (2020-07-21T00:32:44Z)
- Learning Source Phrase Representations for Neural Machine Translation [65.94387047871648]
We propose an attentive phrase representation generation mechanism which is able to generate phrase representations from corresponding token representations.
In our experiments, we obtain significant improvements on the WMT 14 English-German and English-French tasks on top of the strong Transformer baseline.
arXiv Detail & Related papers (2020-06-25T13:43:11Z)
- Generative Data Augmentation for Commonsense Reasoning [75.26876609249197]
G-DAUGC is a novel generative data augmentation method that aims to achieve more accurate and robust learning in the low-resource setting.
G-DAUGC consistently outperforms existing data augmentation methods based on back-translation.
Our analysis demonstrates that G-DAUGC produces a diverse set of fluent training examples, and that its selection and training approaches are important for performance.
arXiv Detail & Related papers (2020-04-24T06:12:10Z)