Joint Dropout: Improving Generalizability in Low-Resource Neural Machine Translation through Phrase Pair Variables
- URL: http://arxiv.org/abs/2307.12835v1
- Date: Mon, 24 Jul 2023 14:33:49 GMT
- Title: Joint Dropout: Improving Generalizability in Low-Resource Neural Machine Translation through Phrase Pair Variables
- Authors: Ali Araabi, Vlad Niculae, Christof Monz
- Abstract summary: We propose a method called Joint Dropout that addresses the challenge of low-resource neural machine translation by substituting phrases with variables.
We observe a substantial improvement in translation quality for language pairs with minimal resources, as seen in BLEU and Direct Assessment scores.
- Score: 17.300004156754966
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Despite the tremendous success of Neural Machine Translation (NMT), its
performance on low-resource language pairs still remains subpar, partly due to
the limited ability to handle previously unseen inputs, i.e., generalization.
In this paper, we propose a method called Joint Dropout that addresses the
challenge of low-resource neural machine translation by substituting phrases
with variables, resulting in a significant enhancement of compositionality, which
is a key aspect of generalization. We observe a substantial improvement in
translation quality for language pairs with minimal resources, as seen in BLEU
and Direct Assessment scores. Furthermore, we conduct an error analysis, and
find Joint Dropout to also enhance generalizability of low-resource NMT in
terms of robustness and adaptability across different domains.
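As a rough illustration of the core idea, the sketch below jointly replaces aligned source and target phrases with shared variable tokens. The function name, variable tokens, dropout probability, and the assumption of pre-extracted, non-overlapping phrase-pair spans are illustrative choices, not the authors' implementation.

```python
# Minimal sketch of the Joint Dropout idea: aligned source/target phrase pairs
# are jointly replaced by shared variable tokens, so the model learns to
# translate around placeholders. The phrase-pair spans, variable naming, and
# dropout probability below are illustrative assumptions, not the paper's
# actual implementation.
import random

def joint_dropout(src_tokens, tgt_tokens, phrase_pairs, p=0.1):
    """Replace each aligned phrase pair with a shared variable with probability p.

    phrase_pairs: list of ((src_start, src_end), (tgt_start, tgt_end)) spans,
    assumed non-overlapping and obtained from an external phrase aligner.
    """
    src, tgt = list(src_tokens), list(tgt_tokens)
    # Decide which pairs to drop, then rewrite each side right-to-left so that
    # earlier span indices remain valid after replacement.
    chosen = [(spans, f"<X{i}>") for i, spans in enumerate(phrase_pairs)
              if random.random() < p]
    for (s_span, _), var in sorted(chosen, key=lambda c: c[0][0][0], reverse=True):
        src[s_span[0]:s_span[1]] = [var]
    for (_, t_span), var in sorted(chosen, key=lambda c: c[0][1][0], reverse=True):
        tgt[t_span[0]:t_span[1]] = [var]
    return src, tgt

# Hypothetical usage with a toy sentence pair and phrase alignment:
src = "the black cat sleeps on the mat".split()
tgt = "die schwarze Katze schläft auf der Matte".split()
pairs = [((1, 3), (1, 3)), ((5, 7), (5, 7))]
print(joint_dropout(src, tgt, pairs, p=1.0))
# (['the', '<X0>', 'sleeps', 'on', '<X1>'], ['die', '<X0>', 'schläft', 'auf', '<X1>'])
```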
Related papers
- Pointer-Generator Networks for Low-Resource Machine Translation: Don't Copy That! [13.120825574589437]
We show that Transformer-based neural machine translation (NMT) is very effective in high-resource settings.
We show that the model does not yield greater improvements for closely related vs. more distant language pairs.
Our discussion of the reasons for this behaviour highlights several general challenges for LR NMT.
arXiv Detail & Related papers (2024-03-16T16:17:47Z)
- Relevance-guided Neural Machine Translation [5.691028372215281]
We propose an explainability-based training approach for Neural Machine Translation (NMT).
Our results show our method can be promising, particularly when training in low-resource conditions.
arXiv Detail & Related papers (2023-11-30T21:52:02Z)
- Semi-supervised Neural Machine Translation with Consistency Regularization for Low-Resource Languages [3.475371300689165]
This paper presents a simple yet effective method to tackle the problem for low-resource languages by augmenting high-quality sentence pairs and training NMT models in a semi-supervised manner.
Specifically, our approach combines the cross-entropy loss for supervised learning with a KL-divergence loss, applied in an unsupervised fashion to pseudo and augmented target sentences.
Experimental results show that our approach significantly improves NMT baselines, especially on low-resource datasets, with gains of 0.46--2.03 BLEU.
arXiv Detail & Related papers (2023-04-02T15:24:08Z)
- Competency-Aware Neural Machine Translation: Can Machine Translation Know its Own Translation Quality? [61.866103154161884]
Neural machine translation (NMT) is often criticized for failures that happen without awareness.
We propose a novel competency-aware NMT by extending conventional NMT with a self-estimator.
We show that the proposed method delivers outstanding performance on quality estimation.
arXiv Detail & Related papers (2022-11-25T02:39:41Z)
- Categorizing Semantic Representations for Neural Machine Translation [53.88794787958174]
We introduce categorization to the source contextualized representations.
The main idea is to enhance generalization by reducing sparsity and overfitting.
Experiments on a dedicated MT dataset show that our method reduces compositional generalization error rates by 24%.
arXiv Detail & Related papers (2022-10-13T04:07:08Z)
- On the Complementarity between Pre-Training and Random-Initialization for Resource-Rich Machine Translation [80.16548523140025]
Pre-Training (PT) of text representations has been successfully applied to low-resource Neural Machine Translation (NMT).
We propose to combine their complementarities with a model fusion algorithm that utilizes optimal transport to align neurons between PT and RI.
Experiments on two resource-rich translation benchmarks, WMT'17 English-Chinese (20M) and WMT'19 English-German (36M), show that PT and RI could be nicely complementary to each other.
arXiv Detail & Related papers (2022-09-07T17:23:08Z)
- Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation [50.54059385277964]
We present a novel data augmentation paradigm termed Continuous Semantic Augmentation (CsaNMT).
CsaNMT augments each training instance with an adjacency region that could cover adequate variants of literal expression under the same meaning.
arXiv Detail & Related papers (2022-04-14T08:16:28Z)
- Phrase-level Adversarial Example Generation for Neural Machine Translation [75.01476479100569]
We propose a phrase-level adversarial example generation (PAEG) method to enhance the robustness of the model.
We verify our method on three benchmarks, including LDC Chinese-English, IWSLT14 German-English, and WMT14 English-German tasks.
arXiv Detail & Related papers (2022-01-06T11:00:49Z)
- The Low-Resource Double Bind: An Empirical Study of Pruning for Low-Resource Machine Translation [8.2987165990395]
"Bigger is better" explosion in number of parameters in deep neural networks has made it increasingly challenging to make state-of-the-art networks accessible in compute-restricted environments.
"Low-resource double bind" refers to co-occurrence of data limitations and compute resource constraints.
Our work offers surprising insights into the relationship between capacity and generalization in data-limited regimes for the task of machine translation.
arXiv Detail & Related papers (2021-10-06T19:48:18Z)
- Modelling Latent Translations for Cross-Lingual Transfer [47.61502999819699]
We propose a new technique that integrates both steps of the traditional pipeline (translation and classification) into a single model.
We evaluate our novel latent translation-based model on a series of multilingual NLU tasks.
We report gains for both zero-shot and few-shot learning setups, up to 2.7 accuracy points on average.
arXiv Detail & Related papers (2021-07-23T17:11:27Z)