Enhancing Supervised Learning with Contrastive Markings in Neural
Machine Translation Training
- URL: http://arxiv.org/abs/2307.08416v1
- Date: Mon, 17 Jul 2023 11:56:32 GMT
- Title: Enhancing Supervised Learning with Contrastive Markings in Neural
Machine Translation Training
- Authors: Nathaniel Berger, Miriam Exel, Matthias Huck and Stefan Riezler
- Abstract summary: Supervised learning in Neural Machine Translation (NMT) typically follows a teacher forcing paradigm.
We present a simple extension of standard maximum likelihood estimation by a contrastive marking objective.
We show that training with contrastive markings yields improvements on top of supervised learning.
- Score: 10.498938255717066
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Supervised learning in Neural Machine Translation (NMT) typically follows a
teacher forcing paradigm where reference tokens constitute the conditioning
context in the model's prediction, instead of its own previous predictions. In
order to alleviate this lack of exploration in the space of translations, we
present a simple extension of standard maximum likelihood estimation by a
contrastive marking objective. The additional training signals are extracted
automatically from reference translations by comparing the system hypothesis
against the reference, and used for up/down-weighting correct/incorrect tokens.
The proposed new training procedure requires one additional translation pass
over the training set per epoch, and does not alter the standard inference
setup. We show that training with contrastive markings yields improvements on
top of supervised learning, and is especially useful when learning from
postedits where contrastive markings indicate human error corrections to the
original hypotheses. Code is publicly released.
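The abstract only sketches the objective, but the description suggests a token-level re-weighting of the likelihood on the system's own hypotheses: each hypothesis token is marked as correct or incorrect by aligning the hypothesis against the reference, and the per-token loss is then up- or down-weighted accordingly. Below is a minimal illustrative sketch under that reading; the `difflib` alignment, the helper names, and the concrete weight values are assumptions, not the authors' released code.

```python
# Illustrative sketch only -- not the paper's released implementation.
# Step 1: mark hypothesis tokens as correct/incorrect by aligning the
#         hypothesis against the reference (LCS-style matching blocks).
# Step 2: re-weight a standard token-level NLL with those markings.
import difflib

import torch
import torch.nn.functional as F


def extract_markings(hyp_tokens, ref_tokens):
    """Return one True/False flag per hypothesis token: True if the token
    lies in a matching block shared with the reference."""
    marks = [False] * len(hyp_tokens)
    matcher = difflib.SequenceMatcher(a=hyp_tokens, b=ref_tokens, autojunk=False)
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            marks[i] = True
    return marks


def marked_nll(logits, hyp_ids, marks, w_correct=1.0, w_incorrect=0.5):
    """Token-level NLL over the hypothesis, up-weighting correct tokens and
    down-weighting incorrect ones (the weights here are placeholders).

    logits:  (T, V) decoder scores for each hypothesis position
    hyp_ids: (T,)   hypothesis token ids (LongTensor)
    marks:   length-T iterable of bools from extract_markings
    """
    token_nll = F.cross_entropy(logits, hyp_ids, reduction="none")  # (T,)
    marks = torch.as_tensor(list(marks), dtype=torch.bool, device=token_nll.device)
    weights = torch.where(marks,
                          torch.full_like(token_nll, w_correct),
                          torch.full_like(token_nll, w_incorrect))
    return (weights * token_nll).sum() / weights.sum()
```

In such a setup the marked loss would be added to, not substituted for, the usual teacher-forced MLE term, which matches the abstract's claim that the method extends standard maximum likelihood estimation and leaves the inference setup untouched.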
Related papers
- Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance [68.56701216210617]
In principle, one would expect models to adapt better to the user context after instruction finetuning.
We observe a surprising failure mode: during instruction tuning, the context reliance under knowledge conflicts initially increases as expected, but then gradually decreases.
arXiv Detail & Related papers (2024-10-14T17:57:09Z)
- Dynamic Scheduled Sampling with Imitation Loss for Neural Text Generation [10.306522595622651]
We introduce Dynamic Scheduled Sampling with Imitation Loss (DySI), which maintains the sampling schedule based solely on training-time accuracy.
DySI achieves notable improvements on standard machine translation benchmarks, and significantly improves the robustness of other text generation models.
arXiv Detail & Related papers (2023-01-31T16:41:06Z)
- Debiased Fine-Tuning for Vision-language Models by Prompt Regularization [50.41984119504716]
We present a new paradigm for fine-tuning large-scale vision pre-trained models on downstream tasks, dubbed Prompt Regularization (ProReg).
ProReg uses the predictions obtained by prompting the pretrained model to regularize fine-tuning.
We show the consistently strong performance of ProReg compared with conventional fine-tuning, zero-shot prompt, prompt tuning, and other state-of-the-art methods.
arXiv Detail & Related papers (2023-01-29T11:53:55Z)
- Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation [48.50842995206353]
We study the impact of the jointly pretrained decoder, which is the main difference between Seq2Seq pretraining and previous encoder-based pretraining approaches for NMT.
We propose simple and effective strategies, named in-domain pretraining and input adaptation, to remedy the domain and objective discrepancies.
arXiv Detail & Related papers (2022-03-16T07:36:28Z)
- Mitigating Catastrophic Forgetting in Scheduled Sampling with Elastic Weight Consolidation in Neural Machine Translation [15.581515781839656]
Autoregressive models trained with maximum likelihood estimation suffer from exposure bias.
We propose using Elastic Weight Consolidation as a trade-off between mitigating exposure bias and retaining output quality (a generic EWC penalty is sketched after this list).
Experiments on two IWSLT'14 translation tasks demonstrate that our approach alleviates catastrophic forgetting and significantly improves BLEU.
arXiv Detail & Related papers (2021-09-13T20:37:58Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
- Improving Text Generation with Student-Forcing Optimal Transport [122.11881937642401]
We propose using optimal transport (OT) to match the sequences generated in training and testing modes.
An extension is also proposed to improve the OT learning, based on the structural and contextual information of the text sequences.
The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.
arXiv Detail & Related papers (2020-10-12T19:42:25Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
- Correct Me If You Can: Learning from Error Corrections and Markings [20.808561880051148]
We present the first user study on annotation cost and machine learnability for the less popular annotation mode of error markings.
We show that error markings for TED talks from English to German translations allow precise credit assignment while requiring significantly less human effort than correcting/post-editing.
arXiv Detail & Related papers (2020-04-23T15:17:37Z)
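For the scheduled-sampling entry above that uses Elastic Weight Consolidation, the regularizer being traded off against exposure-bias mitigation is the standard EWC penalty. The following is a generic sketch, not that paper's NMT-specific implementation; `old_params` and `fisher` are assumed to have been recorded on the original teacher-forced model beforehand.

```python
# Generic Elastic Weight Consolidation penalty (illustrative only):
# penalize drift of each parameter from its pre-fine-tuning value,
# weighted by an estimate of its Fisher information.
import torch


def ewc_penalty(model, old_params, fisher, lam=1.0):
    """old_params / fisher: dicts from parameter name to tensor, recorded
    on the original model before scheduled-sampling fine-tuning."""
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for name, param in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (param - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty


# Typical use: total_loss = task_loss + ewc_penalty(model, old_params, fisher)
```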