Enhancing Supervised Learning with Contrastive Markings in Neural
Machine Translation Training
- URL: http://arxiv.org/abs/2307.08416v1
- Date: Mon, 17 Jul 2023 11:56:32 GMT
- Title: Enhancing Supervised Learning with Contrastive Markings in Neural
Machine Translation Training
- Authors: Nathaniel Berger, Miriam Exel, Matthias Huck and Stefan Riezler
- Abstract summary: Supervised learning in Neural Machine Translation (NMT) typically follows a teacher forcing paradigm.
We present a simple extension of standard maximum likelihood estimation by a contrastive marking objective.
We show that training with contrastive markings yields improvements on top of supervised learning.
- Score: 10.498938255717066
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Supervised learning in Neural Machine Translation (NMT) typically follows a
teacher forcing paradigm where reference tokens constitute the conditioning
context in the model's prediction, instead of its own previous predictions. In
order to alleviate this lack of exploration in the space of translations, we
present a simple extension of standard maximum likelihood estimation by a
contrastive marking objective. The additional training signals are extracted
automatically from reference translations by comparing the system hypothesis
against the reference, and used for up/down-weighting correct/incorrect tokens.
The proposed new training procedure requires one additional translation pass
over the training set per epoch, and does not alter the standard inference
setup. We show that training with contrastive markings yields improvements on
top of supervised learning, and is especially useful when learning from
postedits where contrastive markings indicate human error corrections to the
original hypotheses. Code is publicly released.
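The abstract only sketches the objective, but the description suggests a token-level re-weighting of the likelihood on the system's own hypotheses: each hypothesis token is marked as correct or incorrect by aligning the hypothesis against the reference, and the per-token loss is then up- or down-weighted accordingly. Below is a minimal illustrative sketch under that reading; the `difflib` alignment, the helper names, and the concrete weight values are assumptions, not the authors' released code.

```python
# Illustrative sketch only -- not the paper's released implementation.
# Step 1: mark hypothesis tokens as correct/incorrect by aligning the
#         hypothesis against the reference (LCS-style matching blocks).
# Step 2: re-weight a standard token-level NLL with those markings.
import difflib

import torch
import torch.nn.functional as F


def extract_markings(hyp_tokens, ref_tokens):
    """Return one True/False flag per hypothesis token: True if the token
    lies in a matching block shared with the reference."""
    marks = [False] * len(hyp_tokens)
    matcher = difflib.SequenceMatcher(a=hyp_tokens, b=ref_tokens, autojunk=False)
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            marks[i] = True
    return marks


def marked_nll(logits, hyp_ids, marks, w_correct=1.0, w_incorrect=0.5):
    """Token-level NLL over the hypothesis, up-weighting correct tokens and
    down-weighting incorrect ones (the weights here are placeholders).

    logits:  (T, V) decoder scores for each hypothesis position
    hyp_ids: (T,)   hypothesis token ids (LongTensor)
    marks:   length-T iterable of bools from extract_markings
    """
    token_nll = F.cross_entropy(logits, hyp_ids, reduction="none")  # (T,)
    marks = torch.as_tensor(list(marks), dtype=torch.bool, device=token_nll.device)
    weights = torch.where(marks,
                          torch.full_like(token_nll, w_correct),
                          torch.full_like(token_nll, w_incorrect))
    return (weights * token_nll).sum() / weights.sum()
```

In such a setup the marked loss would be added to, not substituted for, the usual teacher-forced MLE term, which matches the abstract's claim that the method extends standard maximum likelihood estimation and leaves the inference setup untouched.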
Related papers
- Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance [68.56701216210617]
In principle, one would expect models to adapt better to the user context after instruction finetuning.
We observe a surprising failure mode: during instruction tuning, the context reliance under knowledge conflicts initially increases as expected, but then gradually decreases.
arXiv Detail & Related papers (2024-10-14T17:57:09Z)
- Dynamic Scheduled Sampling with Imitation Loss for Neural Text Generation [10.306522595622651]
We introduce Dynamic Scheduled Sampling with Imitation Loss (DySI), which maintains the sampling schedule based solely on training-time accuracy.
DySI achieves notable improvements on standard machine translation benchmarks, and significantly improves the robustness of other text generation models.
arXiv Detail & Related papers (2023-01-31T16:41:06Z)
- Debiased Fine-Tuning for Vision-language Models by Prompt Regularization [50.41984119504716]
We present a new paradigm for fine-tuning large-scale vision pre-trained models on downstream tasks, dubbed Prompt Regularization (ProReg).
ProReg uses the predictions obtained by prompting the pretrained model to regularize fine-tuning.
We show the consistently strong performance of ProReg compared with conventional fine-tuning, zero-shot prompt, prompt tuning, and other state-of-the-art methods.
arXiv Detail & Related papers (2023-01-29T11:53:55Z)
- Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation [48.50842995206353]
We study the impact of the jointly pretrained decoder, which is the main difference between Seq2Seq pretraining and previous encoder-based pretraining approaches for NMT.
We propose simple and effective strategies, named in-domain pretraining and input adaptation, to remedy the domain and objective discrepancies.
arXiv Detail & Related papers (2022-03-16T07:36:28Z)
- Mitigating Catastrophic Forgetting in Scheduled Sampling with Elastic Weight Consolidation in Neural Machine Translation [15.581515781839656]
Autoregressive models trained with maximum likelihood estimation suffer from exposure bias.
We propose using Elastic Weight Consolidation as a trade-off between mitigating exposure bias and retaining output quality (a generic EWC penalty is sketched after this list).
Experiments on two IWSLT'14 translation tasks demonstrate that our approach alleviates catastrophic forgetting and significantly improves BLEU.
arXiv Detail & Related papers (2021-09-13T20:37:58Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
- Improving Text Generation with Student-Forcing Optimal Transport [122.11881937642401]
We propose using optimal transport (OT) to match the sequences generated in training and testing modes.
An extension is also proposed to improve the OT learning, based on the structural and contextual information of the text sequences.
The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.
arXiv Detail & Related papers (2020-10-12T19:42:25Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
- Correct Me If You Can: Learning from Error Corrections and Markings [20.808561880051148]
We present the first user study on annotation cost and machine learnability for the less popular annotation mode of error markings.
We show that error markings for TED talks from English to German translations allow precise credit assignment while requiring significantly less human effort than correcting/post-editing.
arXiv Detail & Related papers (2020-04-23T15:17:37Z)
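For the scheduled-sampling entry above that uses Elastic Weight Consolidation, the regularizer being traded off against exposure-bias mitigation is the standard EWC penalty. The following is a generic sketch, not that paper's NMT-specific implementation; `old_params` and `fisher` are assumed to have been recorded on the original teacher-forced model beforehand.

```python
# Generic Elastic Weight Consolidation penalty (illustrative only):
# penalize drift of each parameter from its pre-fine-tuning value,
# weighted by an estimate of its Fisher information.
import torch


def ewc_penalty(model, old_params, fisher, lam=1.0):
    """old_params / fisher: dicts from parameter name to tensor, recorded
    on the original model before scheduled-sampling fine-tuning."""
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for name, param in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (param - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty


# Typical use: total_loss = task_loss + ewc_penalty(model, old_params, fisher)
```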