Prevent the Language Model from being Overconfident in Neural Machine
Translation
- URL: http://arxiv.org/abs/2105.11098v1
- Date: Mon, 24 May 2021 05:34:09 GMT
- Title: Prevent the Language Model from being Overconfident in Neural Machine
Translation
- Authors: Mengqi Miao, Fandong Meng, Yijin Liu, Xiao-Hua Zhou, Jie Zhou
- Abstract summary: We propose a Margin-based Token-level Objective (MTO) and a Margin-based Sentence-level Objective (MSO) to maximize the Margin, preventing the LM from becoming overconfident.
Experiments on WMT14 English-to-German, WMT19 Chinese-to-English, and WMT14 English-to-French translation tasks demonstrate the effectiveness of our approach.
- Score: 21.203435303812043
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Neural Machine Translation (NMT) model is essentially a joint language
model conditioned on both the source sentence and partial translation.
Therefore, the NMT model naturally involves the mechanism of the Language Model
(LM) that predicts the next token only based on partial translation. Despite
its success, NMT still suffers from the hallucination problem, generating
fluent but inadequate translations. The main reason is that NMT pays excessive
attention to the partial translation while neglecting the source sentence to
some extent, namely the overconfidence of the LM. Accordingly, we define the Margin
between the NMT and the LM, calculated by subtracting the predicted probability
of the LM from that of the NMT model for each token. The Margin is negatively
correlated with the degree of LM overconfidence. Based on this property, we
propose a Margin-based Token-level Objective (MTO) and a Margin-based
Sentence-level Objective (MSO) to maximize the Margin, thereby preventing the LM
from becoming overconfident. Experiments on WMT14 English-to-German, WMT19
Chinese-to-English, and WMT14 English-to-French translation tasks demonstrate
the effectiveness of our approach, with 1.36, 1.50, and 0.63 BLEU improvements,
respectively, compared to the Transformer baseline. The human evaluation
further verifies that our approaches improve translation adequacy as well as
fluency.
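To make the Margin concrete, here is a minimal PyTorch sketch of a margin-based token-level loss. The margin follows the abstract's definition (the NMT model's probability minus the LM's probability for each reference token); the function name, the penalty shape `(1 - margin)`, and the weight `alpha` are illustrative assumptions, since the paper's exact MTO weighting is not given in the abstract, and the sentence-level MSO is not sketched here.

```python
import torch
import torch.nn.functional as F

def margin_based_token_loss(nmt_logits, lm_logits, targets, pad_id, alpha=1.0):
    """Sketch of a Margin-based Token-level Objective (MTO).

    Margin_t = p_NMT(y_t | y_<t, x) - p_LM(y_t | y_<t), where the LM is the
    decoder run without source context. `alpha` and the penalty shape are
    hypothetical; the paper's actual weighting of the margin may differ.
    """
    nmt_probs = F.softmax(nmt_logits, dim=-1)   # (batch, len, vocab)
    lm_probs = F.softmax(lm_logits, dim=-1)

    # Probability each branch assigns to the reference token.
    p_nmt = nmt_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    p_lm = lm_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)

    # Margin in [-1, 1]; small or negative values signal an overconfident LM.
    margin = p_nmt - p_lm

    mask = targets.ne(pad_id).float()
    nll = -torch.log(p_nmt.clamp_min(1e-9))     # standard NMT cross-entropy

    # Add a penalty that shrinks as the margin grows, so training pushes
    # p_NMT above p_LM on every non-padding reference token.
    loss = ((nll + alpha * (1.0 - margin)) * mask).sum() / mask.sum()
    return loss
```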
Related papers
- Understanding and Addressing the Under-Translation Problem from the Perspective of Decoding Objective [72.83966378613238]
Under-translation and over-translation remain two challenging problems in state-of-the-art Neural Machine Translation (NMT) systems.
We conduct an in-depth analysis of the underlying cause of under-translation in NMT, providing an explanation from the perspective of the decoding objective.
We propose employing the confidence of predicting End Of Sentence (EOS) as a detector for under-translation, and strengthening the confidence-based penalty to penalize candidates with a high risk of under-translation (a rough sketch of this idea appears after this list).
arXiv Detail & Related papers (2024-05-29T09:25:49Z)
- An Empirical study of Unsupervised Neural Machine Translation: analyzing
NMT output, model's behavior and sentences' contribution [5.691028372215281]
Unsupervised Neural Machine Translation (UNMT) focuses on improving NMT results under the assumption that there is no human-translated parallel data.
We focus on three very diverse languages, French, Gujarati, and Kazakh, and train bilingual NMT models, to and from English, with various levels of supervision.
arXiv Detail & Related papers (2023-12-19T20:35:08Z)
- Improving Machine Translation with Large Language Models: A Preliminary Study with Cooperative Decoding [73.32763904267186]
Large Language Models (LLMs) present the potential for achieving superior translation quality.
We propose Cooperative Decoding (CoDec) which treats NMT systems as a pretranslation model and MT-oriented LLMs as a supplemental solution.
arXiv Detail & Related papers (2023-11-06T03:41:57Z)
- Towards Reliable Neural Machine Translation with Consistency-Aware
Meta-Learning [24.64700139151659]
Current neural machine translation (NMT) systems suffer from a lack of reliability.
We present a consistency-aware meta-learning (CAML) framework derived from the model-agnostic meta-learning (MAML) algorithm to address it.
We conduct experiments on the NIST Chinese to English task, three WMT translation tasks, and the TED M2O task.
arXiv Detail & Related papers (2023-03-20T09:41:28Z)
- Exploiting Language Relatedness in Machine Translation Through Domain
Adaptation Techniques [3.257358540764261]
We present a novel approach that uses a scaled similarity score between sentences, especially for related languages, based on a 5-gram KenLM language model.
Our approach yields gains of 2 BLEU points with the multi-domain approach, 3 BLEU points with fine-tuning for NMT, and 2 BLEU points with iterative back-translation.
arXiv Detail & Related papers (2023-03-03T09:07:30Z)
- Competency-Aware Neural Machine Translation: Can Machine Translation
Know its Own Translation Quality? [61.866103154161884]
Neural machine translation (NMT) is often criticized for failures that happen without awareness.
We propose a novel competency-aware NMT by extending conventional NMT with a self-estimator.
We show that the proposed method delivers outstanding performance on quality estimation.
arXiv Detail & Related papers (2022-11-25T02:39:41Z)
- Can NMT Understand Me? Towards Perturbation-based Evaluation of NMT
Models for Code Generation [1.7616042687330642]
A key step to validate the robustness of the NMT models is to evaluate their performance on adversarial inputs.
In this work, we identify a set of perturbations and metrics tailored for the robustness assessment of such models.
We present a preliminary experimental evaluation, showing what type of perturbations affect the model the most.
arXiv Detail & Related papers (2022-03-29T08:01:39Z)
- On the Inference Calibration of Neural Machine Translation [54.48932804996506]
We study the correlation between calibration and translation performance, as well as the linguistic properties of miscalibration.
We propose a new graduated label smoothing method that can improve both inference calibration and translation performance.
arXiv Detail & Related papers (2020-05-03T02:03:56Z)
- Language Model Prior for Low-Resource Neural Machine Translation [85.55729693003829]
We propose a novel approach to incorporate an LM as a prior in a neural translation model (TM).
We add a regularization term that pushes the output distributions of the TM to be probable under the LM prior (a minimal sketch of this idea appears after this list).
Results on two low-resource machine translation datasets show clear improvements even with limited monolingual data.
arXiv Detail & Related papers (2020-04-30T16:29:56Z)
- Cross-lingual Supervision Improves Unsupervised Neural Machine
Translation [97.84871088440102]
We introduce a multilingual unsupervised NMT framework to leverage weakly supervised signals from high-resource language pairs to zero-resource translation directions.
The method significantly improves translation quality by more than 3 BLEU points on six benchmark unsupervised translation directions.
arXiv Detail & Related papers (2020-04-07T05:46:49Z)
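Two entries above describe mechanisms concrete enough to sketch. First, for "Understanding and Addressing the Under-Translation Problem", a rough sketch of using EOS confidence as an under-translation detector during decoding. The function name, `beta`, and `min_len_ratio` are hypothetical stand-ins, not the paper's actual formulation.

```python
import torch

def eos_confidence_penalty(step_log_probs: torch.Tensor, hyp_len: int,
                           src_len: int, eos_id: int,
                           beta: float = 1.0, min_len_ratio: float = 0.8):
    """Sketch: penalize ending a hypothesis that is still short relative
    to the source, using the decoder's EOS confidence as the detector.
    `beta` and `min_len_ratio` are illustrative knobs; the paper's exact
    penalty may differ.
    """
    penalized = step_log_probs.clone()          # (beam, vocab) log-probs
    eos_conf = step_log_probs[:, eos_id].exp()  # confidence of ending now
    if hyp_len < min_len_ratio * src_len:
        # Strengthen the penalty in proportion to the EOS confidence, so
        # candidates at high risk of under-translation are down-weighted.
        penalized[:, eos_id] -= beta * eos_conf
    return penalized
```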
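Second, for "Language Model Prior for Low-Resource Neural Machine Translation", a minimal sketch of the regularization idea: a divergence term pulling the translation model's output distribution toward a frozen LM prior. The weight `lam`, temperature `tau`, and the KL direction are assumptions; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def lm_prior_loss(tm_logits, lm_logits, targets, pad_id, lam=0.5, tau=2.0):
    """Sketch of an LM-prior regularizer for low-resource NMT.

    Adds a KL term between the translation model (TM) output distribution
    and a frozen LM's distribution; `lam` and `tau` are hypothetical.
    """
    # Standard translation cross-entropy (logits: batch x len x vocab).
    ce = F.cross_entropy(tm_logits.transpose(1, 2), targets, ignore_index=pad_id)

    # Per-token KL against the temperature-sharpened, detached LM prior.
    kl = F.kl_div(
        F.log_softmax(tm_logits / tau, dim=-1),
        F.softmax(lm_logits.detach() / tau, dim=-1),
        reduction="none",
    ).sum(-1)

    mask = targets.ne(pad_id).float()
    return ce + lam * (kl * mask).sum() / mask.sum()
```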
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.