Mitigating Hallucinations and Off-target Machine Translation with
Source-Contrastive and Language-Contrastive Decoding
- URL: http://arxiv.org/abs/2309.07098v2
- Date: Mon, 29 Jan 2024 09:08:39 GMT
- Title: Mitigating Hallucinations and Off-target Machine Translation with
Source-Contrastive and Language-Contrastive Decoding
- Authors: Rico Sennrich and Jannis Vamvas and Alireza Mohammadshahi
- Abstract summary: We introduce two related methods to mitigate failure cases with a modified decoding objective.
Experiments on the massively multilingual models M2M-100 (418M) and SMaLL-100 show that these methods suppress hallucinations and off-target translations.
- Score: 53.84948040596055
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hallucinations and off-target translation remain unsolved problems in MT,
especially for low-resource languages and massively multilingual models. In
this paper, we introduce two related methods to mitigate these failure cases
with a modified decoding objective, without requiring either retraining or
external models. In source-contrastive decoding, we search for a translation
that is probable given the correct input, but improbable given a random input
segment. In language-contrastive decoding, we search for a translation that is
probable, but improbable given the wrong language indicator token. Experiments
on the massively multilingual models M2M-100 (418M) and SMaLL-100 show that
these methods suppress hallucinations and off-target translations, reducing the
number of translations with segment-level chrF2 below 10 by 67-83% on average,
and the number of translations with oscillatory hallucinations by 75-92% on
average, across 57 tested translation directions. In a proof of concept on
out-of-English translation, we also show that we can suppress off-target
translations with large language models. We release our source code at
https://github.com/ZurichNLP/ContraDecode.
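To make the modified decoding objective concrete, the sketch below scores candidate translations contrastively; it is a minimal illustration, not the authors' released implementation (see the GitHub link above). The log_prob callable, the lambda weights, and the toy scores are illustrative assumptions.

# A minimal sketch of the two contrastive objectives described above; this is an
# illustration, not the authors' released code (see https://github.com/ZurichNLP/ContraDecode).
# The `log_prob` callable, the example weights, and the toy scores are assumptions.

from typing import Callable

# (candidate, source, target_lang) -> log p(candidate | source, target_lang)
LogProbFn = Callable[[str, str, str], float]


def source_contrastive_score(candidate: str, source: str, random_source: str,
                             target_lang: str, log_prob: LogProbFn,
                             lam: float = 0.7) -> float:
    # Favour candidates that are probable given the true source but
    # improbable given an unrelated (random) source segment.
    return (log_prob(candidate, source, target_lang)
            - lam * log_prob(candidate, random_source, target_lang))


def language_contrastive_score(candidate: str, source: str, target_lang: str,
                               wrong_langs: list, log_prob: LogProbFn,
                               lam: float = 0.3) -> float:
    # Favour candidates that are probable given the correct language indicator
    # token but improbable given wrong ones (e.g. English or the source language).
    penalty = sum(log_prob(candidate, source, lang) for lang in wrong_langs) / len(wrong_langs)
    return log_prob(candidate, source, target_lang) - lam * penalty


if __name__ == "__main__":
    # Toy log-probabilities standing in for a multilingual MT model.
    toy = {
        ("guten Morgen", "good morning", "de"): -1.0,
        ("guten Morgen", "see you later", "de"): -9.0,   # unrelated source
        ("guten Morgen", "good morning", "en"): -8.0,    # wrong language token
        ("good morning", "good morning", "de"): -2.0,    # off-target copy of the input
        ("good morning", "see you later", "de"): -2.5,   # still likely without the true source
        ("good morning", "good morning", "en"): -0.5,    # very likely under the wrong language token
    }
    log_prob = lambda cand, src, lang: toy[(cand, src, lang)]

    for cand in ("guten Morgen", "good morning"):
        s = source_contrastive_score(cand, "good morning", "see you later", "de", log_prob)
        l = language_contrastive_score(cand, "good morning", "de", ["en"], log_prob)
        print(f"{cand!r}: source-contrastive {s:.2f}, language-contrastive {l:.2f}")

Under both scores in this toy example, the faithful translation outranks the off-target copy of the input, which is the intended effect of subtracting the log-probability under the contrastive (random-source or wrong-language) condition.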
Related papers
- Language-Informed Beam Search Decoding for Multilingual Machine Translation [24.044315362087687]
Language-informed Beam Search (LiBS) is a general decoding algorithm incorporating an off-the-shelf Language Identification (LiD) model into beam search decoding to reduce off-target translations.
Results show that our proposed LiBS algorithm on average improves +1.1 BLEU and +0.9 BLEU on WMT and OPUS datasets, and reduces off-target rates from 22.9% to 7.7% and 65.8% to 25.3% respectively.
arXiv Detail & Related papers (2024-08-11T09:57:46Z) - Paying More Attention to Source Context: Mitigating Unfaithful Translations from Large Language Model [28.288949710191158]
Large language models (LLMs) have showcased impressive multilingual machine translation ability.
Unlike encoder-decoder style models, decoder-only LLMs lack an explicit alignment between source and target contexts.
We propose to encourage LLMs to pay more attention to the source context from both source and target perspectives.
arXiv Detail & Related papers (2024-06-11T07:49:04Z) - Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot
Translation [79.96416609433724]
Zero-shot translation (ZST) aims to translate between unseen language pairs in training data.
The common practice to guide the zero-shot language mapping during inference is to deliberately insert the source and target language IDs.
Recent studies have shown that language IDs sometimes fail to navigate the ZST task, making them suffer from the off-target problem.
arXiv Detail & Related papers (2023-09-28T17:02:36Z) - On the Copying Problem of Unsupervised NMT: A Training Schedule with a
Language Discriminator Loss [120.19360680963152]
unsupervised neural machine translation (UNMT) has achieved success in many language pairs.
The copying problem, i.e., directly copying some parts of the input sentence as the translation, is common among distant language pairs.
We propose a simple but effective training schedule that incorporates a language discriminator loss.
arXiv Detail & Related papers (2023-05-26T18:14:23Z) - On the Off-Target Problem of Zero-Shot Multilingual Neural Machine
Translation [104.85258654917297]
We find that failing in encoding discriminative target language signal will lead to off-target and a closer lexical distance.
We propose Language Aware Vocabulary Sharing (LAVS) to construct the multilingual vocabulary.
We conduct experiments on a multilingual machine translation benchmark in 11 languages.
arXiv Detail & Related papers (2023-05-18T12:43:31Z) - Chain-of-Dictionary Prompting Elicits Translation in Large Language Models [100.47154959254937]
Large language models (LLMs) have shown surprisingly good performance in multilingual neural machine translation (MNMT)
We present a novel method, CoD, which augments LLMs with prior knowledge with the chains of multilingual dictionaries for a subset of input words to elicit translation abilities.
arXiv Detail & Related papers (2023-05-11T05:19:47Z) - Hallucinations in Large Multilingual Translation Models [70.10455226752015]
Large-scale multilingual machine translation systems have demonstrated remarkable ability to translate directly between numerous languages.
When deployed in the wild, these models may generate hallucinated translations which have the potential to severely undermine user trust and raise safety concerns.
Existing research on hallucinations has primarily focused on small bilingual models trained on high-resource languages.
arXiv Detail & Related papers (2023-03-28T16:17:59Z) - Improving Zero-Shot Translation by Disentangling Positional Information [24.02434897109097]
We show that a main factor causing the language-specific representations is the positional correspondence to input tokens.
We gain up to 18.5 BLEU points on zero-shot translation while retaining quality on supervised directions.
arXiv Detail & Related papers (2020-12-30T12:20:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.