Understanding and Detecting Hallucinations in Neural Machine Translation
via Model Introspection
- URL: http://arxiv.org/abs/2301.07779v1
- Date: Wed, 18 Jan 2023 20:43:13 GMT
- Title: Understanding and Detecting Hallucinations in Neural Machine Translation
via Model Introspection
- Authors: Weijia Xu, Sweta Agrawal, Eleftheria Briakou, Marianna J. Martindale,
Marine Carpuat
- Abstract summary: We first identify internal model symptoms of hallucinations by analyzing the relative token contributions to the generation in contrastive hallucinated vs. non-hallucinated outputs generated via source perturbations.
We then show that these symptoms are reliable indicators of natural hallucinations, by using them to design a lightweight hallucination detector.
- Score: 28.445196622710164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural sequence generation models are known to "hallucinate", by producing
outputs that are unrelated to the source text. These hallucinations are
potentially harmful, yet it remains unclear in what conditions they arise and
how to mitigate their impact. In this work, we first identify internal model
symptoms of hallucinations by analyzing the relative token contributions to the
generation in contrastive hallucinated vs. non-hallucinated outputs generated
via source perturbations. We then show that these symptoms are reliable
indicators of natural hallucinations, by using them to design a lightweight
hallucination detector which outperforms both model-free baselines and strong
classifiers based on quality estimation or large pre-trained models on manually
annotated English-Chinese and German-English translation test beds.
Related papers
- Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models [65.32990889402927]
We coin this phenomenon as knowledge overshadowing''
We show that the hallucination rate grows with both the imbalance ratio and the length of dominant condition description.
We propose to utilize overshadowing conditions as a signal to catch hallucination before it is produced.
arXiv Detail & Related papers (2024-07-10T20:37:42Z) - On Large Language Models' Hallucination with Regard to Known Facts [74.96789694959894]
Large language models are successful in answering factoid questions but are also prone to hallucination.
We investigate the phenomenon of LLMs possessing correct answer knowledge yet still hallucinating from the perspective of inference dynamics.
Our study shed light on understanding the reasons for LLMs' hallucinations on their known facts, and more importantly, on accurately predicting when they are hallucinating.
arXiv Detail & Related papers (2024-03-29T06:48:30Z) - Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations [42.46721214112836]
State-of-the-art language models (LMs) sometimes generate non-factual hallucinations that misalign with world knowledge.
We create diagnostic datasets with subject-relation queries and adapt interpretability methods to trace hallucinations through internal model representations.
arXiv Detail & Related papers (2024-03-27T00:23:03Z) - Hallucinations in Neural Automatic Speech Recognition: Identifying
Errors and Hallucinatory Models [11.492702369437785]
Hallucinations are semantically unrelated to the source utterance, yet still fluent and coherent.
We show that commonly used metrics, such as word error rates, cannot differentiate between hallucinatory and non-hallucinatory models.
We devise a framework for identifying hallucinations by analysing their semantic connection with the ground truth and their fluency.
arXiv Detail & Related papers (2024-01-03T06:56:56Z) - Alleviating Hallucinations of Large Language Models through Induced
Hallucinations [67.35512483340837]
Large language models (LLMs) have been observed to generate responses that include inaccurate or fabricated information.
We propose a simple textitInduce-then-Contrast Decoding (ICD) strategy to alleviate hallucinations.
arXiv Detail & Related papers (2023-12-25T12:32:49Z) - On Early Detection of Hallucinations in Factual Question Answering [4.76359068115052]
hallucinations remain a major impediment towards gaining user trust.
In this work, we explore if the artifacts associated with the model generations can provide hints that the generation will contain hallucinations.
Our results show that the distributions of these artifacts tend to differ between hallucinated and non-hallucinated generations.
arXiv Detail & Related papers (2023-12-19T14:35:04Z) - Reducing Hallucinations in Neural Machine Translation with Feature
Attribution [54.46113444757899]
We present a case study focusing on model understanding and regularisation to reduce hallucinations in NMT.
We first use feature attribution methods to study the behaviour of an NMT model that produces hallucinations.
We then leverage these methods to propose a novel loss function that substantially helps reduce hallucinations and does not require retraining the model from scratch.
arXiv Detail & Related papers (2022-11-17T20:33:56Z) - Probing Causes of Hallucinations in Neural Machine Translations [51.418245676894465]
We propose to use probing methods to investigate the causes of hallucinations from the perspective of model architecture.
We find that hallucination is often accompanied by the deficient encoder, especially embeddings, and vulnerable cross-attentions.
arXiv Detail & Related papers (2022-06-25T01:57:22Z) - On Hallucination and Predictive Uncertainty in Conditional Language
Generation [76.18783678114325]
Higher predictive uncertainty corresponds to a higher chance of hallucination.
Epistemic uncertainty is more indicative of hallucination than aleatoric or total uncertainties.
It helps to achieve better results of trading performance in standard metric for less hallucination with the proposed beam search variant.
arXiv Detail & Related papers (2021-03-28T00:32:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.