Inspecting the Factuality of Hallucinated Entities in Abstractive
Summarization
- URL: http://arxiv.org/abs/2109.09784v1
- Date: Mon, 30 Aug 2021 15:40:52 GMT
- Title: Inspecting the Factuality of Hallucinated Entities in Abstractive
Summarization
- Authors: Meng Cao, Yue Dong and Jackie Chi Kit Cheung
- Abstract summary: State-of-the-art abstractive summarization systems often generate hallucinations, i.e., content that is not directly inferable from the source text.
We propose a novel detection approach that separates factual from non-factual hallucinations of entities.
- Score: 36.052622624166894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art abstractive summarization systems often generate
\emph{hallucinations}; i.e., content that is not directly inferable from the
source text. Despite being assumed incorrect, many of the hallucinated contents
are consistent with world knowledge (factual hallucinations). Including these
factual hallucinations into a summary can be beneficial in providing additional
background information. In this work, we propose a novel detection approach
that separates factual from non-factual hallucinations of entities. Our method
is based on an entity's prior and posterior probabilities according to
pre-trained and finetuned masked language models, respectively. Empirical
results suggest that our method vastly outperforms three strong baselines in
both accuracy and F1 scores and has a strong correlation with human judgments
on factuality classification tasks. Furthermore, our approach can provide
insight into whether a particular hallucination is caused by the summarizer's
pre-training or fine-tuning step.
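The decision rule behind the abstract's method can be sketched as follows. This is a minimal illustration, not the paper's implementation: `prior_prob` and `posterior_prob` are hypothetical stand-ins for the entity probabilities that would come from the pre-trained and fine-tuned masked language models, and the thresholds are illustrative values, not ones reported in the paper.

```python
def classify_entity(prior_prob: float, posterior_prob: float,
                    tau_prior: float = 0.1, tau_post: float = 0.1) -> str:
    """Label an entity appearing in a generated summary.

    prior_prob:     p(entity | context) under the pre-trained masked LM
                    (world knowledge only, no access to the source document).
    posterior_prob: p(entity | context, source) under the fine-tuned model.
    The thresholds tau_prior / tau_post are illustrative, not from the paper.
    """
    if posterior_prob >= tau_post:
        # The source-conditioned model itself licenses the entity.
        return "supported"
    if prior_prob >= tau_prior:
        # Unsupported by the source, but plausible world knowledge.
        return "factual hallucination"
    # Unsupported by the source and implausible under world knowledge.
    return "non-factual hallucination"

# An entity never mentioned in the source but assigned high probability
# by the pre-trained LM would be flagged as a factual hallucination:
print(classify_entity(prior_prob=0.42, posterior_prob=0.02))
```

The same comparison also supports the paper's attribution claim: a high prior with a low posterior points at pre-training knowledge, while a low prior suggests the fine-tuning step introduced the entity.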
Related papers
- Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models [22.996176483599868]
We design a unified framework to measure object and relation hallucination in Large Vision-Language Models (LVLMs) simultaneously.
Based on our framework, we introduce Tri-HE, a novel Triplet-level Hallucination Evaluation benchmark.
arXiv Detail & Related papers (2024-10-30T15:25:06Z)
- Reefknot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models [13.48296910438554]
Hallucination issues persistently plague current multimodal large language models (MLLMs).
We introduce Reefknot, a benchmark specifically targeting relation hallucinations, consisting of over 20,000 samples derived from real-world scenarios.
Our comparative evaluation across three distinct tasks revealed a substantial shortcoming in the capabilities of current MLLMs to mitigate relation hallucinations.
arXiv Detail & Related papers (2024-08-18T10:07:02Z)
- Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models [65.32990889402927]
We coin this phenomenon "knowledge overshadowing".
We show that the hallucination rate grows with both the imbalance ratio and the length of dominant condition description.
We propose to utilize overshadowing conditions as a signal to catch hallucination before it is produced.
arXiv Detail & Related papers (2024-07-10T20:37:42Z)
- On Large Language Models' Hallucination with Regard to Known Facts [74.96789694959894]
Large language models are successful in answering factoid questions but are also prone to hallucination.
We investigate the phenomenon of LLMs possessing correct answer knowledge yet still hallucinating from the perspective of inference dynamics.
Our study sheds light on the reasons for LLMs' hallucinations about their known facts and, more importantly, on accurately predicting when they are hallucinating.
arXiv Detail & Related papers (2024-03-29T06:48:30Z)
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation [36.31646727970656]
Large language models (LLMs) frequently hallucinate and produce factual errors.
Correct generations tend to have sharper context activations in the hidden states of the in-context tokens than incorrect ones.
We propose an entropy-based metric to quantify the "sharpness" among the in-context hidden states and incorporate it into the decoding process.
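The entropy-based idea can be sketched in a few lines. This is a toy illustration, not the paper's method: the per-token activation values are hypothetical stand-ins for the hidden-state scores the paper computes, and the only point shown is that concentrated activation mass yields lower entropy (a "sharper" context).

```python
import math

def entropy_sharpness(activations: list[float]) -> float:
    """Entropy over normalized in-context activation values.

    Normalize the non-negative activation values into a distribution
    and return its Shannon entropy (natural log). Lower entropy means
    the mass is concentrated on a few context tokens, i.e. a sharper
    context; higher entropy means a diffuse, uncertain context.
    """
    total = sum(activations)
    probs = [a / total for a in activations if a > 0]
    return -sum(p * math.log(p) for p in probs)

sharp = entropy_sharpness([9.0, 0.5, 0.25, 0.25])   # concentrated on one token
diffuse = entropy_sharpness([2.5, 2.5, 2.5, 2.5])   # spread uniformly
assert sharp < diffuse  # sharper context -> lower entropy
```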
arXiv Detail & Related papers (2024-03-03T15:53:41Z)
- Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models [35.45859414670449]
We introduce a refined taxonomy of hallucinations, featuring a new category: Event Hallucination.
We then utilize advanced LLMs to generate and filter fine-grained hallucinatory data consisting of various types of hallucinations.
The proposed benchmark distinctively assesses LVLMs' ability to tackle a broad spectrum of hallucinations.
arXiv Detail & Related papers (2024-02-24T05:14:52Z) - Fine-grained Hallucination Detection and Editing for Language Models [109.56911670376932]
Large language models (LMs) are prone to generate factual errors, which are often called hallucinations.
We introduce a comprehensive taxonomy of hallucinations and argue that hallucinations manifest in diverse forms.
We propose a novel task of automatic fine-grained hallucination detection and construct a new evaluation benchmark, FavaBench.
arXiv Detail & Related papers (2024-01-12T19:02:48Z) - HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data [102.56792377624927]
Hallucinations inherent in machine-generated data remain under-explored.
We present a novel hallucination detection and elimination framework, HalluciDoctor, based on the cross-checking paradigm.
Our method successfully mitigates 44.6% of hallucinations in relative terms while maintaining performance competitive with LLaVA.
arXiv Detail & Related papers (2023-11-22T04:52:58Z) - Correction with Backtracking Reduces Hallucination in Summarization [29.093092115901694]
Abstractive summarization aims at generating natural language summaries of a source document that are succinct while preserving the important elements.
Despite recent advances, neural text summarization models are known to be susceptible to hallucinating.
We introduce a simple yet efficient technique, CoBa, to reduce hallucination in abstractive summarization.
arXiv Detail & Related papers (2023-10-24T20:48:11Z)
- Don't Say What You Don't Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search [54.286450484332505]
We analyze the connection between hallucinations and training data, and find evidence that models hallucinate because they train on target summaries that are unsupported by the source.
We present PINOCCHIO, a new decoding method that improves the consistency of a transformer-based abstractive summarizer by constraining beam search to avoid hallucinations.
arXiv Detail & Related papers (2022-03-16T07:13:52Z)
- On Hallucination and Predictive Uncertainty in Conditional Language Generation [76.18783678114325]
Higher predictive uncertainty corresponds to a higher chance of hallucination.
Epistemic uncertainty is more indicative of hallucination than aleatoric or total uncertainties.
With the proposed beam search variant, performance on standard metrics can be traded for less hallucination.
arXiv Detail & Related papers (2021-03-28T00:32:27Z)
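The uncertainty-hallucination link in the last entry can be sketched as follows. This is a simplified illustration under assumed inputs: it uses total predictive entropy of each next-token distribution (the paper additionally separates epistemic from aleatoric uncertainty, which this sketch does not), and the threshold `tau` is an invented illustrative value.

```python
import math

def predictive_entropy(probs: list[float]) -> float:
    """Shannon entropy (natural log) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def flag_uncertain_steps(step_dists: list[list[float]],
                         tau: float = 1.0) -> list[int]:
    """Return indices of generation steps whose predictive entropy exceeds tau.

    Per the entry's finding, high-uncertainty steps carry a higher chance
    of producing hallucinated tokens; tau is an illustrative threshold.
    """
    return [i for i, dist in enumerate(step_dists)
            if predictive_entropy(dist) > tau]

steps = [
    [0.97, 0.01, 0.01, 0.01],   # confident step: low entropy
    [0.25, 0.25, 0.25, 0.25],   # maximally uncertain step: entropy = ln 4
]
print(flag_uncertain_steps(steps))   # only the uniform step is flagged
```

A decoding-time mitigation in the spirit of the entry's beam search variant would penalize candidate continuations passing through flagged steps.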
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.