Related papers: Whispers that Shake Foundations: Analyzing and Mitigating False Premise Hallucinations in Large Language Models

URL: http://arxiv.org/abs/2402.19103v1
Date: Thu, 29 Feb 2024 12:35:45 GMT
Title: Whispers that Shake Foundations: Analyzing and Mitigating False Premise Hallucinations in Large Language Models
Authors: Hongbang Yuan, Pengfei Cao, Zhuoran Jin, Yubo Chen, Daojian Zeng, Kang Liu, Jun Zhao
Abstract summary: Large Language Models (LLMs) generate hallucinated text when confronted with false premise questions. We propose FAITH (False premise Attention head constraIning for miTigating Hallucinations), a novel and effective method to mitigate false premise hallucinations.
Score: 20.025123325871835
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have shown impressive capabilities but still suffer from the issue of hallucinations. A significant type of this issue is the false premise hallucination, which we define as the phenomenon when LLMs generate hallucinated text when confronted with false premise questions. In this paper, we perform a comprehensive analysis of the false premise hallucination and elucidate its internal working mechanism: a small subset of attention heads (which we designate as false premise heads) disturb the knowledge extraction process, leading to the occurrence of false premise hallucination. Based on our analysis, we propose FAITH (False premise Attention head constraIning for miTigating Hallucinations), a novel and effective method to mitigate false premise hallucinations. It constrains the false premise attention heads during the model inference process. Impressively, extensive experiments demonstrate that constraining only approximately 1% of the attention heads in the model yields a notable increase of nearly 20% of model performance.

Related papers

Mitigating Behavioral Hallucination in Multimodal Large Language Models for Sequential Images [6.48620624181578]
We introduce SHE (Sequence Hallucination Eradication), a lightweight framework that detects hallucinations and mitigates them. We also propose a new metric (BEACH) to quantify behavioral hallucination severity.
arXiv Detail & Related papers (2025-06-08T15:08:52Z)
HalluLens: LLM Hallucination Benchmark [49.170128733508335]
Large language models (LLMs) often generate responses that deviate from user input or training data, a phenomenon known as "hallucination". This paper introduces a comprehensive hallucination benchmark, incorporating both new extrinsic and existing intrinsic evaluation tasks.
arXiv Detail & Related papers (2025-04-24T13:40:27Z)
Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations [82.42811602081692]
This paper introduces a subsequence association framework to systematically trace and understand hallucinations. A key insight is that hallucinations arise when dominant hallucinatory associations outweigh faithful ones. We propose a tracing algorithm that identifies causal subsequences by analyzing hallucination probabilities across randomized input contexts.
arXiv Detail & Related papers (2025-04-17T06:34:45Z)
Valuable Hallucinations: Realizable Non-realistic Propositions [2.451326684641447]
This paper introduces the first formal definition of valuable hallucinations in large language models (LLMs). We focus on the potential value that certain types of hallucinations can offer in specific contexts. We present experiments using the Qwen2.5 model and HalluQA dataset, employing ReAct prompting to control and optimize hallucinations.
arXiv Detail & Related papers (2025-02-16T12:59:11Z)
Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models [51.50892380172863]
We show that most state-of-the-art MLLMs suffer from severe verb hallucination. We propose a novel rich verb knowledge-based tuning method to mitigate verb hallucination.
arXiv Detail & Related papers (2024-12-06T10:53:47Z)
Who Brings the Frisbee: Probing Hidden Hallucination Factors in Large Vision-Language Model via Causality Analysis [14.033320167387194]
A major challenge in their real-world application is hallucination, where LVLMs generate non-existent visual elements, eroding user trust. We hypothesize that hidden factors, such as objects, contexts, and semantic foreground-background structures, induce hallucination. By analyzing the causality between images, text prompts, and network saliency, we systematically explore interventions to block these factors.
arXiv Detail & Related papers (2024-12-04T01:23:57Z)
Investigating and Mitigating Object Hallucinations in Pretrained Vision-Language (CLIP) Models [22.42712853647949]
We present an in-depth investigation into the object hallucination problem specifically within the CLIP model. We unveil that even in isolation, the CLIP model is prone to object hallucinations, suggesting that the hallucination problem is not solely due to the interaction between vision and language modalities. We show that the enhanced model can be employed as a visual encoder, effectively alleviating the object hallucination issue in LVLMs.
arXiv Detail & Related papers (2024-10-04T06:24:49Z)
Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? [53.89380284760555]
Large vision-language models (LVLMs) produce captions that mention concepts that cannot be found in the image. These hallucinations erode the trustworthiness of LVLMs and are arguably among the main obstacles to their ubiquitous adoption. Recent work suggests that addition of grounding objectives -- those that explicitly align image regions or objects to text spans -- reduces the amount of LVLM hallucination.
arXiv Detail & Related papers (2024-06-20T16:56:11Z)
Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback [48.065569871444275]
We propose detecting and mitigating hallucinations in Large Vision Language Models (LVLMs) via fine-grained AI feedback. We generate a small-size hallucination annotation dataset using proprietary models. Then, we propose a detect-then-rewrite pipeline to automatically construct a preference dataset for training a hallucination-mitigating model.
arXiv Detail & Related papers (2024-04-22T14:46:10Z)
On Large Language Models' Hallucination with Regard to Known Facts [74.96789694959894]
Large language models are successful in answering factoid questions but are also prone to hallucination. We investigate the phenomenon of LLMs possessing correct answer knowledge yet still hallucinating from the perspective of inference dynamics. Our study sheds light on understanding the reasons for LLMs' hallucinations on their known facts, and more importantly, on accurately predicting when they are hallucinating.
arXiv Detail & Related papers (2024-03-29T06:48:30Z)
Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations [42.46721214112836]
State-of-the-art language models (LMs) sometimes generate non-factual hallucinations that misalign with world knowledge. We create diagnostic datasets with subject-relation queries and adapt interpretability methods to trace hallucinations through internal model representations.
arXiv Detail & Related papers (2024-03-27T00:23:03Z)
The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models [134.6697160940223]
Hallucination poses a great challenge to trustworthy and reliable deployment of large language models. Three key questions should be well studied: how to detect hallucinations (detection), why do LLMs hallucinate (source), and what can be done to mitigate them (mitigation). This work presents a systematic empirical study on LLM hallucination, focused on the three aspects of hallucination detection, source and mitigation.
arXiv Detail & Related papers (2024-01-06T12:40:45Z)
Alleviating Hallucinations of Large Language Models through Induced Hallucinations [67.35512483340837]
Large language models (LLMs) have been observed to generate responses that include inaccurate or fabricated information. We propose a simple Induce-then-Contrast Decoding (ICD) strategy to alleviate hallucinations.
arXiv Detail & Related papers (2023-12-25T12:32:49Z)
On Early Detection of Hallucinations in Factual Question Answering [4.76359068115052]
Hallucinations remain a major impediment towards gaining user trust. In this work, we explore whether the artifacts associated with the model generations can provide hints that the generation will contain hallucinations. Our results show that the distributions of these artifacts tend to differ between hallucinated and non-hallucinated generations.
arXiv Detail & Related papers (2023-12-19T14:35:04Z)
LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples [17.012156573134067]
We show that nonsensical prompts composed of random tokens can elicit large language models to respond with hallucinations. We formalize an automatic hallucination triggering method as the hallucination attack in an adversarial way. Our code is released on GitHub.
arXiv Detail & Related papers (2023-10-02T17:01:56Z)
HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models [146.87696738011712]
Large language models (LLMs) are prone to generate hallucinations, i.e., content that conflicts with the source or cannot be verified by the factual knowledge. To understand what types of content and to what extent LLMs are apt to hallucinate, we introduce the Hallucination Evaluation benchmark for Large Language Models (HaluEval).
arXiv Detail & Related papers (2023-05-19T15:36:27Z)
Inspecting the Factuality of Hallucinated Entities in Abstractive Summarization [36.052622624166894]
State-of-the-art abstractive summarization systems often generate hallucinations, i.e., content that is not directly inferable from the source text. We propose a novel detection approach that separates factual from non-factual hallucinations of entities.
arXiv Detail & Related papers (2021-08-30T15:40:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.