Related papers: Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation

Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation

URL: http://arxiv.org/abs/2408.00555v1
Date: Thu, 1 Aug 2024 13:38:58 GMT
Title: Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation
Authors: Xiaoye Qu, Qiyuan Chen, Wei Wei, Jishuo Sun, Jianfeng Dong,
Abstract summary: We introduce a novel framework, the Active Retrieval-Augmented large vision-language model (ARA), specifically designed to address hallucinations. Our empirical observations suggest that by utilizing fitting retrieval mechanisms and timing the retrieval judiciously, we can effectively mitigate the hallucination problem.
Score: 21.31915988262898
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite the remarkable ability of large vision-language models (LVLMs) in image comprehension, these models frequently generate plausible yet factually incorrect responses, a phenomenon known as hallucination.Recently, in large language models (LLMs), augmenting LLMs by retrieving information from external knowledge resources has been proven as a promising solution to mitigate hallucinations.However, the retrieval augmentation in LVLM significantly lags behind the widespread applications of LVLM. Moreover, when transferred to augmenting LVLMs, sometimes the hallucination degree of the model is even exacerbated.Motivated by the research gap and counter-intuitive phenomenon, we introduce a novel framework, the Active Retrieval-Augmented large vision-language model (ARA), specifically designed to address hallucinations by incorporating three critical dimensions: (i) dissecting the retrieval targets based on the inherent hierarchical structures of images. (ii) pinpointing the most effective retrieval methods and filtering out the reliable retrieval results. (iii) timing the retrieval process to coincide with episodes of low certainty, while circumventing unnecessary retrieval during periods of high certainty. To assess the capability of our proposed ARA model in reducing hallucination, we employ three widely used LVLM models (LLaVA-1.5, Qwen-VL, and mPLUG-Owl2) across four benchmarks. Our empirical observations suggest that by utilizing fitting retrieval mechanisms and timing the retrieval judiciously, we can effectively mitigate the hallucination problem. We hope that this study can provide deeper insights into how to adapt the retrieval augmentation to LVLMs for reducing hallucinations with more effective retrieval and minimal retrieval occurrences.

Related papers

IKOD: Mitigating Visual Attention Degradation in Large Vision-Language Models [20.036659182106806]
We show that Large Vision-Language Models (LVLMs) exhibit a long-term bias where hallucinations increase as the sequence length grows.<n>We propose Image attention-guided Key-value merging cOllaborative Decoding (IKOD), a collaborative decoding strategy generating more image-focused sequences.
arXiv Detail & Related papers (2025-08-05T14:05:15Z)
Identify, Isolate, and Purge: Mitigating Hallucinations in LVLMs via Self-Evolving Distillation [52.52962914918779]
hallucination issues significantly limit their credibility and application potential.<n>Existing mitigation methods rely on external tools or the comparison of multi-round inference.<n>We propose textbfSElf-textbfEvolving textbfDistillation (textbfSEED), which identifies hallucinations within the inner knowledge of LVLMs, isolates and purges them, and then distills the purified knowledge back into the model.
arXiv Detail & Related papers (2025-07-07T05:56:19Z)
Visual hallucination detection in large vision-language models via evidential conflict [24.465497252040294]
Dempster-Shafer theory (DST)-based visual hallucination detection method for LVLMs through uncertainty estimation.<n>We propose to the best of our knowledge, the first Dempster-Shafer theory (DST)-based visual hallucination detection method for LVLMs through uncertainty estimation.
arXiv Detail & Related papers (2025-06-24T11:03:10Z)
Efficient Contrastive Decoding with Probabilistic Hallucination Detection - Mitigating Hallucinations in Large Vision Language Models - [1.2499537119440245]
Efficient Contrastive Decoding (ECD) is a simple method that leverages probabilistic hallucination detection to shift the output distribution towards contextually accurate answers at inference time. Our experiments show that ECD effectively mitigates hallucinations, outperforming state-of-the-art methods with respect to performance on LVLM benchmarks and computation time.
arXiv Detail & Related papers (2025-04-16T14:50:25Z)
Delusions of Large Language Models [62.43923767408462]
Large Language Models often generate factually incorrect but plausible outputs, known as hallucinations. We identify a more insidious phenomenon, LLM delusion, defined as high belief hallucinations, incorrect outputs with abnormally high confidence, making them harder to detect and mitigate.
arXiv Detail & Related papers (2025-03-09T17:59:16Z)
HuDEx: Integrating Hallucination Detection and Explainability for Enhancing the Reliability of LLM responses [0.12499537119440242]
This paper proposes an explanation enhanced hallucination-detection model, coined as HuDEx. The proposed model provides a novel approach to integrate detection with explanations, and enable both users and the LLM itself to understand and reduce errors.
arXiv Detail & Related papers (2025-02-12T04:17:02Z)
Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs [7.920981206857122]
Large vision-language models (LVMs) extend large language models (LLMs) with visual perception capabilities. A major challenge compromising their reliability is object hallucination that LVMs may generate plausible but factually inaccurate information. We propose a novel visual adversarial perturbation (VAP) method to mitigate this hallucination issue.
arXiv Detail & Related papers (2025-01-31T14:31:00Z)
Towards a Systematic Evaluation of Hallucinations in Large-Vision Language Models [57.58426038241812]
Large Vision-Language Models (LVLMs) have demonstrated remarkable performance in complex multimodal tasks. These models still suffer from hallucinations when required to implicitly recognize or infer diverse visual entities from images. We propose a novel visual question answering (VQA) benchmark that employs contextual reasoning prompts as hallucination attacks.
arXiv Detail & Related papers (2024-12-29T23:56:01Z)
A Novel Approach to Eliminating Hallucinations in Large Language Model-Assisted Causal Discovery [21.2023350773338]
We show that hallucinations exist when using large language models (LLMs) in causal discovery. We propose using Retrieval Augmented Generation (RAG) to reduce hallucinations when quality data is available.
arXiv Detail & Related papers (2024-11-16T03:06:39Z)
A Survey of Hallucination in Large Visual Language Models [48.794850395309076]
The existence of hallucinations has limited the potential and practical effectiveness of LVLM in various fields. The structure of LVLMs and main causes of hallucination generation are introduced. The available hallucination evaluation benchmarks for LVLMs are presented.
arXiv Detail & Related papers (2024-10-20T10:58:58Z)
Mitigating Hallucinations in Large Vision-Language Models (LVLMs) via Language-Contrastive Decoding (LCD) [13.430637580980164]
Large Vision-Language Models (LVLMs) are an extension of Large Language Models (LLMs) that facilitate processing both image and text inputs, expanding AI capabilities. Our study introduces a Language Contrastive Decoding (LCD) algorithm that adjusts LVLM outputs based on Large Language Models distribution confidence levels. Our method effectively improves LVLMs without needing complex post-processing or retraining, and is easily applicable to different models.
arXiv Detail & Related papers (2024-08-06T08:10:34Z)
Mitigating Entity-Level Hallucination in Large Language Models [11.872916697604278]
This paper proposes Dynamic Retrieval Augmentation based on hallucination Detection (DRAD) as a novel method to detect and mitigate hallucinations in Large Language Models (LLMs) Experiment results show that DRAD demonstrates superior performance in both detecting and mitigating hallucinations in LLMs.
arXiv Detail & Related papers (2024-07-12T16:47:34Z)
Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback [48.065569871444275]
We propose detecting and mitigating hallucinations in Large Vision Language Models (LVLMs) via fine-grained AI feedback. We generate a small-size hallucination annotation dataset by proprietary models. Then, we propose a detect-then-rewrite pipeline to automatically construct preference dataset for training hallucination mitigating model.
arXiv Detail & Related papers (2024-04-22T14:46:10Z)
Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance [56.04768229686853]
Large Vision-Language Models (LVLMs) tend to hallucinate non-existing objects in the images. We introduce a framework called Mitigating hallucinAtion via classifieR-Free guIdaNcE (MARINE) MARINE is both training-free and API-free, and can effectively and efficiently reduce object hallucinations during the generation process.
arXiv Detail & Related papers (2024-02-13T18:59:05Z)
Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus [99.33091772494751]
Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields. LLMs are prone to hallucinate untruthful or nonsensical outputs that fail to meet user expectations. We propose a novel reference-free, uncertainty-based method for detecting hallucinations in LLMs.
arXiv Detail & Related papers (2023-11-22T08:39:17Z)
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models [116.01843550398183]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks. LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge.
arXiv Detail & Related papers (2023-09-03T16:56:48Z)
Evaluating Object Hallucination in Large Vision-Language Models [122.40337582958453]
This work presents the first systematic study on object hallucination of large vision-language models (LVLMs) We find that LVLMs tend to generate objects that are inconsistent with the target images in the descriptions. We propose a polling-based query method called POPE to evaluate the object hallucination.
arXiv Detail & Related papers (2023-05-17T16:34:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.