Visual Hallucination: Definition, Quantification, and Prescriptive Remediations
- URL: http://arxiv.org/abs/2403.17306v2
- Date: Sun, 31 Mar 2024 03:52:14 GMT
- Title: Visual Hallucination: Definition, Quantification, and Prescriptive Remediations
- Authors: Anku Rani, Vipula Rawte, Harshad Sharma, Neeraj Anand, Krishnav Rajbangshi, Amit Sheth, Amitava Das,
- Abstract summary: Hallucination presents perhaps the most significant impediment to the advancement of responsible AI.
We offer fine-grained profiling of VLM hallucination based on two tasks: i) image captioning and ii) Visual Question Answering (VQA).
We curate Visual HallucInation eLiciTation (VHILT), a dataset of 2,000 samples generated using eight VLMs across the two tasks of captioning and VQA, along with human annotations for the hallucination categories.
- Score: 5.980832131162941
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The troubling rise of hallucination presents perhaps the most significant impediment to the advancement of responsible AI. In recent times, considerable research has focused on detecting and mitigating hallucination in Large Language Models (LLMs). However, it's worth noting that hallucination is also quite prevalent in Vision-Language models (VLMs). In this paper, we offer a fine-grained discourse on profiling VLM hallucination based on two tasks: i) image captioning, and ii) Visual Question Answering (VQA). We delineate eight fine-grained orientations of visual hallucination: i) Contextual Guessing, ii) Identity Incongruity, iii) Geographical Erratum, iv) Visual Illusion, v) Gender Anomaly, vi) VLM as Classifier, vii) Wrong Reading, and viii) Numeric Discrepancy. We curate Visual HallucInation eLiciTation (VHILT), a publicly available dataset comprising 2,000 samples generated using eight VLMs across two tasks of captioning and VQA along with human annotations for the categories as mentioned earlier.
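To make the taxonomy above concrete, the following is a minimal sketch of how a VHILT-style annotated sample might be represented in code. The schema (field names such as image_id, model, generated_text, annotation) is a hypothetical assumption for illustration, not the released dataset format; only the eight hallucination categories and the two tasks are taken from the abstract.

```python
# Hypothetical sketch of a VHILT-style sample record.
# Field names and structure are illustrative assumptions, not the released
# dataset schema; only the eight categories and two tasks come from the abstract.
from dataclasses import dataclass
from enum import Enum
from collections import Counter


class HallucinationType(Enum):
    CONTEXTUAL_GUESSING = "Contextual Guessing"
    IDENTITY_INCONGRUITY = "Identity Incongruity"
    GEOGRAPHICAL_ERRATUM = "Geographical Erratum"
    VISUAL_ILLUSION = "Visual Illusion"
    GENDER_ANOMALY = "Gender Anomaly"
    VLM_AS_CLASSIFIER = "VLM as Classifier"
    WRONG_READING = "Wrong Reading"
    NUMERIC_DISCREPANCY = "Numeric Discrepancy"


class Task(Enum):
    CAPTIONING = "image captioning"
    VQA = "visual question answering"


@dataclass
class VhiltSample:
    image_id: str                  # assumed identifier for the source image
    task: Task                     # captioning or VQA
    model: str                     # one of the eight VLMs that generated the output
    generated_text: str            # caption or VQA answer produced by the VLM
    annotation: HallucinationType  # human-assigned hallucination category


def category_distribution(samples):
    """Count how often each hallucination category appears (illustrative only)."""
    return Counter(s.annotation for s in samples)


if __name__ == "__main__":
    demo = [
        VhiltSample("img_001", Task.CAPTIONING, "vlm_a",
                    "Three people stand near the Eiffel Tower in Rome.",
                    HallucinationType.GEOGRAPHICAL_ERRATUM),
        VhiltSample("img_002", Task.VQA, "vlm_b",
                    "There are seven apples on the table.",
                    HallucinationType.NUMERIC_DISCREPANCY),
    ]
    for category, count in category_distribution(demo).items():
        print(category.value, count)
```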
Related papers
- HELPD: Mitigating Hallucination of LVLMs by Hierarchical Feedback Learning with Vision-enhanced Penalty Decoding [36.360171373963716]
Large Vision-Language Models (LVLMs) have shown remarkable performance on many visual-language tasks.
These models still suffer from multimodal hallucination, i.e., generating objects or content that is inconsistent with the images.
We propose Hierarchical Feedback Learning with Vision-enhanced Penalty Decoding (HELPD) to address this issue.
arXiv Detail & Related papers (2024-09-30T15:52:05Z) - VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models [59.05674402770661]
This work introduces VideoHallucer, the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs).
VideoHallucer categorizes hallucinations into two main types: intrinsic and extrinsic, offering further subcategories for detailed analysis.
arXiv Detail & Related papers (2024-06-24T06:21:59Z) - Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? [53.89380284760555]
Large vision-language models (LVLMs) produce captions that mention concepts that cannot be found in the image.
These hallucinations erode the trustworthiness of LVLMs and are arguably among the main obstacles to their ubiquitous adoption.
Recent work suggests that the addition of grounding objectives -- those that explicitly align image regions or objects to text spans -- reduces the amount of LVLM hallucination.
arXiv Detail & Related papers (2024-06-20T16:56:11Z) - Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models [35.45859414670449]
We introduce a refined taxonomy of hallucinations, featuring a new category: Event Hallucination.
We then utilize advanced LLMs to generate and filter fine-grained hallucinatory data consisting of various types of hallucinations.
The proposed benchmark distinctively assesses LVLMs' ability to tackle a broad spectrum of hallucinations.
arXiv Detail & Related papers (2024-02-24T05:14:52Z) - A Survey on Hallucination in Large Vision-Language Models [18.540878498840435]
Large Vision-Language Models (LVLMs) have attracted growing attention within the AI landscape for their practical implementation potential.
However, "hallucination", or more specifically, the misalignment between factual visual content and the corresponding textual generation, poses a significant challenge to utilizing LVLMs.
We dissect LVLM-related hallucinations in an attempt to establish an overview and facilitate future mitigation.
arXiv Detail & Related papers (2024-02-01T00:33:21Z) - Hallucination Augmented Contrastive Learning for Multimodal Large Language Model [53.65682783591723]
Multi-modal large language models (MLLMs) have been shown to efficiently integrate natural language with visual information to handle multi-modal tasks.
However, MLLMs still face a fundamental limitation of hallucinations, where they tend to generate erroneous or fabricated information.
In this paper, we address hallucinations in MLLMs from a novel perspective of representation learning.
arXiv Detail & Related papers (2023-12-12T04:05:15Z) - The Troubling Emergence of Hallucination in Large Language Models -- An Extensive Definition, Quantification, and Prescriptive Remediations [10.20632187568563]
We offer a discourse on profiling hallucination based on its degree, orientation, and category.
We categorize hallucination into six types: (i) acronym ambiguity, (ii) numeric nuisance, (iii) generated golem, (iv) virtual voice, (v) geographic erratum, and (vi) time wrap.
We propose Hallucination Vulnerability Index (HVI) as a tool for the wider NLP community, with the potential to serve as a rubric in AI-related policy-making.
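The HVI is only named in this summary and its exact formulation is not reproduced here. As a loose, hypothetical illustration of how a vulnerability-style index might aggregate per-category hallucination annotations into a single number, one could average per-category hallucination rates as in the sketch below; the aggregation scheme is an assumption, not the paper's definition.

```python
# Illustrative vulnerability-style score; NOT the HVI formula from the paper.
# It averages per-category hallucination rates to show how fine-grained
# annotations could be collapsed into a single per-model number.
from collections import defaultdict


def vulnerability_score(annotations):
    """annotations: iterable of (category, hallucinated) pairs for one model's outputs."""
    per_category = defaultdict(list)
    for category, hallucinated in annotations:
        per_category[category].append(1.0 if hallucinated else 0.0)
    # Mean hallucination rate within each category, then a plain average over
    # categories, so rarely-triggered categories still contribute equally.
    rates = [sum(flags) / len(flags) for flags in per_category.values()]
    return sum(rates) / len(rates) if rates else 0.0


if __name__ == "__main__":
    demo = [
        ("numeric nuisance", True),
        ("numeric nuisance", False),
        ("geographic erratum", True),
        ("time wrap", False),
    ]
    print(round(vulnerability_score(demo), 3))  # prints 0.5 for this toy input
```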
arXiv Detail & Related papers (2023-10-08T03:31:29Z) - Analyzing and Mitigating Object Hallucination in Large Vision-Language Models [110.12460299261531]
Large vision-language models (LVLMs) have shown remarkable abilities in understanding visual information with human languages.
LVLMs still suffer from object hallucination, which is the problem of generating descriptions that include objects that do not actually exist in the images.
We propose a powerful algorithm, LVLM Hallucination Revisor (LURE), to rectify object hallucination in LVLMs by reconstructing less hallucinatory descriptions.
arXiv Detail & Related papers (2023-10-01T18:10:53Z) - Evaluation and Analysis of Hallucination in Large Vision-Language Models [49.19829480199372]
Large Vision-Language Models (LVLMs) have recently achieved remarkable success.
LVLMs are still plagued by the hallucination problem.
Hallucination refers to information in LVLMs' responses that does not exist in the visual input.
arXiv Detail & Related papers (2023-08-29T08:51:24Z) - Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training [66.0036211069513]
Large-scale vision-language pre-trained models are prone to hallucinate non-existent visual objects when generating text.
We show that models achieving better scores on standard metrics could hallucinate objects more frequently.
Surprisingly, we find that patch-based features perform the best and smaller patch resolution yields a non-trivial reduction in object hallucination.
arXiv Detail & Related papers (2022-10-14T10:27:22Z)