H-POPE: Hierarchical Polling-based Probing Evaluation of Hallucinations in Large Vision-Language Models
- URL: http://arxiv.org/abs/2411.04077v1
- Date: Wed, 06 Nov 2024 17:55:37 GMT
- Title: H-POPE: Hierarchical Polling-based Probing Evaluation of Hallucinations in Large Vision-Language Models
- Authors: Nhi Pham, Michael Schott,
- Abstract summary: We propose H-POPE, a coarse-to-fine-grained benchmark that assesses hallucinations in object existence and attributes.
Our evaluation shows that models are prone to hallucinations on object existence, and even more so on fine-grained attributes.
- Score: 0.0
- License:
- Abstract: By leveraging both texts and images, large vision language models (LVLMs) have shown significant progress in various multi-modal tasks. Nevertheless, these models often suffer from hallucinations, e.g., they exhibit inconsistencies between the visual input and the textual output. To address this, we propose H-POPE, a coarse-to-fine-grained benchmark that systematically assesses hallucination in object existence and attributes. Our evaluation shows that models are prone to hallucinations on object existence, and even more so on fine-grained attributes. We further investigate whether these models rely on visual input to formulate the output texts.
Related papers
- Mitigating Hallucinations in Large Vision-Language Models via Summary-Guided Decoding [14.701135083174918]
Large Vision-Language Models (LVLMs) generate detailed and coherent responses from visual inputs.
They are prone to generate hallucinations due to an over-reliance on language priors.
We propose a novel method, Summary-Guided Decoding (SGD)
arXiv Detail & Related papers (2024-10-17T08:24:27Z) - From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models [15.401221354325672]
Hallucinations in large vision models (LVLMs) are a significant challenge, i.e., generating objects that are not presented in the visual input.
Recent studies often attribute hallucinations to a lack of understanding of visual input, yet ignore a more fundamental issue: the model's inability to extract or decouple visual features.
In this paper, we revisit the hallucinations in LVLMs from an architectural perspective, investigating whether the primary cause lies in the visual encoder (feature extraction) or the modal alignment module (feature decoupling)
arXiv Detail & Related papers (2024-10-09T11:46:32Z) - VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models [59.05674402770661]
This work introduces VideoHallucer, the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)
VideoHallucer categorizes hallucinations into two main types: intrinsic and extrinsic, offering further subcategories for detailed analysis.
arXiv Detail & Related papers (2024-06-24T06:21:59Z) - VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models [57.43276586087863]
Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs.
Existing benchmarks are often limited in scope, focusing mainly on object hallucinations.
We introduce a multi-dimensional benchmark covering objects, attributes, and relations, with challenging images selected based on associative biases.
arXiv Detail & Related papers (2024-04-22T04:49:22Z) - The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models [24.11077502209129]
Large Language Models (LLMs) have transformed the Natural Language Processing (NLP) landscape with their remarkable ability to understand and generate human-like text.
However, these models are prone to hallucinations'' -- outputs that do not align with factual reality or the input context.
This paper introduces the Hallucinations Leaderboard, an open initiative to quantitatively measure and compare the tendency of each model to produce hallucinations.
arXiv Detail & Related papers (2024-04-08T23:16:22Z) - Quantity Matters: Towards Assessing and Mitigating Number Hallucination in Large Vision-Language Models [57.42800112251644]
We focus on a specific type of hallucination-number hallucination, referring to models incorrectly identifying the number of certain objects in pictures.
We devise a training approach aimed at improving consistency to reduce number hallucinations, which leads to an 8% enhancement in performance over direct finetuning methods.
arXiv Detail & Related papers (2024-03-03T02:31:11Z) - Fine-grained Hallucination Detection and Editing for Language Models [109.56911670376932]
Large language models (LMs) are prone to generate factual errors, which are often called hallucinations.
We introduce a comprehensive taxonomy of hallucinations and argue that hallucinations manifest in diverse forms.
We propose a novel task of automatic fine-grained hallucination detection and construct a new evaluation benchmark, FavaBench.
arXiv Detail & Related papers (2024-01-12T19:02:48Z) - AutoHall: Automated Hallucination Dataset Generation for Large Language Models [56.92068213969036]
This paper introduces a method for automatically constructing model-specific hallucination datasets based on existing fact-checking datasets called AutoHall.
We also propose a zero-resource and black-box hallucination detection method based on self-contradiction.
arXiv Detail & Related papers (2023-09-30T05:20:02Z) - Plausible May Not Be Faithful: Probing Object Hallucination in
Vision-Language Pre-training [66.0036211069513]
Large-scale vision-language pre-trained models are prone to hallucinate non-existent visual objects when generating text.
We show that models achieving better scores on standard metrics could hallucinate objects more frequently.
Surprisingly, we find that patch-based features perform the best and smaller patch resolution yields a non-trivial reduction in object hallucination.
arXiv Detail & Related papers (2022-10-14T10:27:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.