Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models
- URL: http://arxiv.org/abs/2310.05338v2
- Date: Tue, 13 Aug 2024 05:48:31 GMT
- Title: Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models
- Authors: Holy Lovenia, Wenliang Dai, Samuel Cahyawijaya, Ziwei Ji, Pascale Fung
- Abstract summary: NOPE (Negative Object Presence Evaluation) is a novel benchmark designed to assess object hallucination in vision-language (VL) models.
We extensively investigate the performance of 10 state-of-the-art VL models in discerning the non-existence of objects in visual questions.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Object hallucination poses a significant challenge in vision-language (VL) models, often leading to the generation of nonsensical or unfaithful responses with non-existent objects. However, the absence of a general measurement for evaluating object hallucination in VL models has hindered our understanding of the issue and our ability to mitigate it. In this work, we present NOPE (Negative Object Presence Evaluation), a novel benchmark designed to assess object hallucination in VL models through visual question answering (VQA). We propose a cost-effective and scalable approach that utilizes large language models to generate 29.5k high-quality synthetic negative pronoun (NegP) samples for NOPE. We extensively investigate the performance of 10 state-of-the-art VL models in discerning the non-existence of objects in visual questions, where the ground-truth answers are NegP (e.g., "none"). Additionally, we evaluate their standard performance on visual questions from 9 other VQA datasets. Through our experiments, we demonstrate that no VL model is immune to object hallucination, as all models achieve accuracy below 10% on NegP. Furthermore, we find that lexically diverse visual questions, question types with large scopes, and scene-relevant objects heighten the risk of object hallucination in VL models.
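To make the NegP evaluation protocol concrete, below is a minimal sketch of how accuracy on NOPE-style questions could be scored, under the assumption that a free-form answer counts as correct only when it denies the object's presence (e.g., "none"). The NEGP_ANSWERS set, the normalize helper, and negp_accuracy are illustrative names and assumptions, not the paper's released implementation.

```python
import re

# Hypothetical list of negative-pronoun (NegP) answers; the benchmark's exact
# answer set may differ from this illustrative subset.
NEGP_ANSWERS = {"none", "nothing", "nobody", "no one", "nowhere", "neither", "zero"}

def normalize(answer: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    answer = answer.lower().strip()
    answer = re.sub(r"[^\w\s]", "", answer)
    return re.sub(r"\s+", " ", answer).strip()

def negp_accuracy(predictions: list[str]) -> float:
    """Fraction of model answers that correctly deny the queried object's presence."""
    if not predictions:
        return 0.0
    hits = sum(1 for p in predictions if normalize(p) in NEGP_ANSWERS)
    return hits / len(predictions)

# Example: a model that hallucinates objects on two of three NegP questions.
print(negp_accuracy(["None.", "a red car", "two dogs"]))  # ~0.33
```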
Related papers
- Evaluating Hallucination in Large Vision-Language Models based on Context-Aware Object Similarities
We present Context-Aware Object Similarities (CAOS), a novel approach for evaluating object hallucination in large vision-language models (LVLMs).
CAOS integrates object statistics with semantic relationships between objects in captions and ground-truth data.
We further employ language model-based object recognition to detect potentially out-of-domain hallucinated objects.
arXiv Detail & Related papers (2025-01-25T03:03:18Z)
- HALLUCINOGEN: A Benchmark for Evaluating Object Hallucination in Large Visual-Language Models
Large Vision-Language Models (LVLMs) have demonstrated remarkable performance on complex multimodal tasks.
We propose HALLUCINOGEN, a novel visual question answering (VQA) object hallucination attack benchmark.
We extend our benchmark to high-stakes medical applications and introduce MED-HALLUCINOGEN, hallucination attacks tailored to the biomedical domain.
arXiv Detail & Related papers (2024-12-29T23:56:01Z)
- H-POPE: Hierarchical Polling-based Probing Evaluation of Hallucinations in Large Vision-Language Models
We propose H-POPE, a coarse-to-fine-grained benchmark that assesses hallucinations in object existence and attributes.
Our evaluation shows that models are prone to hallucinations on object existence, and even more so on fine-grained attributes.
arXiv Detail & Related papers (2024-11-06T17:55:37Z)
- Multi-Object Hallucination in Vision-Language Models
Large vision-language models (LVLMs) often suffer from object hallucination.
Hallucinatory behaviors are influenced by data-specific factors, salience and frequency, and intrinsic model behaviors.
arXiv Detail & Related papers (2024-07-08T17:59:57Z)
- VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models
Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs.
Existing benchmarks are often limited in scope, focusing mainly on object hallucinations.
We introduce a multi-dimensional benchmark covering objects, attributes, and relations, with challenging images selected based on associative biases.
arXiv Detail & Related papers (2024-04-22T04:49:22Z)
- Quantity Matters: Towards Assessing and Mitigating Number Hallucination in Large Vision-Language Models
We focus on a specific type of hallucination, number hallucination, which refers to models incorrectly identifying the number of certain objects in pictures.
We devise a training approach aimed at improving consistency to reduce number hallucinations, which leads to an 8% enhancement in performance over direct finetuning methods.
arXiv Detail & Related papers (2024-03-03T02:31:11Z)
- Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
Large vision-language models (LVLMs) have shown remarkable abilities in understanding visual information with human language.
However, LVLMs still suffer from object hallucination: generating descriptions that include objects that do not actually exist in the images.
We propose a powerful algorithm, LVLM Hallucination Revisor (LURE), to rectify object hallucination in LVLMs by reconstructing less hallucinatory descriptions.
arXiv Detail & Related papers (2023-10-01T18:10:53Z)
- Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training
Large-scale vision-language pre-trained models are prone to hallucinate non-existent visual objects when generating text.
We show that models achieving better scores on standard metrics could hallucinate objects more frequently.
Surprisingly, we find that patch-based features perform the best and smaller patch resolution yields a non-trivial reduction in object hallucination.
arXiv Detail & Related papers (2022-10-14T10:27:22Z)