Negative Object Presence Evaluation (NOPE) to Measure Object
Hallucination in Vision-Language Models
- URL: http://arxiv.org/abs/2310.05338v1
- Date: Mon, 9 Oct 2023 01:52:27 GMT
- Title: Negative Object Presence Evaluation (NOPE) to Measure Object
Hallucination in Vision-Language Models
- Authors: Holy Lovenia, Wenliang Dai, Samuel Cahyawijaya, Ziwei Ji, Pascale Fung
- Abstract summary: NOPE (Negative Object Presence Evaluation) is a novel benchmark designed to assess object hallucination in vision-language (VL) models.
We extensively investigate the performance of 10 state-of-the-art VL models in discerning the non-existence of objects in visual questions.
- Score: 72.74157242401981
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Object hallucination poses a significant challenge in vision-language (VL)
models, often leading to the generation of nonsensical or unfaithful responses
with non-existent objects. However, the absence of a general measurement for
evaluating object hallucination in VL models has hindered our understanding and
ability to mitigate this issue. In this work, we present NOPE (Negative Object
Presence Evaluation), a novel benchmark designed to assess object hallucination
in VL models through visual question answering (VQA). We propose a
cost-effective and scalable approach utilizing large language models to
generate 29.5k synthetic negative pronoun (NegP) data of high quality for NOPE.
We extensively investigate the performance of 10 state-of-the-art VL models in
discerning the non-existence of objects in visual questions, where the ground
truth answers are denoted as NegP (e.g., "none"). Additionally, we evaluate
their standard performance on visual questions on 9 other VQA datasets. Through
our experiments, we demonstrate that no VL model is immune to the vulnerability
of object hallucination, as all models achieve accuracy below 10% on NegP.
Furthermore, we find that lexically diverse visual questions, question types
with large scopes, and scene-relevant objects increase the risk of object
hallucination in VL models.
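The NegP accuracy mentioned in the abstract can be illustrated with a minimal sketch. It assumes that every question in the evaluated set has a NegP ground truth, and that a prediction counts as correct when it matches a negative pronoun after simple normalization. The vocabulary NEGP_ANSWERS and the helper names normalize, is_negp, and negp_accuracy are hypothetical; the abstract only gives "none" as an example, and the paper's actual matching rule may differ.

```python
import string

# Hypothetical NegP vocabulary (assumption): the abstract only gives "none" as an example.
NEGP_ANSWERS = {"none", "nothing", "nobody", "no one", "nowhere", "neither"}


def normalize(answer: str) -> str:
    """Lowercase, trim whitespace and surrounding punctuation, collapse inner spaces."""
    cleaned = answer.lower().strip().strip(string.punctuation)
    return " ".join(cleaned.split())


def is_negp(answer: str) -> bool:
    """Return True if the normalized answer is one of the assumed negative pronouns."""
    return normalize(answer) in NEGP_ANSWERS


def negp_accuracy(predictions: list[str]) -> float:
    """Fraction of predictions that are a NegP, for questions whose ground truth is a NegP."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    correct = sum(is_negp(p) for p in predictions)
    return correct / len(predictions)


# Toy model outputs (made up for illustration, not taken from the paper).
predictions = ["None.", "a dog", "Nothing", "the man on the left"]
print(f"NegP accuracy: {negp_accuracy(predictions):.2f}")
```

Under this rule, any answer that names an object for a NegP question is counted as a hallucination, so the score is simply the share of answers that correctly deny the object's presence.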
Related papers
- Multi-Object Hallucination in Vision-Language Models [28.135215173793785]
Large vision language models (LVLMs) often suffer from object hallucination.
Hallucinatory behaviors are influenced by data-specific factors, salience and frequency, and model intrinsic behaviors.
arXiv Detail & Related papers (2024-07-08T17:59:57Z)
- AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models [91.78328878860003]
Large vision-language models (LVLMs) hallucinate: certain context cues in an image may trigger the language module's overconfident and incorrect reasoning on abnormal or hypothetical objects.
We develop the first automatic benchmark generation approach, AUTOHALLUSION, that harnesses a few principal strategies to create diverse examples.
It generates image-based questions whose ground-truth answers contradict the language module's prior.
A model has to overcome contextual biases and distractions to reach correct answers, while incorrect or inconsistent answers indicate hallucinations.
arXiv Detail & Related papers (2024-06-16T11:44:43Z)
- VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models [57.43276586087863]
Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs.
Existing benchmarks are often limited in scope, focusing mainly on object hallucinations.
We introduce a multi-dimensional benchmark covering objects, attributes, and relations, with challenging images selected based on associative biases.
arXiv Detail & Related papers (2024-04-22T04:49:22Z)
- Quantity Matters: Towards Assessing and Mitigating Number Hallucination in Large Vision-Language Models [57.42800112251644]
We focus on a specific type of hallucination, number hallucination, in which models incorrectly identify the number of certain objects in pictures.
We devise a training approach aimed at improving consistency to reduce number hallucinations, which leads to an 8% enhancement in performance over direct finetuning methods.
arXiv Detail & Related papers (2024-03-03T02:31:11Z)
- Analyzing and Mitigating Object Hallucination in Large Vision-Language Models [110.12460299261531]
Large vision-language models (LVLMs) have shown remarkable abilities in understanding visual information with human languages.
LVLMs still suffer from object hallucination, which is the problem of generating descriptions that include objects that do not actually exist in the images.
We propose a powerful algorithm, LVLM Hallucination Revisor (LURE), to rectify object hallucination in LVLMs by reconstructing less hallucinatory descriptions.
arXiv Detail & Related papers (2023-10-01T18:10:53Z)
- Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training [66.0036211069513]
Large-scale vision-language pre-trained models are prone to hallucinate non-existent visual objects when generating text.
We show that models achieving better scores on standard metrics could hallucinate objects more frequently.
Surprisingly, we find that patch-based features perform the best and smaller patch resolution yields a non-trivial reduction in object hallucination.
arXiv Detail & Related papers (2022-10-14T10:27:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.