Related papers: Data-augmented phrase-level alignment for mitigating object hallucination

URL: http://arxiv.org/abs/2405.18654v2
Date: Tue, 08 Oct 2024 21:01:08 GMT
Title: Data-augmented phrase-level alignment for mitigating object hallucination
Authors: Pritam Sarkar, Sayna Ebrahimi, Ali Etemad, Ahmad Beirami, Sercan Ö. Arık, Tomas Pfister
Abstract summary: Multimodal Large Language Models (MLLMs) often generate factually inaccurate information, referred to as hallucination. We introduce Data-augmented Phrase-level Alignment (DPA), a novel loss which can be applied to instruction-tuned off-the-shelf MLLMs to mitigate hallucinations.
Score: 52.43197107069751
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Despite their significant advancements, Multimodal Large Language Models (MLLMs) often generate factually inaccurate information, referred to as hallucination. In this work, we address object hallucinations in MLLMs, where information is generated about an object not present in the input image. We introduce Data-augmented Phrase-level Alignment (DPA), a novel loss which can be applied to instruction-tuned off-the-shelf MLLMs to mitigate hallucinations, while preserving their general vision-language capabilities. To fine-tune MLLMs with DPA, we first generate a set of 'hallucinated' and 'correct' response pairs through generative data augmentation by selectively altering the ground-truth information of the correct responses at a phrase level. The DPA loss is then used to train MLLMs to reduce the likelihood of hallucinated phrases compared to the correct ones. Our thorough evaluation on various benchmarks confirms the effectiveness of DPA in mitigating hallucination while retaining the out-of-the-box performance of the MLLMs on general tasks. For instance, MLLMs fine-tuned with DPA, which we refer to as Hallucination Attenuated Language and Vision Assistant (HALVA), improve F1 by up to 13.4% on hallucination visual question-answering and reduce the hallucination rate by up to 4.2% on image description tasks.
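
The abstract describes the DPA loss only at a high level: reduce the likelihood of hallucinated phrases relative to correct ones. The following is a minimal sketch of that general idea, not the paper's formulation. It assumes per-token log-probabilities from a model are already available and that the altered phrase span is known; the function names `phrase_log_prob` and `phrase_alignment_loss`, the softplus form of the loss, and the toy numbers are all hypothetical.

```python
import math


def phrase_log_prob(token_log_probs, span):
    """Sum token log-probabilities over a half-open phrase span [start, end)."""
    start, end = span
    return sum(token_log_probs[start:end])


def phrase_alignment_loss(correct_lp, hallucinated_lp):
    """Hypothetical phrase-level loss: softplus of the log-likelihood gap.

    This equals -log(p_c / (p_c + p_h)), so it shrinks as the correct
    phrase becomes more likely than the hallucinated one.
    """
    gap = hallucinated_lp - correct_lp
    # Numerically stable softplus: log(1 + exp(gap)).
    return max(gap, 0.0) + math.log1p(math.exp(-abs(gap)))


# Toy per-token log-probabilities (made-up values) for a response pair
# that differs only in the phrase covering tokens 1 and 2.
correct_tokens = [-0.2, -0.5, -0.1, -0.3]
hallucinated_tokens = [-0.2, -2.0, -1.5, -0.3]
phrase_span = (1, 3)

correct_lp = phrase_log_prob(correct_tokens, phrase_span)
hallucinated_lp = phrase_log_prob(hallucinated_tokens, phrase_span)
loss = phrase_alignment_loss(correct_lp, hallucinated_lp)
```

Restricting the sum to the altered span, rather than the whole response, is what makes the comparison phrase-level in this sketch: tokens shared by both responses do not contribute to the loss.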

Related papers

Mitigating Hallucinations in Multimodal LLMs via Object-aware Preference Optimization [55.543583937522804]
Multimodal Large Language Models (MLLMs) emerge as a unified interface to address a multitude of tasks. Despite showcasing state-of-the-art results in many benchmarks, a long-standing issue is the tendency of MLLMs to hallucinate. In this paper, we address the problem of hallucinations as an alignment problem, seeking to steer the MLLM so that it prefers generating content without hallucinations.
arXiv Detail & Related papers (2025-08-27T18:02:04Z)
Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs [54.50483041708911]
Hallu-PI is the first benchmark designed to evaluate hallucination in MLLMs within Perturbed Inputs. Hallu-PI consists of seven perturbed scenarios, containing 1,260 perturbed images from 11 object types. Our research reveals a severe bias in MLLMs' ability to handle different types of hallucinations.
arXiv Detail & Related papers (2024-08-02T16:07:15Z)
Mitigating Multilingual Hallucination in Large Vision-Language Models [35.75851356840673]
We propose a two-stage Multilingual Hallucination Removal (MHR) framework for Large Vision-Language Models (LVLMs). Instead of relying on the intricate manual annotations of multilingual resources, we propose a novel cross-lingual alignment method. Our framework delivers an average increase of 19.0% in accuracy across 13 different languages.
arXiv Detail & Related papers (2024-08-01T13:34:35Z)
Fine-grained Hallucination Detection and Editing for Language Models [109.56911670376932]
Large language models (LMs) are prone to generate factual errors, which are often called hallucinations. We introduce a comprehensive taxonomy of hallucinations and argue that hallucinations manifest in diverse forms. We propose a novel task of automatic fine-grained hallucination detection and construct a new evaluation benchmark, FavaBench.
arXiv Detail & Related papers (2024-01-12T19:02:48Z)
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model [53.65682783591723]
Multi-modal large language models (MLLMs) have been shown to efficiently integrate natural language with visual information to handle multi-modal tasks. However, MLLMs still face a fundamental limitation of hallucinations, where they tend to generate erroneous or fabricated information. In this paper, we address hallucinations in MLLMs from a novel perspective of representation learning.
arXiv Detail & Related papers (2023-12-12T04:05:15Z)
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data [102.56792377624927]
Hallucinations inherent in machine-generated data remain under-explored. We present a novel hallucination detection and elimination framework, HalluciDoctor, based on the cross-checking paradigm. Our method successfully mitigates 44.6% hallucinations relatively and maintains competitive performance compared to LLaVA.
arXiv Detail & Related papers (2023-11-22T04:52:58Z)
Analyzing and Mitigating Object Hallucination in Large Vision-Language Models [110.12460299261531]
Large vision-language models (LVLMs) have shown remarkable abilities in understanding visual information with human languages. LVLMs still suffer from object hallucination, which is the problem of generating descriptions that include objects that do not actually exist in the images. We propose a powerful algorithm, LVLM Hallucination Revisor (LURE), to rectify object hallucination in LVLMs by reconstructing less hallucinatory descriptions.
arXiv Detail & Related papers (2023-10-01T18:10:53Z)
Evaluation and Analysis of Hallucination in Large Vision-Language Models [49.19829480199372]
Large Vision-Language Models (LVLMs) have recently achieved remarkable success. LVLMs are still plagued by the hallucination problem. Hallucination refers to the information of LVLMs' responses that does not exist in the visual input.
arXiv Detail & Related papers (2023-08-29T08:51:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
