CIEM: Contrastive Instruction Evaluation Method for Better Instruction
Tuning
- URL: http://arxiv.org/abs/2309.02301v2
- Date: Fri, 24 Nov 2023 07:07:03 GMT
- Title: CIEM: Contrastive Instruction Evaluation Method for Better Instruction
Tuning
- Authors: Hongyu Hu, Jiyuan Zhang, Minyi Zhao, Zhenbang Sun
- Abstract summary: Vision-Language Models (VLMs) may generate incorrect perceptual information in downstream applications, for example, captioning a non-existent entity.
To address the hallucination phenomenon, we introduce a Contrastive Instruction Evaluation Method (CIEM) and Contrastive Instruction Tuning (CIT).
We pinpoint the hallucination issues commonly present in existing VLMs, the inability of current instruction-tuning datasets to handle the hallucination phenomenon, and the superiority of CIT-tuned VLMs on both CIEM and public datasets.
- Score: 8.217445461627797
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nowadays, research on Large Vision-Language Models (LVLMs) has
advanced significantly thanks to the success of Large Language Models (LLMs).
Nevertheless, these Vision-Language Models (VLMs) suffer from hallucination:
owing to an insufficient understanding of the vision and language modalities,
VLMs may generate incorrect perceptual information in downstream applications,
for example, captioning a non-existent entity.
To address the hallucination phenomenon, on the one hand, we introduce a
Contrastive Instruction Evaluation Method (CIEM), which is an automatic
pipeline that leverages an annotated image-text dataset coupled with an LLM to
generate factual/contrastive question-answer pairs for evaluating the
hallucination of VLMs. On the other hand, based on CIEM, we further propose a
new instruction tuning method called CIT (the abbreviation of Contrastive
Instruction Tuning) to alleviate the hallucination of VLMs by automatically
producing high-quality factual/contrastive question-answer pairs and
corresponding justifications for model tuning. Through extensive experiments on
CIEM and CIT, we pinpoint the hallucination issues commonly present in existing
VLMs, the inability of current instruction-tuning datasets to handle the
hallucination phenomenon, and the superiority of CIT-tuned VLMs on both CIEM
and public datasets.
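As a rough illustration of the pipeline the abstract describes, the sketch below prompts an LLM to turn a single annotated caption into a factual/contrastive yes-or-no question pair. It is a minimal sketch assuming an OpenAI-style chat client; the prompt wording, the model name, and the helper names are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of a CIEM-style contrastive QA generator (illustrative only).
# Assumes an OpenAI-style chat client; prompt wording, model name, and helper
# names are hypothetical, not the authors' released pipeline.
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

PROMPT = """You are given the caption of an image:
"{caption}"

1. Write one factual yes/no question about an object that IS mentioned in the
   caption (ground-truth answer: yes).
2. Write one contrastive yes/no question about a plausible object that is NOT
   mentioned in the caption (ground-truth answer: no).

Return JSON with the keys "factual" and "contrastive"."""


def generate_qa_pairs(caption: str) -> dict:
    """Turn an annotated caption into a factual/contrastive question pair."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable instruction-following LLM
        messages=[{"role": "user", "content": PROMPT.format(caption=caption)}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)


if __name__ == "__main__":
    pairs = generate_qa_pairs("A dog lying on a sofa next to a remote control.")
    # e.g. {"factual": "Is there a dog in the image?",
    #       "contrastive": "Is there a cat in the image?"}
    print(pairs)
```

For evaluation, the candidate VLM answers each question given the image, and accuracy on the contrastive questions can serve as a proxy for its hallucination rate; for CIT-style tuning, the same pairs plus justifications would instead be formatted as instruction data.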
Related papers
- Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering [66.5524727179286]
NOVA is a framework designed to identify high-quality data that aligns well with the learned knowledge to reduce hallucinations.
It includes Internal Consistency Probing (ICP) and Semantic Equivalence Identification (SEI) to measure how familiar the LLM is with instruction data.
To ensure the quality of selected samples, we introduce an expert-aligned reward model, considering characteristics beyond just familiarity.
arXiv Detail & Related papers (2025-02-11T08:05:56Z)
- Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models [66.71616369573715]
Large Vision-Language Models (LVLMs) are prone to generating hallucinatory text responses that do not align with the given visual input.
We introduce Self-Correcting Decoding with Generative Feedback (DeGF), a novel training-free algorithm that incorporates feedback from text-to-image generative models into the decoding process.
arXiv Detail & Related papers (2025-02-10T03:43:55Z)
- Mitigating Hallucinations in Large Vision-Language Models with Internal Fact-based Contrastive Decoding [5.424048651554831]
Internal Fact-based Contrastive Decoding (IFCD) is designed to mitigate and suppress hallucinations during the inference process of Large Visual Language Models (LVLMs).
IFCD calibrates the LVLMs' output and effectively removes the hallucinatory logits from the final predictions.
Experimental results validate that IFCD significantly alleviates both object-level and attribute-level hallucinations, improving accuracy by an average of 9% on POPE and 8% on the MME object-hallucination subset compared with direct decoding.
arXiv Detail & Related papers (2025-02-03T05:08:35Z)
- Mitigating Hallucination for Large Vision Language Model by Inter-Modality Correlation Calibration Decoding [66.06337890279839]
Large vision-language models (LVLMs) have shown remarkable capabilities in visual-language understanding for downstream multi-modal tasks.
LVLMs still suffer from generating hallucinations in complex generation tasks, leading to inconsistencies between visual inputs and generated content.
We propose an Inter-Modality Correlation Calibration Decoding (IMCCD) method to mitigate hallucinations in LVLMs in a training-free manner.
arXiv Detail & Related papers (2025-01-03T17:56:28Z)
- Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning [16.883679810267342]
This paper introduces a novel approach called Iterative Model-level Contrastive Learning (Iter-AHMCL) to address hallucination.
arXiv Detail & Related papers (2024-10-16T00:15:40Z)
- VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models [57.43276586087863]
Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs.
Existing benchmarks are often limited in scope, focusing mainly on object hallucinations.
We introduce a multi-dimensional benchmark covering objects, attributes, and relations, with challenging images selected based on associative biases.
arXiv Detail & Related papers (2024-04-22T04:49:22Z)
- IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding [37.16880672402059]
Over-reliance on linguistic priors has been identified as a key factor leading to hallucinations.
We propose to alleviate this problem by introducing a novel image-biased decoding technique.
Our method derives the next-token probability distribution by contrasting predictions from a conventional LVLM with those of an image-biased LVLM (a generic sketch of this contrastive-decoding pattern appears after this list).
arXiv Detail & Related papers (2024-02-28T16:57:22Z)
- Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance [56.04768229686853]
Large Vision-Language Models (LVLMs) tend to hallucinate non-existing objects in the images.
We introduce a framework called Mitigating hallucinAtion via classifieR-Free guIdaNcE (MARINE).
MARINE is both training-free and API-free, and can effectively and efficiently reduce object hallucinations during the generation process.
arXiv Detail & Related papers (2024-02-13T18:59:05Z)
- Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models [116.01843550398183]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks.
LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge.
arXiv Detail & Related papers (2023-09-03T16:56:48Z)
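Several of the decoding-time methods listed above (IFCD, IMCCD, IBD, MARINE) share one basic pattern: contrast the next-token logits of a standard forward pass against those of a second, deliberately biased pass, and sample from the contrasted distribution. The sketch below shows only that generic pattern; the function name, the weighting scheme, and the default alpha are illustrative assumptions rather than any single paper's exact formulation.

```python
# Generic contrastive-decoding step (illustrative, not any paper's exact method).
import torch


def contrastive_decode_step(logits_standard: torch.Tensor,
                            logits_image_biased: torch.Tensor,
                            alpha: float = 1.0) -> torch.Tensor:
    """Contrast an image-biased pass against a standard pass.

    logits_standard:     next-token logits from the ordinary LVLM forward pass
    logits_image_biased: next-token logits from a pass that up-weights visual
                         evidence (e.g. an image-biased or guided variant)
    alpha:               contrast strength (hypothetical default)
    """
    log_p_std = torch.log_softmax(logits_standard, dim=-1)
    log_p_img = torch.log_softmax(logits_image_biased, dim=-1)
    # Boost tokens the image-biased pass prefers and penalize tokens supported
    # only by the language prior, working in log-probability space.
    contrasted = (1.0 + alpha) * log_p_img - alpha * log_p_std
    return contrasted  # treat as new logits for softmax/sampling as usual


# Usage with toy tensors standing in for real model outputs:
if __name__ == "__main__":
    vocab_size = 8
    std = torch.randn(1, vocab_size)
    img = torch.randn(1, vocab_size)
    next_token = contrastive_decode_step(std, img).argmax(dim=-1)
    print(next_token)
```

The individual papers differ in how the biased pass is produced (internal-fact editing, correlation calibration, image biasing, or classifier-free guidance) and in how the contrast weight is set, but the logit-level contrast shown here is the shared skeleton.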
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.