OPERA: Alleviating Hallucination in Multi-Modal Large Language Models
via Over-Trust Penalty and Retrospection-Allocation
- URL: http://arxiv.org/abs/2311.17911v3
- Date: Tue, 12 Mar 2024 05:59:46 GMT
- Title: OPERA: Alleviating Hallucination in Multi-Modal Large Language Models
via Over-Trust Penalty and Retrospection-Allocation
- Authors: Qidong Huang, Xiaoyi Dong, Pan Zhang, Bin Wang, Conghui He, Jiaqi
Wang, Dahua Lin, Weiming Zhang, Nenghai Yu
- Abstract summary: We present OPERA, a novel MLLM decoding method grounded in an Over-trust Penalty and a Retrospection-Allocation strategy.
Our approach begins with the observation that most hallucinations are closely tied to the knowledge aggregation patterns in the self-attention matrix.
Based on this observation, OPERA introduces a penalty term on the model logits during beam-search decoding to mitigate the over-trust issue.
- Score: 124.9008419182485
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hallucination, a pervasive challenge for multi-modal large language
models (MLLMs), has significantly impeded their real-world usage in scenarios
that demand precise judgment. Existing methods mitigate this issue either by
training with specifically designed data or by inference with external
knowledge from other sources, both of which incur inevitable additional costs.
In this paper, we present OPERA, a novel MLLM decoding method grounded in an
Over-trust Penalty and a Retrospection-Allocation strategy, serving as a nearly
free lunch that alleviates the hallucination issue without additional data,
knowledge, or training. Our approach begins with the observation that most
hallucinations are closely tied to knowledge aggregation patterns manifested in
the self-attention matrix: MLLMs tend to generate new tokens by focusing on a
few summary tokens rather than on all previous tokens. This partial over-trust
inclination leads the model to neglect the image tokens and to describe the
image content with hallucinations. Based on this observation, OPERA introduces
a penalty term on the model logits during beam-search decoding to mitigate the
over-trust issue, along with a rollback strategy that retrospects the presence
of summary tokens among the previously generated tokens and re-allocates the
token selection if necessary. In extensive experiments, OPERA shows significant
hallucination-mitigating performance across different MLLMs and metrics,
demonstrating its effectiveness and generality. Our code is available at:
https://github.com/shikiw/OPERA.
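The abstract describes the mechanism only in prose; the following is a minimal sketch of the over-trust penalty under stated assumptions (the window size, scaling factor, and column-product scoring are illustrative simplifications, not the official implementation, which lives in the repository above):

```python
import torch

def over_trust_penalty(attn: torch.Tensor, window: int = 8,
                       scale: float = 50.0, alpha: float = 1.0) -> torch.Tensor:
    """Score the columnar 'knowledge aggregation' pattern in a local window.

    attn: (seq_len, seq_len) causal self-attention weights, averaged over heads.
    The score grows when the most recent tokens concentrate their attention on
    a single earlier 'summary' token, the pattern the abstract ties to
    hallucination. All hyperparameters here are illustrative.
    """
    w = attn[-window:, -window:]
    col_scores = []
    for j in range(window - 1):            # the last column has no later rows
        col = w[j + 1:, j] * scale         # attention later tokens pay to token j
        col_scores.append(col.prod())      # a columnar pattern yields a large product
    return alpha * torch.stack(col_scores).max()

# Beam-search usage (sketch): rank each candidate by its log-probability minus
# the penalty, so beams that over-trust a summary token are demoted:
#   score = log_prob - over_trust_penalty(attn)
# When the penalty keeps firing, the retrospection-allocation step rolls decoding
# back to the summary token and re-allocates selection to a different candidate.
attn = torch.rand(16, 16).tril()
attn = attn / attn.sum(-1, keepdim=True)   # row-normalized, like softmax output
print(over_trust_penalty(attn))
```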
Related papers
- DOPRA: Decoding Over-accumulation Penalization and Re-allocation in Specific Weighting Layer [6.438650382682887]
We introduce DOPRA, a novel approach designed to mitigate hallucinations in multi-modal large language models (MLLMs).
DOPRA employs a strategy of weighted overlay penalties and redistribution in specific layers, such as the 12th layer, during the decoding process.
Overall, DOPRA represents a significant step forward in improving the output quality of MLLMs.
arXiv Detail & Related papers (2024-07-21T11:54:49Z)
- MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification [1.3654846342364308]
We introduce MetaToken, a lightweight binary classifier that detects hallucinations at the token level at negligible cost.
Based on a statistical analysis, we reveal key factors of hallucinations in LVLMs which have been overlooked in previous works.
We evaluate our method on four state-of-the-art LVLMs, demonstrating the effectiveness of our approach (a minimal sketch of the idea follows this entry).
arXiv Detail & Related papers (2024-05-29T15:28:42Z)
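MetaToken's summary names neither the classifier nor its features; below is a minimal sketch assuming a logistic-regression meta-classifier over simple per-token decoding statistics (the feature set and values are hypothetical, not from the paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-token features logged during LVLM decoding:
# [token log-probability, entropy of the next-token distribution,
#  attention mass on image tokens].
X_train = np.array([
    [-0.2, 0.5, 0.60],   # grounded token
    [-3.1, 2.4, 0.05],   # hallucinated token
    [-0.4, 0.7, 0.55],   # grounded token
    [-2.8, 2.1, 0.08],   # hallucinated token
])
y_train = np.array([0, 1, 0, 1])  # 1 = hallucinated

clf = LogisticRegression().fit(X_train, y_train)

# At inference time, flag tokens with a high predicted hallucination probability.
X_new = np.array([[-2.5, 1.9, 0.10]])
print(clf.predict_proba(X_new)[:, 1])  # high value -> likely hallucinated
```

Because the classifier sees only a handful of scalar features per token, scoring is effectively free next to the LVLM forward pass, which matches the "negligible cost" claim.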
- Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference [59.91176945361035]
We introduce Visual Tokens Withdrawal (VTW), a plug-and-play module to boost MLLMs for rapid inference.
Our approach is inspired by two intriguing phenomena we have observed.
Our VTW approach can cut computational overhead by over 40% across diverse multimodal tasks (a sketch of the idea follows this entry).
arXiv Detail & Related papers (2024-05-09T14:38:53Z)
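A minimal sketch of the withdrawal idea, assuming visual tokens occupy a known prefix of the sequence and are dropped after a fixed layer (the layer index, function name, and toy layers are illustrative; a real MLLM must also adjust position ids, attention masks, and KV caches):

```python
import torch
import torch.nn as nn

def forward_with_vtw(layers: nn.ModuleList, hidden: torch.Tensor,
                     num_visual: int, withdraw_at: int = 16) -> torch.Tensor:
    """Run transformer layers, withdrawing the visual-token prefix at a set depth.

    hidden: (batch, seq_len, dim) with visual tokens at positions [0, num_visual).
    After `withdraw_at` layers, only text tokens are processed, cutting the
    quadratic attention cost for the remaining depth.
    """
    for i, layer in enumerate(layers):
        if i == withdraw_at:
            hidden = hidden[:, num_visual:, :]   # withdraw visual tokens
        hidden = layer(hidden)
    return hidden

# Toy usage: 24 identity "layers", 32 visual tokens + 8 text tokens.
layers = nn.ModuleList([nn.Identity() for _ in range(24)])
out = forward_with_vtw(layers, torch.randn(1, 40, 64), num_visual=32)
print(out.shape)  # torch.Size([1, 8, 64])
```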
- Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding [25.489832294197797]
This paper introduces the Instruction Contrastive Decoding (ICD) method, a novel approach designed to reduce hallucinations during LVLM inference.
Our method is inspired by the observation that what we call disturbance instructions significantly exacerbate hallucinations in multimodal fusion modules (a sketch of the decoding rule follows this entry).
arXiv Detail & Related papers (2024-03-27T16:04:47Z)
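A minimal sketch of the decoding rule, assuming two forward passes per step, one with the standard instruction and one with a disturbance instruction appended (the combination weight and the commented model calls are illustrative):

```python
import torch

def icd_logits(logits_standard: torch.Tensor,
               logits_disturbed: torch.Tensor,
               alpha: float = 1.0) -> torch.Tensor:
    """Contrastive combination of the two next-token distributions.

    Tokens whose probability rises mainly under the disturbance instruction are
    treated as hallucination-prone and down-weighted.
    """
    return (1 + alpha) * logits_standard - alpha * logits_disturbed

# Per decoding step (sketch):
#   l_std  = model(image, instruction)                      # standard prompt
#   l_dist = model(image, instruction + disturbance_text)   # disturbed prompt
#   next_token = icd_logits(l_std, l_dist).argmax(dim=-1)
```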
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective [55.41815486466186]
Large Multimodal Models (LMMs) often suffer from multimodal hallucinations, wherein they create content that is not present in the visual inputs.
In this paper, we explore a new angle on this issue: overly detailed training data hinders the model's ability to terminate generation in a timely manner.
We find that the model assesses the completeness of the entire sequence by comparing the generated text with the image.
arXiv Detail & Related papers (2024-02-22T13:33:13Z)
- Hallucination Augmented Contrastive Learning for Multimodal Large Language Model [53.65682783591723]
Multi-modal large language models (MLLMs) have been shown to efficiently integrate natural language with visual information to handle multi-modal tasks.
However, MLLMs still face a fundamental limitation of hallucinations, where they tend to generate erroneous or fabricated information.
In this paper, we address hallucinations in MLLMs from a novel perspective of representation learning (a sketch of the contrastive objective follows this entry).
arXiv Detail & Related papers (2023-12-12T04:05:15Z)
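The summary only names the representation-learning angle; below is a minimal sketch assuming an InfoNCE-style objective in which hallucinated captions serve as additional hard negatives for image-text alignment (the function name, temperature, and batch construction are illustrative):

```python
import torch
import torch.nn.functional as F

def hacl_loss(img: torch.Tensor, pos_txt: torch.Tensor,
              hall_txt: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """InfoNCE with hallucinated captions as extra hard negatives.

    img:      (B, D) image embeddings
    pos_txt:  (B, D) embeddings of the matching, grounded captions
    hall_txt: (B, D) embeddings of hallucinated captions for the same images
    """
    img, pos_txt, hall_txt = (F.normalize(x, dim=-1) for x in (img, pos_txt, hall_txt))
    logits_pos = img @ pos_txt.t() / tau                        # (B, B), diag = positives
    logits_neg = (img * hall_txt).sum(-1, keepdim=True) / tau   # (B, 1) hard negatives
    logits = torch.cat([logits_pos, logits_neg], dim=1)
    labels = torch.arange(img.size(0))                          # positive = diagonal
    return F.cross_entropy(logits, labels)

# Toy usage with random embeddings.
B, D = 4, 16
print(hacl_loss(torch.randn(B, D), torch.randn(B, D), torch.randn(B, D)))
```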
- Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus [99.33091772494751]
Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields.
However, LLMs are prone to hallucinating untruthful or nonsensical outputs that fail to meet user expectations.
We propose a novel reference-free, uncertainty-based method for detecting hallucinations in LLMs (a sketch of the scoring idea follows this entry).
arXiv Detail & Related papers (2023-11-22T08:39:17Z)
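A minimal sketch of a reference-free, uncertainty-based score, assuming per-token log-probabilities from the model's own decoding and a per-token weighting that emphasizes informative keywords (this weighting is a stand-in for the paper's "stronger focus" mechanism, which is more involved):

```python
def uncertainty_score(token_logprobs, keyword_weights):
    """Weighted mean negative log-probability over a generated answer.

    token_logprobs:  per-token log-probabilities from the LLM's own decoding
    keyword_weights: per-token weights, larger for informative keywords
                     (illustrative; the paper derives its focus differently)
    Higher score = lower confidence = more likely hallucinated.
    """
    num = sum(-lp * w for lp, w in zip(token_logprobs, keyword_weights))
    return num / sum(keyword_weights)

# Toy usage: the entity token is both unlikely and heavily weighted.
logprobs = [-0.1, -0.2, -3.5, -0.1]   # "The", "capital", "Atlantis", "."
weights  = [0.1, 0.5, 1.0, 0.1]
print(uncertainty_score(logprobs, weights))  # high -> flag as possible hallucination
```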
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data [102.56792377624927]
Hallucinations inherent in machine-generated data remain under-explored.
We present a novel hallucination detection and elimination framework, HalluciDoctor, based on the cross-checking paradigm.
Our method successfully mitigates hallucinations by 44.6% in relative terms while maintaining competitive performance compared to LLaVA.
arXiv Detail & Related papers (2023-11-22T04:52:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.