Mitigating Object Hallucination via Robust Local Perception Search
- URL: http://arxiv.org/abs/2506.06729v1
- Date: Sat, 07 Jun 2025 09:27:26 GMT
- Title: Mitigating Object Hallucination via Robust Local Perception Search
- Authors: Zixian Gao, Chao Yang, Zhanhui Zhou, Xing Xu, Chaochao Lu
- Abstract summary: Local Perception Search (LPS) is a simple, training-free inference-time decoding method that effectively suppresses hallucinations. We show that LPS significantly reduces the incidence of hallucinations compared to the baseline, with particularly strong performance in noisy settings.
- Score: 11.570368427723961
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have enabled them to effectively integrate vision and language, addressing a variety of downstream tasks. However, despite their significant success, these models still exhibit hallucination phenomena, where the outputs appear plausible but do not align with the content of the images. To mitigate this issue, we introduce Local Perception Search (LPS), a simple, training-free inference-time decoding method that effectively suppresses hallucinations. This method leverages local visual prior information as a value function to correct the decoding process. Additionally, we observe that the impact of the local visual prior on model performance is more pronounced in scenarios with high levels of image noise. Notably, LPS is a plug-and-play approach that is compatible with various models. Extensive experiments on widely used hallucination benchmarks and noisy data demonstrate that LPS significantly reduces the incidence of hallucinations compared to the baseline, with particularly strong performance in noisy settings.
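The abstract describes decoding guided by a local visual prior acting as a value function. The following is a minimal sketch of that idea under stated assumptions: `value_fn`, `alpha`, and `top_k` are hypothetical names, and the additive combination of language log-probability and visual score is an illustrative choice, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def lps_step(next_token_logits: torch.Tensor,
             value_fn,
             alpha: float = 1.0,
             top_k: int = 10) -> int:
    """One value-guided decoding step (illustrative sketch, not the paper's method).

    next_token_logits : (vocab_size,) logits from the MLLM for the next token
    value_fn          : hypothetical callable(token_id) -> float scoring how well
                        a candidate token agrees with local visual priors
                        (e.g., features of detected image regions)
    alpha             : assumed weight balancing language and visual scores
    """
    log_probs = F.log_softmax(next_token_logits, dim=-1)
    topk = torch.topk(log_probs, top_k)

    best_token, best_score = -1, float("-inf")
    for lp, tok in zip(topk.values.tolist(), topk.indices.tolist()):
        # Combined score: language-model likelihood plus a visually grounded
        # value term; the additive form is an assumption made for illustration.
        score = lp + alpha * value_fn(tok)
        if score > best_score:
            best_token, best_score = tok, score
    return best_token
```

In a full decoder this step would be applied autoregressively, or expanded into a small search over candidate continuations, with `value_fn` built from the local visual features the abstract refers to as local visual priors.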
Related papers
- SAVER: Mitigating Hallucinations in Large Vision-Language Models via Style-Aware Visual Early Revision [59.61988843996952]
Style-Aware Visual Early Revision (SAVER) is a novel mechanism that dynamically adjusts LVLMs' final outputs based on token-level visual attention patterns. We show that SAVER achieves state-of-the-art performance in hallucination mitigation across various models, datasets, and tasks.
arXiv Detail & Related papers (2025-08-05T07:41:25Z) - Stop learning it all to mitigate visual hallucination, Focus on the hallucination target [0.10571493942475592]
Multimodal Large Language Models (MLLMs) frequently suffer from hallucination issues. Hallucinations undermine model reliability in practical applications. The proposed method is a preference-learning approach that mitigates hallucinations by focusing on targeted areas.
arXiv Detail & Related papers (2025-06-13T02:35:03Z) - Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations [44.83933994734478]
Multimodal large language models (MLLMs) have demonstrated remarkable performance in visual tasks. However, the authenticity of the responses generated by MLLMs is often compromised by object hallucinations. We identify that a key cause of these hallucinations is the model's over-susceptibility to specific image frequency features in detecting objects.
arXiv Detail & Related papers (2025-03-19T04:39:45Z) - PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training [56.172959986096316]
This paper aims to address the challenge of hallucinations in Multimodal Large Language Models (MLLMs). HalFscore is a novel metric built upon the language graph and is designed to evaluate both the accuracy and completeness of dense captions at a granular level. PerturboLLaVA significantly improves the fidelity of generated captions, outperforming existing approaches in handling multimodal hallucinations.
arXiv Detail & Related papers (2025-03-09T07:07:03Z) - Mitigating Hallucination for Large Vision Language Model by Inter-Modality Correlation Calibration Decoding [66.06337890279839]
Large vision-language models (LVLMs) have shown remarkable capabilities in visual-language understanding for downstream multi-modal tasks. However, LVLMs still suffer from generating hallucinations in complex generation tasks, leading to inconsistencies between visual inputs and generated content. We propose an Inter-Modality Correlation Calibration Decoding (IMCCD) method to mitigate hallucinations in LVLMs in a training-free manner.
arXiv Detail & Related papers (2025-01-03T17:56:28Z) - Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning [151.4060202671114]
Multimodal large language models (MLLMs) have shown unprecedented capabilities in advancing vision-language tasks. This paper introduces a novel bottom-up reasoning framework to address hallucinations in MLLMs. Our framework systematically addresses potential issues in both visual and textual inputs by verifying and integrating perception-level information with cognition-level commonsense knowledge.
arXiv Detail & Related papers (2024-12-15T09:10:46Z) - VaLiD: Mitigating the Hallucination of Large Vision Language Models by Visual Layer Fusion Contrastive Decoding [38.23310445372371]
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in multimodal task reasoning. However, they often generate responses that appear plausible yet do not accurately reflect the visual content, a phenomenon known as hallucination. Recent approaches have introduced training-free methods to mitigate hallucinations by adjusting the decoding strategy during the inference stage. We propose a novel hallucination-mitigation method from the visual encoding perspective: Visual Layer Fusion Contrastive Decoding (VaLiD).
arXiv Detail & Related papers (2024-11-24T13:42:02Z) - MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation [50.73561815838431]
Multimodal Large Language Models (MLLMs) frequently exhibit hallucination phenomena. We present an empirical analysis and find that, although MLLMs incorrectly generate the objects in the final output, they are actually able to recognize visual objects in the preceding layers. Motivated by this, we propose DeCo, a novel dynamic correction decoding method for MLLMs, which adaptively selects the appropriate preceding layers and proportionally integrates their knowledge into the final layer to adjust the output logits.
arXiv Detail & Related papers (2024-10-15T16:57:44Z) - Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization [123.54980913741828]
Large Visual Language Models (LVLMs) have demonstrated exceptional abilities in understanding multimodal data. However, they invariably suffer from hallucinations, leading to a disconnect between the generated text and the corresponding images. Almost all current visual contrastive decoding methods attempt to mitigate these hallucinations by introducing visual uncertainty information. However, they struggle to precisely induce the hallucinatory tokens, which severely limits their effectiveness in mitigating hallucinations.
arXiv Detail & Related papers (2024-05-24T08:46:31Z) - Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback [40.930238150365795]
We propose detecting and mitigating hallucinations in Large Vision Language Models (LVLMs) via fine-grained AI feedback. We generate a small-scale hallucination annotation dataset using proprietary models. Then, we propose a detect-then-rewrite pipeline to automatically construct a preference dataset for training a hallucination-mitigating model.
arXiv Detail & Related papers (2024-04-22T14:46:10Z) - Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding [25.489832294197797]
This paper introduces the Instruction Contrastive Decoding (ICD) method, a novel approach designed to reduce hallucinations during LVLM inference.
Our method is inspired by the observation that what the authors call disturbance instructions significantly exacerbate hallucinations in multimodal fusion modules (a minimal sketch of this contrastive-decoding idea appears after the list below).
arXiv Detail & Related papers (2024-03-27T16:04:47Z)
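Several of the related papers above (ICD, VaLiD, IMCCD) belong to the contrastive-decoding family. The sketch below illustrates only the shared idea: the (1+gamma)*log p - gamma*log q adjustment follows common contrastive-decoding practice, and the pairing of a "clean" branch with a hallucination-prone "disturbed" branch, along with all names, are illustrative assumptions rather than any single paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_step(logits_clean: torch.Tensor,
                     logits_disturbed: torch.Tensor,
                     gamma: float = 1.0) -> int:
    """One contrastive decoding step (illustrative sketch only).

    logits_clean     : next-token logits under the original image/instruction
    logits_disturbed : next-token logits under a hallucination-prone variant,
                       e.g., a disturbance instruction (ICD-style) or an
                       alternative visual view; this pairing is an assumption
    gamma            : assumed contrast strength
    """
    log_p = F.log_softmax(logits_clean, dim=-1)
    log_q = F.log_softmax(logits_disturbed, dim=-1)
    # Down-weight tokens that the hallucination-prone branch also favours,
    # keeping those supported mainly by the clean branch.
    adjusted = (1.0 + gamma) * log_p - gamma * log_q
    return int(torch.argmax(adjusted).item())
```

In practice such a step requires two forward passes per token (one per branch) and is typically combined with a plausibility constraint on the candidate set; those details vary across the papers listed above.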