CHEM: Estimating and Understanding Hallucinations in Deep Learning for Image Processing
- URL: http://arxiv.org/abs/2512.09806v1
- Date: Wed, 10 Dec 2025 16:20:00 GMT
- Title: CHEM: Estimating and Understanding Hallucinations in Deep Learning for Image Processing
- Authors: Jianfei Li, Ines Rosellon-Inclan, Gitta Kutyniok, Jean-Luc Starck
- Abstract summary: U-Net and other U-shaped architectures have achieved significant success in image deconvolution tasks. However, these methods might generate unrealistic artifacts or hallucinations, which can interfere with analysis in safety-critical scenarios. This paper introduces a novel approach for quantifying and comprehending hallucination artifacts to ensure trustworthy computer vision models.
- Score: 17.573711532387176
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: U-Net and other U-shaped architectures have achieved significant success in image deconvolution tasks. However, challenges have emerged, as these methods might generate unrealistic artifacts or hallucinations, which can interfere with analysis in safety-critical scenarios. This paper introduces a novel approach for quantifying and comprehending hallucination artifacts to ensure trustworthy computer vision models. Our method, termed the Conformal Hallucination Estimation Metric (CHEM), is applicable to any image reconstruction model, enabling efficient identification and quantification of hallucination artifacts. It offers two key advantages: it leverages wavelet and shearlet representations to efficiently extract hallucinated image features, and it uses conformalized quantile regression to assess hallucination levels in a distribution-free manner. Furthermore, from an approximation-theoretic perspective, we explore why U-shaped networks are prone to hallucinations. We test the proposed approach on the CANDELS astronomical image dataset with models such as U-Net, SwinUNet, and Learnlets, and provide new perspectives on hallucination in deep learning-based image processing.
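The abstract names two ingredients, wavelet/shearlet feature extraction and conformalized quantile regression, without implementation details. The sketch below is a minimal illustration of how they could fit together, assuming a scalar per-image hallucination score defined as a Haar detail-band discrepancy and generic per-image features; the function names, the score definition, and the quantile regressor are assumptions for this example, not the authors' code.

```python
# Illustrative sketch, NOT the CHEM implementation:
# (1) a wavelet-domain discrepancy as an assumed per-image "hallucination score", and
# (2) split conformalized quantile regression (in the style of Romano et al., 2019)
#     that turns the score into a distribution-free interval for new reconstructions.
import numpy as np
import pywt
from sklearn.ensemble import GradientBoostingRegressor

def detail_energies(img):
    """Energies of the first-level Haar detail bands (horizontal, vertical, diagonal)."""
    _, (cH, cV, cD) = pywt.dwt2(img, "haar")
    return np.array([np.sum(cH ** 2), np.sum(cV ** 2), np.sum(cD ** 2)])

def hallucination_score(recon, truth):
    """Assumed scalar target: detail-band mismatch between reconstruction and ground truth."""
    return float(np.linalg.norm(detail_energies(recon) - detail_energies(truth)))

def fit_cqr(X, y, alpha=0.1, seed=0):
    """Fit lower/upper quantile regressors on half the data, conformalize on the rest.

    X: per-image features (e.g., wavelet statistics of the observation or reconstruction).
    y: per-image hallucination scores computed with ground truth (e.g., hallucination_score).
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    tr, cal = idx[: len(y) // 2], idx[len(y) // 2:]
    lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X[tr], y[tr])
    hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X[tr], y[tr])
    # Conformity scores: how far calibration targets fall outside the predicted band.
    s = np.maximum(lo.predict(X[cal]) - y[cal], y[cal] - hi.predict(X[cal]))
    n = len(cal)
    q = np.quantile(s, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    return lo, hi, q

def hallucination_interval(lo, hi, q, X_new):
    """Distribution-free (1 - alpha) interval for the hallucination score of new images."""
    return lo.predict(X_new) - q, hi.predict(X_new) + q
```

On a calibration set with ground truth (e.g., held-out CANDELS images), y would be the per-image score and X any features computable without the truth; at test time the interval bounds the hallucination level without distributional assumptions, which is the "distribution-free" property the abstract refers to.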
Related papers
- A novel hallucination classification framework [0.0]
This work introduces a novel methodology for the automatic detection of hallucinations generated during large language model (LLM) inference. The proposed approach is based on a systematic taxonomy and controlled reproduction of diverse hallucination types through prompt engineering.
arXiv Detail & Related papers (2025-10-06T09:54:20Z)
- MIHBench: Benchmarking and Mitigating Multi-Image Hallucinations in Multimodal Large Language Models [73.20126092411776]
We conduct the first systematic study of hallucinations in multi-image MLLMs. We propose MIHBench, a benchmark specifically tailored for evaluating object-related hallucinations across multiple images. MIHBench comprises three core tasks: Multi-Image Object Existence Hallucination, Multi-Image Object Count Hallucination, and Object Identity Consistency Hallucination.
arXiv Detail & Related papers (2025-08-01T15:49:29Z)
- Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations [82.42811602081692]
This paper introduces a subsequence association framework to systematically trace and understand hallucinations. The key insight is that hallucinations arise when dominant hallucinatory associations outweigh faithful ones. We propose a tracing algorithm that identifies causal subsequences by analyzing hallucination probabilities across randomized input contexts.
arXiv Detail & Related papers (2025-04-17T06:34:45Z)
- Mitigating Low-Level Visual Hallucinations Requires Self-Awareness: Database, Model and Training Strategy [53.07517728420411]
We introduce the first instruction database specifically focused on hallucinations in low-level vision tasks. We propose the Self-Awareness Failure Elimination (SAFEQA) model to improve the perception and comprehension abilities of the model in low-level vision tasks. We conduct comprehensive experiments on low-level vision tasks, with the results demonstrating that our proposed method significantly enhances the model's self-awareness in these tasks and reduces hallucinations.
arXiv Detail & Related papers (2025-03-26T16:05:01Z)
- Mitigating Hallucinations in Large Vision-Language Models by Adaptively Constraining Information Flow [32.039946174953236]
Large vision-language models show tremendous potential in understanding visual information through human languages. However, they are prone to object hallucination, i.e., the generated image descriptions contain objects that do not exist in the image. We propose Variational Information Bottleneck (VIB) to alleviate overconfidence by introducing hallucination noise.
arXiv Detail & Related papers (2025-02-28T05:56:23Z)
- Exploring Causes and Mitigation of Hallucinations in Large Vision Language Models [24.241691571850403]
Large Vision-Language Models (LVLMs) integrate image encoders with Large Language Models (LLMs) to process multi-modal inputs and perform complex visual tasks. However, they often generate hallucinations by describing non-existent objects or attributes, compromising their reliability. This study analyzes hallucination patterns in image captioning, showing that not all tokens in the generation process are influenced by image input.
arXiv Detail & Related papers (2025-02-24T05:00:52Z)
- Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models [65.4610281589017]
Large Vision-Language Models (LVLMs) are prone to generating hallucinatory text responses that do not align with the given visual input. We introduce Self-Correcting Decoding with Generative Feedback (DeGF), a novel training-free algorithm that incorporates feedback from text-to-image generative models into the decoding process.
arXiv Detail & Related papers (2025-02-10T03:43:55Z)
- Towards a Systematic Evaluation of Hallucinations in Large-Vision Language Models [57.58426038241812]
Large Vision-Language Models (LVLMs) have demonstrated remarkable performance in complex multimodal tasks. However, these models still suffer from hallucinations when required to implicitly recognize or infer diverse visual entities from images. We propose a novel visual question answering (VQA) benchmark that employs contextual reasoning prompts as hallucination attacks.
arXiv Detail & Related papers (2024-12-29T23:56:01Z)
- VaLiD: Mitigating the Hallucination of Large Vision Language Models by Visual Layer Fusion Contrastive Decoding [38.23310445372371]
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in multimodal task reasoning. However, they often generate responses that appear plausible yet do not accurately reflect the visual content, a phenomenon known as hallucination. Recent approaches have introduced training-free methods to mitigate hallucinations by adjusting the decoding strategy during the inference stage. We propose a novel hallucination-mitigation method from the visual encoding perspective: Visual Layer Fusion Contrastive Decoding (VaLiD).
arXiv Detail & Related papers (2024-11-24T13:42:02Z)
- From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models [15.401221354325672]
Hallucinations in large vision-language models (LVLMs) are a significant challenge, i.e., generating objects that are not present in the visual input. Recent studies often attribute hallucinations to a lack of understanding of visual input, yet ignore a more fundamental issue: the model's inability to extract or decouple visual features. In this paper, we revisit hallucinations in LVLMs from an architectural perspective, investigating whether the primary cause lies in the visual encoder (feature extraction) or the modal alignment module (feature decoupling).
arXiv Detail & Related papers (2024-10-09T11:46:32Z)
- Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization [123.54980913741828]
Large Visual Language Models (LVLMs) have demonstrated exceptional abilities in understanding multimodal data. However, they invariably suffer from hallucinations, leading to a disconnect between the generated text and the corresponding images. Almost all current visual contrastive decoding methods attempt to mitigate these hallucinations by introducing visual uncertainty information (a generic sketch of the contrastive decoding operation these methods share appears after this list). However, they struggle to precisely induce the hallucinatory tokens, which severely limits their effectiveness in mitigating hallucinations.
arXiv Detail & Related papers (2024-05-24T08:46:31Z)
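Several of the decoding-side papers above (e.g., VaLiD and the visual contrastive decoding methods discussed in the last entry) share a common core: the next-token distribution obtained from the original image is contrasted against one obtained from a distorted or uncertainty-injected view of it. The sketch below illustrates only that shared operation, with placeholder logits rather than any specific model's or paper's API.

```python
# Generic sketch of visual contrastive decoding as discussed in several papers above;
# the logits are placeholders, not a specific model's or paper's API.
import numpy as np

def contrastive_logits(logits_clean, logits_distorted, alpha=1.0):
    """Amplify evidence supported by the clean image: (1 + alpha) * clean - alpha * distorted."""
    return (1.0 + alpha) * logits_clean - alpha * logits_distorted

def sample_next_token(logits_clean, logits_distorted, alpha=1.0, rng=None):
    """Sample one token id from the contrastive distribution over the vocabulary."""
    if rng is None:
        rng = np.random.default_rng()
    z = contrastive_logits(logits_clean, logits_distorted, alpha)
    p = np.exp(z - z.max())
    p /= p.sum()
    return int(rng.choice(len(p), p=p))
```

Tokens that remain likely even when the visual input is corrupted, i.e., those driven by language priors rather than by the image, are down-weighted; this is the intuition these methods use to suppress object hallucinations.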