Hallucination Score: Towards Mitigating Hallucinations in Generative Image Super-Resolution
- URL: http://arxiv.org/abs/2507.14367v1
- Date: Fri, 18 Jul 2025 21:13:50 GMT
- Title: Hallucination Score: Towards Mitigating Hallucinations in Generative Image Super-Resolution
- Authors: Weiming Ren, Raghav Goyal, Zhiming Hu, Tristan Ty Aumentado-Armstrong, Iqbal Mohomed, Alex Levinshtein
- Abstract summary: We focus on measuring, analyzing, and mitigating "hallucinations". We take advantage of a multimodal large language model (MLLM) by constructing a prompt that assesses hallucinatory visual elements and generates a "Hallucination Score" (HS). Our HS is closely aligned with human evaluations, and also provides complementary insights to prior image metrics used for super-resolution (SR) models.
- Score: 4.620784952867118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative super-resolution (GSR) currently sets the state-of-the-art in terms of perceptual image quality, overcoming the "regression-to-the-mean" blur of prior non-generative models. However, from a human perspective, such models do not fully conform to the optimal balance between quality and fidelity. Instead, a different class of artifacts, in which generated details fail to perceptually match the low-resolution image (LRI) or ground-truth image (GTI), is a critical but understudied issue in GSR, limiting its practical deployment. In this work, we focus on measuring, analyzing, and mitigating these artifacts (i.e., "hallucinations"). We observe that hallucinations are not well-characterized with existing image metrics or quality models, as they are orthogonal to both exact fidelity and no-reference quality. Instead, we take advantage of a multimodal large language model (MLLM) by constructing a prompt that assesses hallucinatory visual elements and generates a "Hallucination Score" (HS). We find that our HS is closely aligned with human evaluations, and also provides complementary insights to prior image metrics used for super-resolution (SR) models. In addition, we find certain deep feature distances have strong correlations with HS. We therefore propose to align the GSR models by using such features as differentiable reward functions to mitigate hallucinations.
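To make the abstract's two ideas concrete, below is a minimal PyTorch-style sketch of (1) prompting an MLLM for a Hallucination Score and (2) using a deep feature distance as a differentiable reward during fine-tuning. The paper does not publish its exact prompt, feature extractor, or training recipe here, so the prompt text, the query_mllm helper, the choice of VGG16 relu3_3 features, and the loss weighting are all illustrative assumptions rather than the authors' implementation.

```python
# Illustrative sketch only: HS_PROMPT, query_mllm, and the VGG16-based reward
# are assumptions standing in for details not given in the abstract.
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

# (1) Prompt-based Hallucination Score (HS) via an MLLM.
HS_PROMPT = (
    "You are shown a reference image and a super-resolved image. "
    "List generated details in the super-resolved image that are not supported "
    "by the reference, then rate the hallucination severity from 0 (none) to "
    "5 (severe). Reply with the rating alone on the last line."
)

def hallucination_score(reference_img, sr_img, query_mllm):
    """query_mllm(prompt, images) -> str is an assumed MLLM client wrapper."""
    reply = query_mllm(HS_PROMPT, images=[reference_img, sr_img])
    return float(reply.strip().splitlines()[-1])  # parse the final rating

# (2) A deep feature distance as a differentiable reward/penalty.
# Frozen VGG16 features up to relu3_3 are one plausible stand-in for the
# "certain deep feature distances" the abstract reports correlate with HS.
# Inputs are assumed to be ImageNet-normalized RGB tensors (N, 3, H, W).
_vgg = vgg16(weights=VGG16_Weights.DEFAULT).features[:16].eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def feature_distance(sr, ref):
    """Mean squared distance between frozen VGG16 features of SR and reference."""
    return F.mse_loss(_vgg(sr), _vgg(ref))

def alignment_step(gsr_model, lr_batch, gt_batch, optimizer, lam=0.1):
    """One hypothetical fine-tuning step: pixel L1 loss plus the feature
    distance acting as a differentiable penalty on hallucinated detail."""
    sr = gsr_model(lr_batch)
    loss = F.l1_loss(sr, gt_batch) + lam * feature_distance(sr, gt_batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this reading, the feature distance plays the role of the differentiable reward: outputs that an MLLM would score as hallucinated also tend to sit far apart in deep feature space, so penalizing that distance during fine-tuning nudges the GSR model away from hallucinated detail.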
Related papers
- SynMind: Reducing Semantic Hallucination in fMRI-Based Image Reconstruction [52.34513874272676]
We argue that existing methods rely too heavily on entangled visual embeddings over explicit semantic identity.
We parse fMRI signals into rich, sentence-level semantic descriptions that mirror the hierarchical and compositional nature of human visual understanding.
We propose SynMind, a framework that integrates these explicit semantic encodings with visual priors to condition a pretrained diffusion model.
arXiv Detail & Related papers (2026-01-25T14:31:23Z) - CHEM: Estimating and Understanding Hallucinations in Deep Learning for Image Processing [17.573711532387176]
U-Net and other U-shaped architectures have achieved significant success in image deconvolution tasks.
However, these methods might generate unrealistic artifacts or hallucinations, which can interfere with analysis in safety-critical scenarios.
This paper introduces a novel approach for quantifying and comprehending hallucination artifacts to ensure trustworthy computer vision models.
arXiv Detail & Related papers (2025-12-10T16:20:00Z) - HalluGen: Synthesizing Realistic and Controllable Hallucinations for Evaluating Image Restoration [8.702496582146042]
HalluGen is a diffusion-based framework that synthesizes realistic hallucinations with controllable type, location, and severity.
We construct the first large-scale hallucination dataset comprising 4,350 annotated images.
HalluGen and its open dataset establish the first scalable foundation for evaluating hallucinations in safety-critical image restoration.
arXiv Detail & Related papers (2025-12-03T01:20:00Z) - GHOST: Hallucination-Inducing Image Generation for Multimodal LLMs [61.829473661517675]
We introduce GHOST, a method designed to stress-test MLLMs by actively generating images that induce hallucination.
GHOST is fully automatic and requires no human supervision or prior knowledge.
We evaluate our method across a range of models, including reasoning models like GLM-4.1V-Thinking, and achieve a hallucination success rate exceeding 28%, compared to around 1% in prior data-driven discovery methods.
arXiv Detail & Related papers (2025-09-29T17:59:23Z) - Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling [67.14942827452161]
Vision-Language Models (VLMs) excel at visual understanding but often suffer from visual hallucinations.
In this work, we introduce REVERSE, a unified framework that integrates hallucination-aware training with on-the-fly self-verification.
arXiv Detail & Related papers (2025-04-17T17:59:22Z) - Mitigating Low-Level Visual Hallucinations Requires Self-Awareness: Database, Model and Training Strategy [53.07517728420411]
We introduce the first instruction database specifically focused on hallucinations in low-level vision tasks.
We propose the Self-Awareness Failure Elimination (SAFEQA) model to improve the perception and comprehension abilities of the model in low-level vision tasks.
We conduct comprehensive experiments on low-level vision tasks, with the results demonstrating that our proposed method significantly enhances self-awareness of the model in these tasks and reduces hallucinations.
arXiv Detail & Related papers (2025-03-26T16:05:01Z) - Lost in Transcription, Found in Distribution Shift: Demystifying Hallucination in Speech Foundation Models [36.327525062842724]
Hallucination is especially concerning in high-stakes domains such as healthcare, legal, and aviation.
We examine how factors such as distribution shifts, model size, and model architecture influence the hallucination error rate (HER), a metric we introduce to quantify hallucinations.
Our findings highlight the importance of incorporating HER alongside traditional metrics like WER to better assess ASR model performance.
arXiv Detail & Related papers (2025-02-18T01:25:39Z) - Towards a Systematic Evaluation of Hallucinations in Large-Vision Language Models [57.58426038241812]
Large Vision-Language Models (LVLMs) have demonstrated remarkable performance in complex multimodal tasks.
These models still suffer from hallucinations when required to implicitly recognize or infer diverse visual entities from images.
We propose a novel visual question answering (VQA) benchmark that employs contextual reasoning prompts as hallucination attacks.
arXiv Detail & Related papers (2024-12-29T23:56:01Z) - Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback [40.930238150365795]
We propose detecting and mitigating hallucinations in Large Vision Language Models (LVLMs) via fine-grained AI feedback.
We generate a small-size hallucination annotation dataset using proprietary models.
Then, we propose a detect-then-rewrite pipeline to automatically construct a preference dataset for training a hallucination-mitigating model.
arXiv Detail & Related papers (2024-04-22T14:46:10Z) - Prescribing the Right Remedy: Mitigating Hallucinations in Large Vision-Language Models via Targeted Instruction Tuning [13.805780090705252]
We propose a targeted instruction data generation framework named DFTG that is tailored to the hallucination specificity of different models.
The experimental results on hallucination benchmarks demonstrate that the targeted instruction data generated by our method are more effective in mitigating hallucinations compared to previous datasets.
arXiv Detail & Related papers (2024-04-16T07:14:32Z) - IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding [37.16880672402059]
Over-reliance on linguistic priors has been identified as a key factor leading to hallucinations.
We propose to alleviate this problem by introducing a novel image-biased decoding technique.
Our method derives the next-token probability distribution by contrasting predictions from a conventional LVLM with those of an image-biased LVLM.
arXiv Detail & Related papers (2024-02-28T16:57:22Z) - Aligning Modalities in Vision Large Language Models via Preference Fine-tuning [67.62925151837675]
In this work, we frame the hallucination problem as an alignment issue and tackle it with preference tuning.
Specifically, we propose POVID to generate feedback data with AI models.
We use ground-truth instructions as the preferred response and a two-stage approach to generate dispreferred data.
In experiments across a broad set of benchmarks, we show that our approach not only reduces hallucinations but also improves model performance on standard benchmarks, outperforming prior approaches.
arXiv Detail & Related papers (2024-02-18T00:56:16Z) - On Sensitivity and Robustness of Normalization Schemes to Input Distribution Shifts in Automatic MR Image Diagnosis [58.634791552376235]
Deep Learning (DL) models have achieved state-of-the-art performance in diagnosing multiple diseases using reconstructed images as input.
However, DL models are sensitive to varying artifacts, as these lead to changes in the input data distribution between the training and testing phases.
We propose to use other normalization techniques, such as Group Normalization and Layer Normalization, to make model performance robust against varying image artifacts.
arXiv Detail & Related papers (2023-06-23T03:09:03Z) - Hierarchical Similarity Learning for Aliasing Suppression Image Super-Resolution [64.15915577164894]
A hierarchical image super-resolution network (HSRNet) is proposed to suppress the influence of aliasing.
HSRNet achieves better quantitative and visual performance than other works, and suppresses aliasing more effectively.
arXiv Detail & Related papers (2022-06-07T14:55:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.