Related papers: Mitigating Hallucinations in Diffusion Models through Adaptive Attention Modulation

URL: http://arxiv.org/abs/2502.16872v1
Date: Mon, 24 Feb 2025 06:19:54 GMT
Title: Mitigating Hallucinations in Diffusion Models through Adaptive Attention Modulation
Authors: Trevine Oorloff, Yaser Yacoob, Abhinav Shrivastava
Abstract summary: We propose Adaptive Attention Modulation (AAM), a novel approach to mitigate hallucinations by analyzing and modulating the self-attention mechanism in diffusion models. AAM effectively reduces hallucinatory artifacts, enhancing both the fidelity and reliability of generated images.
Score: 36.2882418279168
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion models, while increasingly adept at generating realistic images, are notably hindered by hallucinations -- unrealistic or incorrect features inconsistent with the trained data distribution. In this work, we propose Adaptive Attention Modulation (AAM), a novel approach to mitigate hallucinations by analyzing and modulating the self-attention mechanism in diffusion models. We hypothesize that self-attention during early denoising steps may inadvertently amplify or suppress features, contributing to hallucinations. To counter this, AAM introduces a temperature scaling mechanism within the softmax operation of the self-attention layers, dynamically modulating the attention distribution during inference. Additionally, AAM employs a masked perturbation technique to disrupt early-stage noise that may otherwise propagate into later stages as hallucinations. Extensive experiments demonstrate that AAM effectively reduces hallucinatory artifacts, enhancing both the fidelity and reliability of generated images. For instance, the proposed approach improves the FID score by 20.8% and reduces the percentage of hallucinated images by 12.9% (in absolute terms) on the Hands dataset.
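
The temperature scaling described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the single-head layout, and the fixed temperature values below are assumptions made for illustration, and the abstract does not specify how AAM selects the temperature adaptively or how the masked perturbation is applied. The sketch only shows the general effect of dividing attention logits by a temperature before the softmax: values below 1 sharpen the attention distribution, and values above 1 flatten it.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(q, k, v, temperature=1.0):
    """Single-head scaled dot-product attention with an extra softmax temperature.

    `temperature` is a hypothetical knob for illustration; temperature=1.0
    recovers standard attention.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)
    weights = softmax(logits / temperature, axis=-1)
    return weights @ v, weights


def row_entropy(p):
    """Shannon entropy of each attention row (higher means flatter attention)."""
    return -(p * np.log(p + 1e-12)).sum(axis=-1)


# Toy inputs: 4 tokens with 8-dimensional queries, keys, and values.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(4, 8))
v = rng.normal(size=(4, 8))

_, w_sharp = self_attention(q, k, v, temperature=0.5)
_, w_base = self_attention(q, k, v, temperature=1.0)
_, w_flat = self_attention(q, k, v, temperature=2.0)

print("mean entropy, T=0.5:", row_entropy(w_sharp).mean())
print("mean entropy, T=1.0:", row_entropy(w_base).mean())
print("mean entropy, T=2.0:", row_entropy(w_flat).mean())
```

In a diffusion model, such a temperature would be applied inside the self-attention layers of the denoiser at inference time; which denoising steps and which values to use are exactly the choices the paper's adaptive mechanism addresses, and they are not reproduced here.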

Related papers

Seeing Through the Chain: Mitigate Hallucination in Multimodal Reasoning Models via CoT Compression and Contrastive Preference Optimization [78.94590726578014]
Multimodal reasoning models (MLRMs) remain prone to hallucinations, and effective solutions are still underexplored. We propose C3PO, a training-based mitigation framework comprising Compression and Preference Optimization.
arXiv Detail & Related papers (2026-02-03T11:00:55Z)
Noise as a Probe: Membership Inference Attacks on Diffusion Models Leveraging Initial Noise [51.179816451161635]
Diffusion models have achieved remarkable progress in image generation, but their increasing deployment raises serious concerns about privacy. In this work, we utilize a critical yet overlooked vulnerability: the widely used noise schedules fail to fully eliminate semantic information in the images. We propose a simple yet effective membership inference attack, which injects semantic information into the initial noise and infers membership by analyzing the model's generation result.
arXiv Detail & Related papers (2026-01-29T12:29:01Z)
Identify, Isolate, and Purge: Mitigating Hallucinations in LVLMs via Self-Evolving Distillation [52.52962914918779]
Hallucination issues significantly limit the credibility and application potential of LVLMs. Existing mitigation methods rely on external tools or the comparison of multi-round inference. We propose SElf-Evolving Distillation (SEED), which identifies hallucinations within the inner knowledge of LVLMs, isolates and purges them, and then distills the purified knowledge back into the model.
arXiv Detail & Related papers (2025-07-07T05:56:19Z)
Frequency Domain-Based Diffusion Model for Unpaired Image Dehazing [92.61216319417208]
We propose a novel frequency domain-based diffusion model, named ours, for fully exploiting the beneficial knowledge in unpaired clear data. Inspired by the strong generative ability shown by Diffusion Models (DMs), we tackle the dehazing task from the perspective of frequency domain reconstruction.
arXiv Detail & Related papers (2025-07-02T01:22:46Z)
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling [67.14942827452161]
Vision-Language Models (VLMs) excel at visual understanding but often suffer from visual hallucinations. In this work, we introduce REVERSE, a unified framework that integrates hallucination-aware training with on-the-fly self-verification.
arXiv Detail & Related papers (2025-04-17T17:59:22Z)
A Simple Combination of Diffusion Models for Better Quality Trade-Offs in Image Denoising [43.44633086975204]
We propose an intuitive method for leveraging pretrained diffusion models. We then introduce our proposed Linear Combination Diffusion Denoiser. LCDD achieves state-of-the-art performance and offers controlled, well-behaved trade-offs.
arXiv Detail & Related papers (2025-03-18T19:02:19Z)
Tackling Hallucination from Conditional Models for Medical Image Reconstruction with DynamicDPS [3.572461722393917]
Hallucinations are spurious structures not present in the ground truth. We propose DynamicDPS, a diffusion-based framework that integrates conditional and unconditional diffusion models. Our method effectively reduces hallucinations from any conditional model output.
arXiv Detail & Related papers (2025-03-03T00:33:04Z)
Enhancing Hallucination Detection through Noise Injection [9.582929634879932]
Large Language Models (LLMs) are prone to generating plausible yet incorrect responses, known as hallucinations. We show that detection can be improved significantly by taking into account model uncertainty in the Bayesian sense. We propose a very simple and efficient approach that perturbs an appropriate subset of model parameters, or equivalently hidden unit activations, during sampling.
arXiv Detail & Related papers (2025-02-06T06:02:20Z)
Assessing the use of Diffusion models for motion artifact correction in brain MRI [0.6554326244334868]
We critically evaluate the use of diffusion models for correcting motion artifacts in 2D brain MRI scans. We compare a diffusion model-based approach with state-of-the-art methods consisting of Unets trained in a supervised fashion on motion-affected images. Our findings reveal mixed results: diffusion models can produce accurate predictions or generate harmful hallucinations in this context.
arXiv Detail & Related papers (2025-02-03T14:56:48Z)
Mitigating Hallucinations in Large Vision-Language Models with Internal Fact-based Contrastive Decoding [5.424048651554831]
Internal Fact-based Contrastive Decoding (IFCD) is designed to mitigate and suppress hallucinations during the inference process of Large Visual Language Models (LVLMs). IFCD calibrates the LVLMs' output and effectively removes the hallucinatory logits from the final predictions. Experimental results validate that IFCD significantly alleviates both object-level and attribute-level hallucinations, achieving average accuracy improvements of 9% on POPE and 8% on the MME object hallucinations subset compared with direct decoding.
arXiv Detail & Related papers (2025-02-03T05:08:35Z)
DiffDoctor: Diagnosing Image Diffusion Models Before Treating [57.82359018425674]
We propose DiffDoctor, a two-stage pipeline to assist image diffusion models in generating fewer artifacts. We collect a dataset of over 1M flawed synthesized images and set up an efficient human-in-the-loop annotation process. The learned artifact detector is then involved in the second stage to tune the diffusion model through assigning a per-pixel confidence map for each image.
arXiv Detail & Related papers (2025-01-21T18:56:41Z)
Data-augmented phrase-level alignment for mitigating object hallucination [52.43197107069751]
Multimodal Large Language Models (MLLMs) often generate factually inaccurate information, referred to as hallucination. We introduce Data-augmented Phrase-level Alignment (DPA), a novel loss which can be applied to instruction-tuned off-the-shelf MLLMs to mitigate hallucinations.
arXiv Detail & Related papers (2024-05-28T23:36:00Z)
Stimulating Diffusion Model for Image Denoising via Adaptive Embedding and Ensembling [56.506240377714754]
We present a novel strategy called the Diffusion Model for Image Denoising (DMID). Our strategy includes an adaptive embedding method that embeds the noisy image into a pre-trained unconditional diffusion model. Our DMID strategy achieves state-of-the-art performance on both distortion-based and perception-based metrics.
arXiv Detail & Related papers (2023-07-08T14:59:41Z)
Mask, Stitch, and Re-Sample: Enhancing Robustness and Generalizability in Anomaly Detection through Automatic Diffusion Models [8.540959938042352]
We propose AutoDDPM, a novel approach that enhances the robustness of diffusion models. Through joint noised distribution re-sampling, AutoDDPM achieves the harmonization and in-painting effects. It also contributes valuable insights and analysis on the limitations of current diffusion models.
arXiv Detail & Related papers (2023-05-31T08:21:17Z)
DDS2M: Self-Supervised Denoising Diffusion Spatio-Spectral Model for Hyperspectral Image Restoration [103.79030498369319]
A self-supervised diffusion model for hyperspectral image restoration is proposed. DDS2M enjoys stronger generalization ability compared to existing diffusion-based methods. Experiments on HSI denoising, noisy HSI completion and super-resolution on a variety of HSIs demonstrate DDS2M's superiority over the existing task-specific state-of-the-arts.
arXiv Detail & Related papers (2023-03-12T14:57:04Z)
The role of noise in denoising models for anomaly detection in medical images [62.0532151156057]
Pathological brain lesions exhibit diverse appearance in brain images. Unsupervised anomaly detection approaches have been proposed using only normal data for training. We show that optimization of the spatial resolution and magnitude of the noise improves the performance of different model training regimes.
arXiv Detail & Related papers (2023-01-19T21:39:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.