DiffDoctor: Diagnosing Image Diffusion Models Before Treating
- URL: http://arxiv.org/abs/2501.12382v2
- Date: Tue, 11 Mar 2025 12:44:34 GMT
- Title: DiffDoctor: Diagnosing Image Diffusion Models Before Treating
- Authors: Yiyang Wang, Xi Chen, Xiaogang Xu, Sihui Ji, Yu Liu, Yujun Shen, Hengshuang Zhao,
- Abstract summary: We propose DiffDoctor, a two-stage pipeline to assist image diffusion models in generating fewer artifacts.<n>We collect a dataset of over 1M flawed synthesized images and set up an efficient human-in-the-loop annotation process.<n>The learned artifact detector is then involved in the second stage to optimize the diffusion model by providing pixel-level feedback.
- Score: 57.82359018425674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In spite of recent progress, image diffusion models still produce artifacts. A common solution is to leverage the feedback provided by quality assessment systems or human annotators to optimize the model, where images are generally rated in their entirety. In this work, we believe problem-solving starts with identification, yielding the request that the model should be aware of not just the presence of defects in an image, but their specific locations. Motivated by this, we propose DiffDoctor, a two-stage pipeline to assist image diffusion models in generating fewer artifacts. Concretely, the first stage targets developing a robust artifact detector, for which we collect a dataset of over 1M flawed synthesized images and set up an efficient human-in-the-loop annotation process, incorporating a carefully designed class-balance strategy. The learned artifact detector is then involved in the second stage to optimize the diffusion model by providing pixel-level feedback. Extensive experiments on text-to-image diffusion models demonstrate the effectiveness of our artifact detector as well as the soundness of our diagnose-then-treat design.
Related papers
- Assessing the use of Diffusion models for motion artifact correction in brain MRI [0.6554326244334868]
We critically evaluate the use of diffusion models for correcting motion artifacts in 2D brain MRI scans.
We compare a diffusion model-based approach with state-of-the-art methods consisting of Unets trained in a supervised fashion on motion-affected images.
Our findings reveal mixed results: diffusion models can produce accurate predictions or generate harmful hallucinations in this context.
arXiv Detail & Related papers (2025-02-03T14:56:48Z) - Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing pose serious risks for generative models.
In this paper, we investigate how detection performance varies across model backbones, types, and datasets.
We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z) - Time Step Generating: A Universal Synthesized Deepfake Image Detector [0.4488895231267077]
We propose a universal synthetic image detector Time Step Generating (TSG)
TSG does not rely on pre-trained models' reconstructing ability, specific datasets, or sampling algorithms.
We test the proposed TSG on the large-scale GenImage benchmark and it achieves significant improvements in both accuracy and generalizability.
arXiv Detail & Related papers (2024-11-17T09:39:50Z) - StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model [62.25424831998405]
StealthDiffusion is a framework that modifies AI-generated images into high-quality, imperceptible adversarial examples.
It is effective in both white-box and black-box settings, transforming AI-generated images into high-quality adversarial forgeries.
arXiv Detail & Related papers (2024-08-11T01:22:29Z) - Mask-guided cross-image attention for zero-shot in-silico histopathologic image generation with a diffusion model [0.10910416614141322]
Diffusion models are the state-of-the-art solution for generating in-silico images.<n>Appearance transfer diffusion models are designed for natural images.<n>In computational pathology, specifically in oncology, it is not straightforward to define which objects in an image should be classified as foreground and background.<n>We contribute to the applicability of appearance transfer models to diffusion-stained images by modifying the appearance transfer guidance to alternate between class-specific AdaIN feature statistics matchings.
arXiv Detail & Related papers (2024-07-16T12:36:26Z) - DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets.
We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability.
Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z) - Introducing Shape Prior Module in Diffusion Model for Medical Image
Segmentation [7.7545714516743045]
We propose an end-to-end framework called VerseDiff-UNet, which leverages the denoising diffusion probabilistic model (DDPM)
Our approach integrates the diffusion model into a standard U-shaped architecture.
We evaluate our method on a single dataset of spine images acquired through X-ray imaging.
arXiv Detail & Related papers (2023-09-12T03:05:00Z) - Simultaneous Image-to-Zero and Zero-to-Noise: Diffusion Models with Analytical Image Attenuation [53.04220377034574]
We propose incorporating an analytical image attenuation process into the forward diffusion process for high-quality (un)conditioned image generation.<n>Our method represents the forward image-to-noise mapping as simultaneous textitimage-to-zero mapping and textitzero-to-noise mapping.<n>We have conducted experiments on unconditioned image generation, textite.g., CIFAR-10 and CelebA-HQ-256, and image-conditioned downstream tasks such as super-resolution, saliency detection, edge detection, and image inpainting.
arXiv Detail & Related papers (2023-06-23T18:08:00Z) - DIRE for Diffusion-Generated Image Detection [128.95822613047298]
We propose a novel representation called DIffusion Reconstruction Error (DIRE)
DIRE measures the error between an input image and its reconstruction counterpart by a pre-trained diffusion model.
It provides a hint that DIRE can serve as a bridge to distinguish generated and real images.
arXiv Detail & Related papers (2023-03-16T13:15:03Z) - ADIR: Adaptive Diffusion for Image Reconstruction [46.838084286784195]
We propose a conditional sampling scheme that exploits the prior learned by diffusion models.
We then combine it with a novel approach for adapting pretrained diffusion denoising networks to their input.
We show that our proposed adaptive diffusion for image reconstruction' approach achieves a significant improvement in the super-resolution, deblurring, and text-based editing tasks.
arXiv Detail & Related papers (2022-12-06T18:39:58Z) - Person Image Synthesis via Denoising Diffusion Model [116.34633988927429]
We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z) - SinDiffusion: Learning a Diffusion Model from a Single Natural Image [159.4285444680301]
We present SinDiffusion, leveraging denoising diffusion models to capture internal distribution of patches from a single natural image.
It is based on two core designs. First, SinDiffusion is trained with a single model at a single scale instead of multiple models with progressive growing of scales.
Second, we identify that a patch-level receptive field of the diffusion network is crucial and effective for capturing the image's patch statistics.
arXiv Detail & Related papers (2022-11-22T18:00:03Z) - DiffGAR: Model-Agnostic Restoration from Generative Artifacts Using
Image-to-Image Diffusion Models [46.46919194633776]
This work aims to develop a plugin post-processing module for diverse generative models.
Unlike traditional degradation patterns, generative artifacts are non-linear and the transformation function is highly complex.
arXiv Detail & Related papers (2022-10-16T16:08:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.