AccDiffusion v2: Towards More Accurate Higher-Resolution Diffusion Extrapolation
- URL: http://arxiv.org/abs/2412.02099v1
- Date: Tue, 03 Dec 2024 02:44:35 GMT
- Title: AccDiffusion v2: Towards More Accurate Higher-Resolution Diffusion Extrapolation
- Authors: Zhihang Lin, Mingbao Lin, Wengyi Zhan, Rongrong Ji
- Abstract summary: Diffusion models suffer severe object repetition and local distortion when the inference resolution differs from their pre-training resolution.
We propose AccDiffusion v2, an accurate method for patch-wise higher-resolution diffusion extrapolation without training.
- Score: 59.91176945361035
- Abstract: Diffusion models suffer severe object repetition and local distortion when the inference resolution differs from their pre-training resolution. We propose AccDiffusion v2, an accurate method for patch-wise higher-resolution diffusion extrapolation without training. Our in-depth analysis in this paper shows that using an identical text prompt for different patches leads to repetitive generation, while the absence of a prompt undermines image details. In response, AccDiffusion v2 decouples the vanilla image-content-aware prompt into a set of patch-content-aware prompts, each of which serves as a more precise description of a patch. Further analysis reveals that local distortion arises from inaccurate descriptions in prompts about the local structure of higher-resolution images. To address this issue, AccDiffusion v2, for the first time, introduces auxiliary local structural information through ControlNet during higher-resolution diffusion extrapolation to mitigate local distortion. Finally, our analysis indicates that global semantic information is conducive to suppressing both repetitive generation and local distortion. Hence, AccDiffusion v2 further proposes dilated sampling with window interaction to obtain better global semantic information during higher-resolution diffusion extrapolation. We conduct extensive experiments, including both quantitative and qualitative comparisons, to demonstrate the efficacy of AccDiffusion v2. The quantitative comparison shows that AccDiffusion v2 achieves state-of-the-art performance in image generation extrapolation without training. The qualitative comparison intuitively illustrates that AccDiffusion v2 effectively suppresses repetitive generation and local distortion in image generation extrapolation. Our code is available at \url{https://github.com/lzhxmu/AccDiffusion_v2}.
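The abstract names three mechanisms: patch-content-aware prompts, ControlNet-injected local structure, and dilated sampling with window interaction. The last of these lends itself to a compact illustration. The PyTorch sketch below shows one plausible way to split a latent into dilated sub-latents, let them exchange windows, and merge them back; the function names, the shuffle scheme, and all tensor sizes are assumptions for illustration, not the authors' implementation (for that, see the linked repository).

```python
import torch

def dilate(latent, f):
    """Split a (C, H, W) latent into f*f strided sub-latents (dilation factor f)."""
    return [latent[:, i::f, j::f] for i in range(f) for j in range(f)]

def undilate(subs, f):
    """Inverse of dilate(): interleave the sub-latents back onto the full grid."""
    C, h, w = subs[0].shape
    out = torch.empty(C, h * f, w * f)
    idx = [(i, j) for i in range(f) for j in range(f)]
    for sub, (i, j) in zip(subs, idx):
        out[:, i::f, j::f] = sub
    return out

def window_interaction(subs, win):
    """Randomly swap spatially aligned windows across the sub-latents so each
    one sees content from the others (one plausible reading of the
    'window interaction' named in the abstract)."""
    x = torch.stack(subs)                       # (N, C, h, w)
    n, _, h, w = x.shape
    for y in range(0, h - win + 1, win):
        for z in range(0, w - win + 1, win):
            x[:, :, y:y + win, z:z + win] = x[torch.randperm(n), :, y:y + win, z:z + win]
    return list(x)

latent = torch.randn(4, 64, 64)                 # stand-in for a high-res latent
subs = window_interaction(dilate(latent, 2), win=8)
# ...each sub-latent would take one UNet denoising step here (omitted)...
merged = undilate(subs, 2)
print(merged.shape)                             # torch.Size([4, 64, 64])
```

The round trip through dilate/undilate is lossless; the shuffle is the only step that spreads global context across sub-latents in this toy version.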
Related papers
- AccDiffusion: An Accurate Method for Higher-Resolution Image Generation [63.53163540340026]
We propose AccDiffusion, an accurate method for patch-wise higher-resolution image generation without training.
An in-depth analysis in this paper reveals that an identical text prompt for different patches causes repeated object generation.
Our AccDiffusion, for the first time, proposes to decouple the vanilla image-content-aware prompt into a set of patch-content-aware prompts; a sketch of this decoupling follows below.
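As a rough illustration of the decoupling idea (not the paper's exact procedure), one can binarise each prompt word's cross-attention map from a low-resolution pass and keep a word in a patch's prompt only if enough of its attention mass falls inside that patch. The thresholds and the overlap rule below are assumptions.

```python
import torch

def patch_content_aware_prompts(words, attn_maps, patches,
                                bin_thresh=0.3, keep_ratio=0.2):
    """words:     list of prompt words
    attn_maps: (T, H, W) per-word cross-attention maps from a low-res pass
    patches:   (y0, y1, x0, x1) boxes on the same H x W grid
    A word survives in a patch's prompt only if enough of its binarised
    attention mass lies inside that patch."""
    masks = attn_maps >= bin_thresh * attn_maps.amax(dim=(1, 2), keepdim=True)
    prompts = []
    for y0, y1, x0, x1 in patches:
        kept = [w for w, m in zip(words, masks)
                if m[y0:y1, x0:x1].float().mean() >= keep_ratio]
        prompts.append(" ".join(kept))
    return prompts

# toy demo: 3 words, a 16x16 attention grid, 4 quadrant patches
maps = torch.rand(3, 16, 16)
boxes = [(0, 8, 0, 8), (0, 8, 8, 16), (8, 16, 0, 8), (8, 16, 8, 16)]
print(patch_content_aware_prompts("a sunflower field".split(), maps, boxes))
```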
arXiv Detail & Related papers (2024-07-15T14:06:29Z)
- Diffusion-based Image Generation for In-distribution Data Augmentation in Surface Defect Detection [8.93281936150572]
We show that diffusion models can be used in industrial scenarios to improve the data augmentation procedure.
We propose a novel approach for data augmentation that mixes out-of-distribution with in-distribution samples.
arXiv Detail & Related papers (2024-06-01T17:09:18Z)
- Diffusion Bridge AutoEncoders for Unsupervised Representation Learning [10.74555302283403]
We introduce Diffusion Bridge AutoEncoders (DBAE), which enable z-dependent endpoint x_T inference through a feed-forward architecture.
We propose an objective function for DBAE that enables both reconstruction and generative modeling, with theoretical justification.
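A toy sketch of the stated dataflow follows; every layer size and name here is invented, and only the structure (encoder produces z, a feed-forward head infers the endpoint x_T from z) reflects the summary above.

```python
import torch
import torch.nn as nn

class ToyDBAE(nn.Module):
    """Toy stand-in for the DBAE idea: an encoder compresses x0 into z, and a
    feed-forward head infers the bridge endpoint x_T from z, so x_T is
    z-dependent rather than pure noise. All sizes are invented."""
    def __init__(self, dim=64, z_dim=8):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(dim, 128), nn.SiLU(),
                                    nn.Linear(128, z_dim))
        self.endpoint = nn.Sequential(nn.Linear(z_dim, 128), nn.SiLU(),
                                      nn.Linear(128, dim))

    def forward(self, x0):
        z = self.encode(x0)          # representation of the clean sample
        x_T = self.endpoint(z)       # feed-forward endpoint inference
        return z, x_T

z, x_T = ToyDBAE()(torch.randn(2, 64))
print(z.shape, x_T.shape)            # torch.Size([2, 8]) torch.Size([2, 64])
```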
arXiv Detail & Related papers (2024-05-27T12:28:17Z)
- Text Diffusion with Reinforced Conditioning [92.17397504834825]
This paper thoroughly analyzes text diffusion models and uncovers two significant limitations: degradation of self-conditioning during training and misalignment between training and sampling.
Motivated by our findings, we propose a novel Text Diffusion model called TREC, which mitigates the degradation with Reinforced Conditioning and the misalignment by Time-Aware Variance Scaling.
arXiv Detail & Related papers (2024-02-19T09:24:02Z)
- Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models [82.8261101680427]
Smooth latent spaces ensure that a perturbation on an input latent corresponds to a steady change in the output image.
This property proves beneficial in downstream tasks, including image interpolation, inversion, and editing.
We propose Smooth Diffusion, a new category of diffusion models that can be simultaneously high-performing and smooth.
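The smoothness property above suggests a simple empirical probe (our illustration, not the paper's metric): perturb a latent by a fixed-norm step and watch how far the output moves. A smooth model keeps the ratio small and roughly constant across directions.

```python
import torch

def smoothness_probe(decode, z, eps=1e-2, trials=8):
    """Nudge an input latent by a fixed-norm perturbation and measure how far
    the decoded output moves; returns the mean output/input change ratio."""
    base = decode(z)
    ratios = []
    for _ in range(trials):
        d = torch.randn_like(z)
        d = eps * d / d.norm()
        ratios.append(((decode(z + d) - base).norm() / eps).item())
    return sum(ratios) / len(ratios)

# demo with a toy linear "decoder"; a real test would wrap a diffusion sampler
W = torch.randn(16, 16)
print(smoothness_probe(lambda z: z @ W, torch.randn(16)))
```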
arXiv Detail & Related papers (2023-12-07T16:26:23Z)
- Are Diffusion Models Vision-And-Language Reasoners? [30.579483430697803]
We transform diffusion-based models to perform any image-text matching (ITM) task using a novel method called DiffusionITM.
We introduce the Generative-Discriminative Evaluation Benchmark (GDBench), with 7 complex vision-and-language tasks, bias evaluation, and detailed analysis.
We find that Stable Diffusion + DiffusionITM is competitive on many tasks and outperforms CLIP on compositional tasks like CLEVR and Winoground.
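A hedged sketch of the general recipe such a transformation implies: use the text-conditioned denoiser's noise-prediction error as a (negative) match score, so text that better describes the image yields lower error. The forward process, scales, and trial counts below are toy assumptions; the paper's actual scoring and bias corrections differ.

```python
import torch

def itm_score(denoiser, latent, text_emb, n_trials=4, noise_scale=0.5):
    """Noise the image latent, ask the text-conditioned denoiser to predict
    that noise, and return the negative prediction error as a match score."""
    errs = []
    for _ in range(n_trials):
        noise = torch.randn_like(latent)
        noisy = latent + noise_scale * noise        # toy forward process
        errs.append((denoiser(noisy, text_emb) - noise).pow(2).mean())
    return -torch.stack(errs).mean()

# contrived demo: a "denoiser" that only denoises well when the text matches
denoiser = lambda noisy, text: noisy - text
img, txt = torch.randn(16), torch.randn(16)
# matched text should score higher than a random one (prints True)
print(itm_score(denoiser, img, img).item() > itm_score(denoiser, img, txt).item())
```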
arXiv Detail & Related papers (2023-05-25T18:02:22Z)
- DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery [20.787180028571694]
DiffusionSeg is a synthesis-exploitation framework with a two-stage strategy.
In the first stage, we synthesize abundant images and propose a novel training-free AttentionCut to obtain masks.
In the second exploitation stage, to bridge the structural gap, we use the inversion technique to map the given image back to diffusion features.
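The core intuition of the first stage, that segmentation masks can be read off a diffusion model's attention maps without any training, admits a very small stand-in. The paper's AttentionCut is more involved than the plain thresholding below, and the tensor layout is an assumption.

```python
import torch

def attention_mask(cross_attn, object_token_idx, thresh=0.5):
    """Training-free mask extraction (simplified stand-in for AttentionCut):
    average the object token's cross-attention over layers, normalise to
    [0, 1], and binarise."""
    m = cross_attn[:, object_token_idx].mean(dim=0)   # (H, W)
    m = (m - m.min()) / (m.max() - m.min() + 1e-8)
    return m >= thresh

# toy demo: 4 layers, 5 tokens, 32x32 maps; token 2 is the object word
attn = torch.rand(4, 5, 32, 32)
print(attention_mask(attn, 2).float().mean())         # foreground fraction
```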
arXiv Detail & Related papers (2023-03-17T07:47:55Z)
- DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration [66.01846902242355]
Blind face restoration usually synthesizes degraded low-quality data with a pre-defined degradation model for training.
It is expensive and infeasible to include every type of degradation to cover real-world cases in the training data.
We propose Robust Degradation Remover (DR2) to first transform the degraded image to a coarse but degradation-invariant prediction, then employ an enhancement module to restore the coarse prediction to a high-quality image.
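The first stage of this design can be sketched as noise-and-denoise: forward-diffuse the degraded input far enough that the unknown degradation pattern is drowned out by Gaussian noise, then run the pretrained reverse process back. The schedule, stopping point, and helper names below are assumptions.

```python
import torch

def dr2_coarse_stage(denoise_step, x_degraded, t_star, sigma):
    """Forward-diffuse the degraded face to timestep t_star, then run the
    reverse process back to t=0 for a coarse, degradation-invariant
    prediction. An enhancement module would restore details afterwards."""
    x_t = x_degraded + sigma(t_star) * torch.randn_like(x_degraded)
    for t in reversed(range(t_star)):
        x_t = denoise_step(x_t, t)   # pretrained face diffusion model (hypothetical)
    return x_t

# toy demo with a trivial "denoiser" that just shrinks the sample
coarse = dr2_coarse_stage(lambda x, t: 0.98 * x, torch.randn(3, 32, 32),
                          t_star=50, sigma=lambda t: t / 100)
print(coarse.shape)                  # torch.Size([3, 32, 32])
```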
arXiv Detail & Related papers (2023-03-13T06:05:18Z)
- Diffusion-GAN: Training GANs with Diffusion [135.24433011977874]
Generative adversarial networks (GANs) are challenging to train stably.
We propose Diffusion-GAN, a novel GAN framework that leverages a forward diffusion chain to generate instance noise.
We show that Diffusion-GAN can produce more realistic images with higher stability and data efficiency than state-of-the-art GANs.
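A minimal sketch of the instance-noise idea, assuming a timestep-conditioned discriminator and a standard DDPM forward process; the paper adapts the maximum timestep on the fly, which is replaced here by uniform sampling.

```python
import torch
import torch.nn.functional as F

def diffuse(x, t, alphas_bar):
    """Forward diffusion q(x_t | x_0), used here as adaptive instance noise."""
    a = alphas_bar[t].sqrt().view(-1, 1)
    s = (1.0 - alphas_bar[t]).sqrt().view(-1, 1)
    return a * x + s * torch.randn_like(x)

def discriminator_loss(D, real, fake, alphas_bar, t_max):
    """Non-saturating D loss on *diffused* reals and fakes; D sees the
    sampled timestep as a condition."""
    t = torch.randint(0, t_max, (real.size(0),))
    real_t = diffuse(real, t, alphas_bar)
    fake_t = diffuse(fake, t, alphas_bar)
    return (F.softplus(-D(real_t, t)) + F.softplus(D(fake_t, t))).mean()

# toy demo: flat vectors as "images", a linear discriminator that ignores t
betas = torch.linspace(1e-4, 0.02, 100)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)
D = lambda x, t: x.sum(dim=1) * 0.01
print(discriminator_loss(D, torch.randn(8, 16), torch.randn(8, 16),
                         alphas_bar, t_max=100).item())
```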
arXiv Detail & Related papers (2022-06-05T20:45:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.