FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion
- URL: http://arxiv.org/abs/2411.18552v1
- Date: Wed, 27 Nov 2024 17:51:44 GMT
- Title: FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion
- Authors: Haosen Yang, Adrian Bulat, Isma Hadji, Hai X. Pham, Xiatian Zhu, Georgios Tzimiropoulos, Brais Martinez,
- Abstract summary: Inference at a scaled resolution leads to repetitive patterns and structural distortions.
We propose two simple modules that combine to solve these issues.
Our method, coined Fam diffusion, can seamlessly integrate into any latent diffusion model and requires no additional training.
- Score: 63.609399000712905
- License:
- Abstract: Diffusion models are proficient at generating high-quality images. They are however effective only when operating at the resolution used during training. Inference at a scaled resolution leads to repetitive patterns and structural distortions. Retraining at higher resolutions quickly becomes prohibitive. Thus, methods enabling pre-existing diffusion models to operate at flexible test-time resolutions are highly desirable. Previous works suffer from frequent artifacts and often introduce large latency overheads. We propose two simple modules that combine to solve these issues. We introduce a Frequency Modulation (FM) module that leverages the Fourier domain to improve the global structure consistency, and an Attention Modulation (AM) module which improves the consistency of local texture patterns, a problem largely ignored in prior works. Our method, coined Fam diffusion, can seamlessly integrate into any latent diffusion model and requires no additional training. Extensive qualitative results highlight the effectiveness of our method in addressing structural and local artifacts, while quantitative results show state-of-the-art performance. Also, our method avoids redundant inference tricks for improved consistency such as patch-based or progressive generation, leading to negligible latency overheads.
Related papers
- A Wavelet Diffusion GAN for Image Super-Resolution [7.986370916847687]
Diffusion models have emerged as a superior alternative to generative adversarial networks (GANs) for high-fidelity image generation.
However, their real-time feasibility is hindered by slow training and inference speeds.
This study proposes a wavelet-based conditional Diffusion GAN scheme for Single-Image Super-Resolution.
arXiv Detail & Related papers (2024-10-23T15:34:06Z) - DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis through Structure Guidance [11.44012694656102]
Large-scale generative models, such as text-to-image diffusion models, have garnered widespread attention across diverse domains.
Existing large-scale diffusion models are confined to generating images of up to 1K resolution.
We propose a novel progressive approach that fully utilizes generated low-resolution images to guide the generation of higher-resolution images.
arXiv Detail & Related papers (2024-06-26T16:10:31Z) - Diffusion Models Without Attention [110.5623058129782]
Diffusion State Space Model (DiffuSSM) is an architecture that supplants attention mechanisms with a more scalable state space model backbone.
Our focus on FLOP-efficient architectures in diffusion training marks a significant step forward.
arXiv Detail & Related papers (2023-11-30T05:15:35Z) - ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with
Diffusion Models [126.35334860896373]
We investigate the capability of generating images from pre-trained diffusion models at much higher resolutions than the training image sizes.
Existing works for higher-resolution generation, such as attention-based and joint-diffusion approaches, cannot well address these issues.
We propose a simple yet effective re-dilation that can dynamically adjust the convolutional perception field during inference.
arXiv Detail & Related papers (2023-10-11T17:52:39Z) - Steerable Conditional Diffusion for Out-of-Distribution Adaptation in Medical Image Reconstruction [75.91471250967703]
We introduce a novel sampling framework called Steerable Conditional Diffusion.
This framework adapts the diffusion model, concurrently with image reconstruction, based solely on the information provided by the available measurement.
We achieve substantial enhancements in out-of-distribution performance across diverse imaging modalities.
arXiv Detail & Related papers (2023-08-28T08:47:06Z) - Low-Light Image Enhancement with Wavelet-based Diffusion Models [50.632343822790006]
Diffusion models have achieved promising results in image restoration tasks, yet suffer from time-consuming, excessive computational resource consumption, and unstable restoration.
We propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
arXiv Detail & Related papers (2023-06-01T03:08:28Z) - A Unified Conditional Framework for Diffusion-based Image Restoration [39.418415473235235]
We present a unified conditional framework based on diffusion models for image restoration.
We leverage a lightweight UNet to predict initial guidance and the diffusion model to learn the residual of the guidance.
To handle high-resolution images, we propose a simple yet effective inter-step patch-splitting strategy.
arXiv Detail & Related papers (2023-05-31T17:22:24Z) - Implicit Diffusion Models for Continuous Super-Resolution [65.45848137914592]
This paper introduces an Implicit Diffusion Model (IDM) for high-fidelity continuous image super-resolution.
IDM integrates an implicit neural representation and a denoising diffusion model in a unified end-to-end framework.
The scaling factor regulates the resolution and accordingly modulates the proportion of the LR information and generated features in the final output.
arXiv Detail & Related papers (2023-03-29T07:02:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.