FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling
- URL: http://arxiv.org/abs/2410.18410v2
- Date: Thu, 27 Feb 2025 06:43:40 GMT
- Title: FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling
- Authors: Zhengqiang Zhang, Ruihuang Li, Lei Zhang,
- Abstract summary: FreCaS decomposes the sampling process into cascaded stages with gradually increased resolutions.<n>FreCaS significantly outperforms state-of-the-art methods in image quality and generation speed.
- Score: 13.275724439963188
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While image generation with diffusion models has achieved a great success, generating images of higher resolution than the training size remains a challenging task due to the high computational cost. Current methods typically perform the entire sampling process at full resolution and process all frequency components simultaneously, contradicting with the inherent coarse-to-fine nature of latent diffusion models and wasting computations on processing premature high-frequency details at early diffusion stages. To address this issue, we introduce an efficient $\textbf{Fre}$quency-aware $\textbf{Ca}$scaded $\textbf{S}$ampling framework, $\textbf{FreCaS}$ in short, for higher-resolution image generation. FreCaS decomposes the sampling process into cascaded stages with gradually increased resolutions, progressively expanding frequency bands and refining the corresponding details. We propose an innovative frequency-aware classifier-free guidance (FA-CFG) strategy to assign different guidance strengths for different frequency components, directing the diffusion model to add new details in the expanded frequency domain of each stage. Additionally, we fuse the cross-attention maps of previous and current stages to avoid synthesizing unfaithful layouts. Experiments demonstrate that FreCaS significantly outperforms state-of-the-art methods in image quality and generation speed. In particular, FreCaS is about 2.86$\times$ and 6.07$\times$ faster than ScaleCrafter and DemoFusion in generating a 2048$\times$2048 image using a pre-trained SDXL model and achieves an FID$_b$ improvement of 11.6 and 3.7, respectively. FreCaS can be easily extended to more complex models such as SD3. The source code of FreCaS can be found at https://github.com/xtudbxk/FreCaS.
Related papers
- MSF: Efficient Diffusion Model Via Multi-Scale Latent Factorize [27.749096921628457]
We propose a multiscale diffusion framework that generates hierarchical visual representations, which are subsequently integrated to form the final output.
Our method achieves an FID of 2.2 and an IS of 255.4 on the ImageNet 256x256 benchmark, reducing computational costs by 50% compared to baseline methods.
arXiv Detail & Related papers (2025-01-23T03:18:23Z) - Diffusion Models Need Visual Priors for Image Generation [86.92260591389818]
Diffusion on Diffusion (DoD) is an innovative multi-stage generation framework that first extracts visual priors from previously generated samples, then provides rich guidance for the diffusion model.
We evaluate DoD on the popular ImageNet-$256 times 256$ dataset, reducing 7$times$ training cost compared to SiT and DiT.
Our largest model DoD-XL achieves an FID-50K score of 1.83 with only 1 million training steps, which surpasses other state-of-the-art methods without bells and whistles during inference.
arXiv Detail & Related papers (2024-10-11T05:03:56Z) - Flexiffusion: Segment-wise Neural Architecture Search for Flexible Denoising Schedule [50.260693393896716]
Diffusion models are cutting-edge generative models adept at producing diverse, high-quality images.
Recent techniques have been employed to automatically search for faster generation processes.
We introduce Flexiffusion, a novel training-free NAS paradigm designed to accelerate diffusion models.
arXiv Detail & Related papers (2024-09-26T06:28:05Z) - Frequency-Domain Refinement with Multiscale Diffusion for Super Resolution [7.29314801047906]
We propose a novel Frequency Domain-guided multiscale Diffusion model (FDDiff)
FDDiff decomposes the high-frequency information complementing process into finer-grained steps.
We show that FDDiff outperforms prior generative methods with higher-fidelity super-resolution results.
arXiv Detail & Related papers (2024-05-16T11:58:52Z) - Cache Me if You Can: Accelerating Diffusion Models through Block Caching [67.54820800003375]
A large image-to-image network has to be applied many times to iteratively refine an image from random noise.
We investigate the behavior of the layers within the network and find that 1) the layers' output changes smoothly over time, 2) the layers show distinct patterns of change, and 3) the change from step to step is often very small.
We propose a technique to automatically determine caching schedules based on each block's changes over timesteps.
arXiv Detail & Related papers (2023-12-06T00:51:38Z) - ACT-Diffusion: Efficient Adversarial Consistency Training for One-step Diffusion Models [59.90959789767886]
We show that optimizing consistency training loss minimizes the Wasserstein distance between target and generated distributions.
By incorporating a discriminator into the consistency training framework, our method achieves improved FID scores on CIFAR10 and ImageNet 64$times$64 and LSUN Cat 256$times$256 datasets.
arXiv Detail & Related papers (2023-11-23T16:49:06Z) - Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of
Experts And Frequency-augmented Decoder Approach [17.693287544860638]
latent-based diffusion for image super-resolution improved by pre-trained text-image models.
latent-based methods utilize a feature encoder to transform the image and then implement the SR image generation in a compact latent space.
We propose a frequency compensation module that enhances the frequency components from latent space to pixel space.
arXiv Detail & Related papers (2023-10-18T14:39:25Z) - Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size
HD Images [56.17404812357676]
Stable diffusion, a generative model used in text-to-image synthesis, frequently encounters composition problems when generating images of varying sizes.
We propose a two-stage pipeline named Any-Size-Diffusion (ASD), designed to efficiently generate well-composed images of any size.
We show that ASD can produce well-structured images of arbitrary sizes, cutting down the inference time by 2x compared to the traditional tiled algorithm.
arXiv Detail & Related papers (2023-08-31T09:27:56Z) - SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two
Seconds [88.06788636008051]
Text-to-image diffusion models can create stunning images from natural language descriptions that rival the work of professional artists and photographers.
These models are large, with complex network architectures and tens of denoising iterations, making them computationally expensive and slow to run.
We present a generic approach that unlocks running text-to-image diffusion models on mobile devices in less than $2$ seconds.
arXiv Detail & Related papers (2023-06-01T17:59:25Z) - On the Importance of Noise Scheduling for Diffusion Models [8.360383061862844]
We study the effect of noise scheduling strategies for denoising diffusion generative models.
This simple recipe yields state-of-the-art pixel-based diffusion models for high-resolution images on ImageNet.
arXiv Detail & Related papers (2023-01-26T07:37:22Z) - Wavelet Diffusion Models are fast and scalable Image Generators [3.222802562733787]
Diffusion models are a powerful solution for high-fidelity image generation, which exceeds GANs in quality in many circumstances.
Recent DiffusionGAN method significantly decreases the models' running time by reducing the number of sampling steps from thousands to several, but their speeds still largely lag behind the GAN counterparts.
This paper aims to reduce the speed gap by proposing a novel wavelet-based diffusion scheme.
We extract low-and-high frequency components from both image and feature levels via wavelet decomposition and adaptively handle these components for faster processing while maintaining good generation quality.
arXiv Detail & Related papers (2022-11-29T12:25:25Z) - Accelerating Score-based Generative Models for High-Resolution Image
Synthesis [42.076244561541706]
Score-based generative models (SGMs) have recently emerged as a promising class of generative models.
In this work, we consider the acceleration of high-resolution generation with SGMs.
We introduce a novel Target Distribution Sampling Aware (TDAS) method by leveraging the structural priors in space and frequency domains.
arXiv Detail & Related papers (2022-06-08T17:41:14Z) - Unsupervised Single Image Super-resolution Under Complex Noise [60.566471567837574]
This paper proposes a model-based unsupervised SISR method to deal with the general SISR task with unknown degradations.
The proposed method can evidently surpass the current state of the art (SotA) method (about 1dB PSNR) not only with a slighter model (0.34M vs. 2.40M) but also faster speed.
arXiv Detail & Related papers (2021-07-02T11:55:40Z) - Improved Transformer for High-Resolution GANs [69.42469272015481]
We introduce two key ingredients to Transformer to address this challenge.
We show in the experiments that the proposed HiT achieves state-of-the-art FID scores of 31.87 and 2.95 on unconditional ImageNet $128 times 128$ and FFHQ $256 times 256$, respectively.
arXiv Detail & Related papers (2021-06-14T17:39:49Z) - Cascaded Diffusion Models for High Fidelity Image Generation [53.57766722279425]
We show that cascaded diffusion models are capable of generating high fidelity images on the class-conditional ImageNet generation challenge.
A cascaded diffusion model comprises a pipeline of multiple diffusion models that generate images of increasing resolution.
We find that the sample quality of a cascading pipeline relies crucially on conditioning augmentation.
arXiv Detail & Related papers (2021-05-30T17:14:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.