Effective Diffusion Transformer Architecture for Image Super-Resolution
- URL: http://arxiv.org/abs/2409.19589v1
- Date: Sun, 29 Sep 2024 07:14:16 GMT
- Title: Effective Diffusion Transformer Architecture for Image Super-Resolution
- Authors: Kun Cheng, Lei Yu, Zhijun Tu, Xiao He, Liyu Chen, Yong Guo, Mingrui Zhu, Nannan Wang, Xinbo Gao, Jie Hu
- Abstract summary: We design an effective diffusion transformer for image super-resolution (DiT-SR)
In practice, DiT-SR adopts an overall U-shaped architecture with a uniform isotropic design for all transformer blocks.
We analyze the limitations of the widely used AdaLN and present a frequency-adaptive time-step conditioning module.
- Score: 63.254644431016345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances indicate that diffusion models hold great promise for image super-resolution. While the latest methods are primarily based on latent diffusion models with convolutional neural networks, there have been few attempts to explore transformers, which have demonstrated remarkable performance in image generation. In this work, we design an effective diffusion transformer for image super-resolution (DiT-SR) that achieves the visual quality of prior-based methods while being trained entirely from scratch. In practice, DiT-SR adopts an overall U-shaped architecture together with a uniform isotropic design for all transformer blocks across different stages. The former facilitates multi-scale hierarchical feature extraction, while the latter reallocates computational resources to critical layers to further enhance performance. Moreover, we thoroughly analyze the limitations of the widely used AdaLN and present a frequency-adaptive time-step conditioning module, enhancing the model's capacity to process distinct frequency information at different time steps. Extensive experiments demonstrate that DiT-SR significantly outperforms existing training-from-scratch diffusion-based SR methods, and even beats some of the prior-based methods built on pretrained Stable Diffusion, proving the superiority of the diffusion transformer in image super-resolution.
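The abstract names the frequency-adaptive time-step conditioning module but does not spell out its form. Below is a minimal, hypothetical PyTorch sketch of one way such a module could work; the class name, the radial band bucketing, and the gain parameterization are illustrative assumptions, not the paper's implementation. The idea sketched here is that the time-step embedding predicts per-frequency-band gains that rescale the feature spectrum, instead of the single spatial scale-and-shift that AdaLN applies.

```python
import torch
import torch.nn as nn


class FreqAdaptiveTimestepConditioning(nn.Module):
    """Hypothetical sketch: rescale feature spectra with time-step-dependent per-band gains."""

    def __init__(self, channels: int, time_dim: int, num_bands: int = 4):
        super().__init__()
        self.num_bands = num_bands
        # The time-step embedding predicts one gain per (channel, frequency band).
        self.to_gains = nn.Sequential(nn.SiLU(), nn.Linear(time_dim, channels * num_bands))

    def forward(self, x: torch.Tensor, t_emb: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map, t_emb: (B, time_dim) time-step embedding.
        b, c, h, w = x.shape
        gains = 1.0 + self.to_gains(t_emb).view(b, c, self.num_bands)  # identity at init

        freq = torch.fft.rfft2(x, norm="ortho")  # (B, C, H, W//2 + 1), complex
        fh, fw = freq.shape[-2:]

        # Radial frequency of every bin, bucketed into num_bands rings (low -> high).
        fy = torch.fft.fftfreq(h, device=x.device).abs().view(fh, 1)
        fx = torch.fft.rfftfreq(w, device=x.device).abs().view(1, fw)
        radius = torch.sqrt(fy ** 2 + fx ** 2)
        band = (radius / (radius.max() + 1e-8) * (self.num_bands - 1)).round().long()

        # One binary ring mask per band, mixed with the predicted gains.
        masks = torch.stack([(band == i).float() for i in range(self.num_bands)])  # (K, fh, fw)
        gain_map = torch.einsum("bck,kxy->bcxy", gains, masks)

        return torch.fft.irfft2(freq * gain_map, s=(h, w), norm="ortho")
```

Under this reading, the conditioning can emphasize low-frequency structure at noisy early time steps and boost high-frequency detail later, which matches the abstract's claim that the model should process distinct frequency information at different time steps.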
Related papers
- A Wavelet Diffusion GAN for Image Super-Resolution [7.986370916847687]
Diffusion models have emerged as a superior alternative to generative adversarial networks (GANs) for high-fidelity image generation.
However, their real-time feasibility is hindered by slow training and inference speeds.
This study proposes a wavelet-based conditional Diffusion GAN scheme for Single-Image Super-Resolution.
arXiv Detail & Related papers (2024-10-23T15:34:06Z)
- Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs [30.973473583364832]
DoSSR is a Domain Shift diffusion-based SR model that capitalizes on the generative powers of pretrained diffusion models.
At the core of our approach is a domain shift equation that integrates seamlessly with existing diffusion models.
Our proposed method achieves state-of-the-art performance on synthetic and real-world datasets, while notably requiring only 5 sampling steps.
arXiv Detail & Related papers (2024-09-26T12:16:11Z)
- Sequential Posterior Sampling with Diffusion Models [15.028061496012924]
We propose a novel approach that models the transition dynamics to improve the efficiency of sequential diffusion posterior sampling in conditional image synthesis.
We demonstrate the effectiveness of our approach on a real-world dataset of high frame rate cardiac ultrasound images.
Our method opens up new possibilities for real-time applications of diffusion models in imaging and other domains requiring real-time inference.
arXiv Detail & Related papers (2024-09-09T07:55:59Z)
- One Step Diffusion-based Super-Resolution with Time-Aware Distillation [60.262651082672235]
Diffusion-based image super-resolution (SR) methods have shown promise in reconstructing high-resolution images with fine details from low-resolution counterparts.
Recent techniques have been devised to enhance the sampling efficiency of diffusion-based SR models via knowledge distillation.
We propose a time-aware diffusion distillation method, named TAD-SR, to accomplish effective and efficient image super-resolution.
arXiv Detail & Related papers (2024-08-14T11:47:22Z)
- ReNoise: Real Image Inversion Through Iterative Noising [62.96073631599749]
We introduce an inversion method with a high quality-to-operation ratio, enhancing reconstruction accuracy without increasing the number of operations.
We evaluate the performance of our ReNoise technique using various sampling algorithms and models, including recent accelerated diffusion models.
arXiv Detail & Related papers (2024-03-21T17:52:08Z)
- Stage-by-stage Wavelet Optimization Refinement Diffusion Model for Sparse-View CT Reconstruction [14.037398189132468]
We present an innovative approach named the Stage-by-stage Wavelet Optimization Refinement Diffusion (SWORD) model for sparse-view CT reconstruction.
Specifically, we establish a unified mathematical model integrating low-frequency and high-frequency generative models, obtaining the solution via an optimization procedure.
Our method is rooted in established optimization theory and comprises three distinct stages: low-frequency generation, high-frequency refinement, and domain transform.
arXiv Detail & Related papers (2023-08-30T10:48:53Z)
- ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting [70.83632337581034]
Diffusion-based image super-resolution (SR) methods are mainly limited by their low inference speed.
We propose a novel and efficient diffusion model for SR that significantly reduces the number of diffusion steps.
Our method constructs a Markov chain that transfers between the high-resolution and low-resolution images by shifting the residual between them (a schematic form of this forward process is sketched below).
arXiv Detail & Related papers (2023-07-23T15:10:02Z)
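Since the ResShift summary above describes the mechanism only in words, the following is a schematic of the residual-shifting forward process it refers to; the notation (x_0 for the HR image, y_0 for the upsampled LR image, \eta_t for a monotonically increasing shifting schedule, \kappa for a noise scale) is introduced here for illustration rather than quoted from the paper.

```latex
% Residual-shifting forward process (schematic): the chain moves the HR image
% x_0 toward the LR image y_0 by progressively adding the residual e_0.
e_0 = y_0 - x_0, \qquad
q(x_t \mid x_0, y_0) = \mathcal{N}\!\bigl(x_t;\; x_0 + \eta_t\, e_0,\; \kappa^2 \eta_t \mathbf{I}\bigr),
\quad t = 1, \dots, T
```

With \eta_t rising from near 0 to near 1, x_T concentrates around the LR image plus mild noise, so far fewer reverse steps are needed than in a standard noise-to-image chain.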
- ACDMSR: Accelerated Conditional Diffusion Models for Single Image Super-Resolution [84.73658185158222]
We propose a diffusion model-based super-resolution method called ACDMSR.
Our method adapts the standard diffusion model to perform super-resolution through a deterministic iterative denoising process.
Our approach generates more visually realistic counterparts for low-resolution images, emphasizing its effectiveness in practical scenarios.
arXiv Detail & Related papers (2023-07-03T06:49:04Z)
- Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning [52.72369034247396]
We propose the diffusion glancing transformer, which employs a modality diffusion process and residual glancing sampling.
DIFFGLAT achieves better generation accuracy while maintaining fast decoding speed compared with both autoregressive and non-autoregressive models.
arXiv Detail & Related papers (2022-12-20T13:36:25Z)