Diffusion Transformer meets Multi-level Wavelet Spectrum for Single Image Super-Resolution
- URL: http://arxiv.org/abs/2511.01175v2
- Date: Tue, 04 Nov 2025 05:16:07 GMT
- Title: Diffusion Transformer meets Multi-level Wavelet Spectrum for Single Image Super-Resolution
- Authors: Peng Du, Hui Li, Han Xu, Paul Barom Jeon, Dongwook Lee, Daehyun Ji, Ran Yang, Feng Zhu
- Abstract summary: We propose a Diffusion Transformer model based on image Wavelet spectra for SR (DTWSR). DTWSR combines the strengths of diffusion models and transformers to capture the interrelations among multiscale frequency sub-bands. A dual decoder handles the distinct variances of the low-frequency and high-frequency sub-bands while keeping them aligned during image generation.
- Score: 15.056888813012451
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Discrete Wavelet Transform (DWT) has been widely explored to enhance the performance of image super-resolution (SR). Although some DWT-based methods improve SR by capturing fine-grained frequency signals, most existing approaches neglect the interrelations among multiscale frequency sub-bands, resulting in inconsistencies and unnatural artifacts in the reconstructed images. To address this challenge, we propose a Diffusion Transformer model based on image Wavelet spectra for SR (DTWSR). DTWSR combines the strengths of diffusion models and transformers to capture the interrelations among multiscale frequency sub-bands, leading to more consistent and realistic SR images. Specifically, we use a Multi-level Discrete Wavelet Transform to decompose images into wavelet spectra. We propose a pyramid tokenization method that embeds the spectra into a sequence of tokens for the transformer model, making it easier to capture features from both the spatial and frequency domains. A dual decoder is designed to handle the distinct variances of the low-frequency and high-frequency sub-bands while preserving their alignment during image generation. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness of our method, with high performance in both perceptual quality and fidelity.
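The multi-level decomposition mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name a wavelet family, so the orthonormal Haar wavelet is assumed here, and the function names `haar_dwt2` and `multilevel_haar` are hypothetical. Sub-band naming (LH vs. HL) also varies between libraries; the comments state what each band actually computes.

```python
def haar_dwt2(img):
    """One level of an orthonormal 2-D Haar DWT (assumed wavelet, for illustration).

    img is a list of rows with even height and width.
    Returns four half-resolution sub-bands: (ll, lh, hl, hh).
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]          # top row of the 2x2 block
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom row of the 2x2 block
            r_ll.append((a + b + c + d) / 2)  # low-pass in both directions
            r_lh.append((a + b - c - d) / 2)  # difference between rows
            r_hl.append((a - b + c - d) / 2)  # difference between columns
            r_hh.append((a - b - c + d) / 2)  # diagonal difference
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, lh, hl, hh


def multilevel_haar(img, levels):
    """Apply haar_dwt2 repeatedly to the low-frequency band.

    Returns (final_ll, details), where details[k] holds the three
    high-frequency sub-bands of level k + 1 (finest level first).
    """
    details = []
    ll = img
    for _ in range(levels):
        ll, lh, hl, hh = haar_dwt2(ll)
        details.append((lh, hl, hh))
    return ll, details


# A single 2x2 block: one low-frequency value plus three detail values.
ll, details = multilevel_haar([[1, 2], [3, 4]], 1)
# ll == [[5.0]], details[0] == ([[-2.0]], [[-1.0]], [[0.0]])
```

Each level halves the resolution of the low-frequency band and adds three high-frequency sub-bands, which gives the pyramid of multiscale sub-bands that the abstract refers to. Because the transform is orthonormal, the sum of squared coefficients equals the sum of squared pixel values.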
Related papers
- HDW-SR: High-Frequency Guided Diffusion Model based on Wavelet Decomposition for Image Super-Resolution [4.388490927225987]
We propose a High-Frequency Guided Diffusion Network based on Wavelet Decomposition (HDW-SR). We perform diffusion only on the residual map, allowing the network to focus more effectively on high-frequency information restoration. Experiments on both synthetic and real-world datasets demonstrate that HDW-SR achieves competitive super-resolution performance.
arXiv Detail & Related papers (2025-11-17T09:25:26Z)
- Freqformer: Image-Demoiréing Transformer via Efficient Frequency Decomposition [83.40450475728792]
We present Freqformer, a Transformer-based framework specifically designed for image demoiréing through targeted frequency separation. Our method performs an effective frequency decomposition that explicitly splits moiré patterns into high-frequency spatially-localized textures and low-frequency scale-robust color distortions. Experiments on various demoiréing benchmarks demonstrate that Freqformer achieves state-of-the-art performance with a compact model size.
arXiv Detail & Related papers (2025-05-25T12:23:10Z)
- Dual-domain Modulation Network for Lightweight Image Super-Resolution [26.992373105057684]
Lightweight image super-resolution (SR) aims to reconstruct high-resolution images from low-resolution images under limited computational costs. Existing frequency-based SR methods cannot balance the reconstruction of overall structures and high-frequency parts. We show that introducing both wavelet and Fourier information allows our model to consider both high-frequency features and overall SR structure reconstruction while reducing costs.
arXiv Detail & Related papers (2025-03-13T04:59:46Z)
- Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution [54.293362972473595]
Image super-resolution (SR) aims to reconstruct high-resolution (HR) images from their low-resolution (LR) counterparts.
Current approaches to address SR tasks are either dedicated to extracting RGB image features or assuming similar degradation patterns.
We propose a Contourlet refinement gate framework to restore infrared modal-specific features while preserving spectral distribution fidelity.
arXiv Detail & Related papers (2024-11-19T14:24:03Z)
- Local Implicit Wavelet Transformer for Arbitrary-Scale Super-Resolution [15.610136214020947]
Implicit neural representations have recently demonstrated promising potential in arbitrary-scale Super-Resolution (SR) of images.
Most existing methods predict the pixel in the SR image based on the queried coordinate and ensemble nearby features.
We propose the Local Implicit Wavelet Transformer (LIWT) to enhance the restoration of high-frequency texture details.
arXiv Detail & Related papers (2024-11-10T12:21:14Z)
- Effective Diffusion Transformer Architecture for Image Super-Resolution [63.254644431016345]
We design an effective diffusion transformer for image super-resolution (DiT-SR).
In practice, DiT-SR leverages an overall U-shaped architecture, and adopts a uniform isotropic design for all the transformer blocks.
We analyze the limitation of the widely used AdaLN, and present a frequency-adaptive time-step conditioning module.
arXiv Detail & Related papers (2024-09-29T07:14:16Z)
- FreqINR: Frequency Consistency for Implicit Neural Representation with Adaptive DCT Frequency Loss [5.349799154834945]
This paper introduces Frequency Consistency for Implicit Neural Representation (FreqINR), an innovative Arbitrary-scale Super-resolution method.
During training, we employ Adaptive Discrete Cosine Transform Frequency Loss (ADFL) to minimize the frequency gap between HR and ground-truth images.
During inference, we extend the receptive field to preserve spectral coherence between low-resolution (LR) and ground-truth images.
arXiv Detail & Related papers (2024-08-25T03:53:17Z)
- Misalignment-Robust Frequency Distribution Loss for Image Transformation [51.0462138717502]
This paper aims to address a common challenge in deep learning-based image transformation methods, such as image enhancement and super-resolution.
We introduce a novel and simple Frequency Distribution Loss (FDL) for computing distribution distance within the frequency domain.
Our method is empirically proven effective as a training constraint due to the thoughtful utilization of global information in the frequency domain.
arXiv Detail & Related papers (2024-02-28T09:27:41Z)
- Low-light Image Enhancement via CLIP-Fourier Guided Wavelet Diffusion [28.049668999586583]
We propose a novel and robust low-light image enhancement method via CLIP-Fourier Guided Wavelet Diffusion, abbreviated as CFWD.
CFWD leverages multimodal visual-language information in the frequency domain space created by multiple wavelet transforms to guide the enhancement process.
Our approach outperforms existing state-of-the-art methods, achieving significant progress in image quality and noise suppression.
arXiv Detail & Related papers (2024-01-08T10:08:48Z)
- Frequency-Aware Transformer for Learned Image Compression [64.28698450919647]
We propose a frequency-aware transformer (FAT) block that, for the first time, achieves multiscale directional analysis for Learned Image Compression (LIC). The FAT block comprises frequency-decomposition window attention (FDWA) modules to capture multiscale and directional frequency components of natural images. We also introduce a frequency-modulation feed-forward network (FMFFN) to adaptively modulate different frequency components, improving rate-distortion performance.
arXiv Detail & Related papers (2023-10-25T05:59:25Z)
- Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement [75.25451566988565]
We propose a novel Gated Multi-Resolution Transfer Network (GMTNet) to reconstruct a spatially precise high-quality image from a burst of low-quality raw images.
Detailed experimental analysis on five datasets validates our approach and sets a state-of-the-art for burst super-resolution, burst denoising, and low-light burst enhancement.
arXiv Detail & Related papers (2023-04-13T17:54:00Z)
- A Scale-Arbitrary Image Super-Resolution Network Using Frequency-domain Information [42.55177009667711]
Image super-resolution (SR) is a technique to recover lost high-frequency information in low-resolution (LR) images.
In this paper, we study image features in the frequency domain to design a novel scale-arbitrary image SR network.
arXiv Detail & Related papers (2022-12-08T15:10:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.