From Attention to Frequency: Integration of Vision Transformer and FFT-ReLU for Enhanced Image Deblurring
- URL: http://arxiv.org/abs/2511.10806v1
- Date: Thu, 13 Nov 2025 21:19:57 GMT
- Title: From Attention to Frequency: Integration of Vision Transformer and FFT-ReLU for Enhanced Image Deblurring
- Authors: Syed Mumtahin Mahmud, Mahdi Mohd Hossain Noki, Prothito Shovon Majumder, Abdul Mohaimen Al Radi, Md. Haider Ali, Md. Mosaddek Khan
- Abstract summary: We propose a new dual-domain architecture that unifies Vision Transformers with a frequency-domain FFT-ReLU module. In this structure, the ViT backbone captures local and global dependencies, while the FFT-ReLU component enforces frequency-domain sparsity to suppress blur-related artifacts. Experiments on benchmark datasets demonstrate that this architecture achieves superior PSNR, SSIM, and perceptual quality compared to state-of-the-art models.
- Score: 0.9728664856449597
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image deblurring is vital in computer vision, aiming to recover sharp images from blurry ones caused by motion or camera shake. While deep learning approaches such as CNNs and Vision Transformers (ViTs) have advanced this field, they often struggle with complex or high-resolution blur and computational demands. We propose a new dual-domain architecture that unifies Vision Transformers with a frequency-domain FFT-ReLU module, explicitly bridging spatial attention modeling and frequency sparsity. In this structure, the ViT backbone captures local and global dependencies, while the FFT-ReLU component enforces frequency-domain sparsity to suppress blur-related artifacts and preserve fine details. Extensive experiments on benchmark datasets demonstrate that this architecture achieves superior PSNR, SSIM, and perceptual quality compared to state-of-the-art models. Quantitative metrics, qualitative comparisons, and human preference evaluations all confirm its effectiveness, establishing a practical and generalizable paradigm for real-world image restoration.
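To make the "FFT-ReLU" idea in the abstract concrete, here is a minimal NumPy sketch of one common reading of the term: transform an image to the frequency domain, apply ReLU to the real and imaginary parts of the spectrum, and transform back. This is an assumption for illustration only; the abstract does not specify the paper's actual module, and the function name `fft_relu` is hypothetical.

```python
import numpy as np


def fft_relu(x: np.ndarray) -> np.ndarray:
    """Illustrative FFT-ReLU: rectify the spectrum of a real-valued image.

    Assumption: ReLU is applied separately to the real and imaginary parts
    of the 2-D spectrum. The paper's exact formulation may differ.
    """
    # Real-input 2-D FFT over the last two axes (height, width).
    spec = np.fft.rfft2(x)
    # Zero out negative real and imaginary coefficients (the ReLU step),
    # which makes the spectrum sparser.
    real = np.maximum(spec.real, 0.0)
    imag = np.maximum(spec.imag, 0.0)
    # Return to the spatial domain at the original height and width.
    return np.fft.irfft2(real + 1j * imag, s=x.shape[-2:])


rng = np.random.default_rng(0)
image = rng.random((8, 8))   # stand-in for a blurry grayscale patch
out = fft_relu(image)
flat = fft_relu(np.ones((4, 4)))  # constant image: only a positive DC term
```

A constant non-negative image passes through unchanged, because its spectrum contains only a positive DC coefficient. In a trained network this operation would typically sit alongside learned layers rather than being used on its own.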
Related papers
- Frequency-Domain Fusion Transformer for Image Inpainting [6.4194162137514725]
This paper proposes a Transformer-based image inpainting method incorporating frequency-domain fusion. Experimental results demonstrate that the proposed method effectively improves the quality of image inpainting by preserving more high-frequency information.
arXiv Detail & Related papers (2025-06-23T09:19:04Z) - F2T2-HiT: A U-Shaped FFT Transformer and Hierarchical Transformer for Reflection Removal [16.539156634006236]
Single Image Reflection Removal (SIRR) technique plays a crucial role in image processing by eliminating unwanted reflections from the background. These reflections, often caused by photographs taken through glass surfaces, can significantly degrade image quality. This paper introduces a U-shaped Fast Fourier Transform Transformer and Hierarchical Transformer architecture, an innovative Transformer-based design for SIRR.
arXiv Detail & Related papers (2025-06-05T18:12:36Z) - Freqformer: Image-Demoiréing Transformer via Efficient Frequency Decomposition [83.40450475728792]
We present Freqformer, a Transformer-based framework specifically designed for image demoiréing through targeted frequency separation. Our method performs an effective frequency decomposition that explicitly splits moiré patterns into high-frequency spatially-localized textures and low-frequency scale-robust color distortions. Experiments on various demoiréing benchmarks demonstrate that Freqformer achieves state-of-the-art performance with a compact model size.
arXiv Detail & Related papers (2025-05-25T12:23:10Z) - Frequency-Aware Vision Transformers for High-Fidelity Super-Resolution of Earth System Models [0.0]
Super-resolution methods tend to exhibit spectral bias, reconstructing low-frequency content more readily than valuable high-frequency details. We introduce two frequency-aware frameworks: the Vision Transformer-Tuned Sinusoidal Implicit Representation (ViSIR) and the Vision Transformer Fourier Representation Network (ViFOR). The results establish ViFOR as a state-of-the-art, scalable solution for climate data downscaling.
arXiv Detail & Related papers (2025-02-18T01:52:41Z) - Efficient Visual State Space Model for Image Deblurring [99.54894198086852]
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration. We propose a simple yet effective visual state space model (EVSSM) for image deblurring. The proposed EVSSM performs favorably against state-of-the-art methods on benchmark datasets and real-world images.
arXiv Detail & Related papers (2024-05-23T09:13:36Z) - Training Transformer Models by Wavelet Losses Improves Quantitative and Visual Performance in Single Image Super-Resolution [6.367865391518726]
Transformer-based models have achieved remarkable results in low-level vision tasks including image super-resolution (SR).
To activate more input pixels globally, hybrid attention models have been proposed.
We employ wavelet losses to train Transformer models to improve quantitative and subjective performance.
arXiv Detail & Related papers (2024-04-17T11:25:19Z) - Holistic Dynamic Frequency Transformer for Image Fusion and Exposure Correction [18.014481087171657]
The correction of exposure-related issues is a pivotal component in enhancing the quality of images.
This paper proposes a novel methodology that leverages the frequency domain to improve and unify the handling of exposure correction tasks.
Our proposed method achieves state-of-the-art results, paving the way for more sophisticated and unified solutions in exposure correction.
arXiv Detail & Related papers (2023-09-03T14:09:14Z) - RBSR: Efficient and Flexible Recurrent Network for Burst
Super-Resolution [57.98314517861539]
Burst super-resolution (BurstSR) aims at reconstructing a high-resolution (HR) image from a sequence of low-resolution (LR) and noisy images.
In this paper, we suggest fusing cues frame-by-frame with an efficient and flexible recurrent network.
arXiv Detail & Related papers (2023-06-30T12:14:13Z) - Blur Interpolation Transformer for Real-World Motion from Blur [52.10523711510876]
We propose an encoded blur transformer (BiT) to unravel the underlying temporal correlation in blur.
Based on multi-scale residual Swin transformer blocks, we introduce dual-end temporal supervision and temporally symmetric ensembling strategies.
In addition, we design a hybrid camera system to collect the first real-world dataset of one-to-many blur-sharp video pairs.
arXiv Detail & Related papers (2022-11-21T13:10:10Z) - Contextual Learning in Fourier Complex Field for VHR Remote Sensing Images [64.84260544255477]
Transformer-based models have demonstrated outstanding potential for learning high-order contextual relationships from natural images with general resolution (224x224 pixels).
We propose a complex self-attention (CSA) mechanism to model the high-order contextual information with less than half the computation of naive SA.
By stacking various layers of CSA blocks, we propose the Fourier Complex Transformer (FCT) model to learn global contextual information from VHR aerial images.
arXiv Detail & Related papers (2022-10-28T08:13:33Z) - Frequency Consistent Adaptation for Real World Super Resolution [64.91914552787668]
We propose a novel Frequency Consistent Adaptation (FCA) that ensures the frequency domain consistency when applying Super-Resolution (SR) methods to the real scene.
We estimate degradation kernels from unsupervised images and generate the corresponding Low-Resolution (LR) images.
Based on the domain-consistent LR-HR pairs, we train easy-to-implement Convolutional Neural Network (CNN) SR models.
arXiv Detail & Related papers (2020-12-18T08:25:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.