DSwinIR: Rethinking Window-based Attention for Image Restoration
- URL: http://arxiv.org/abs/2504.04869v2
- Date: Sun, 27 Jul 2025 07:45:59 GMT
- Title: DSwinIR: Rethinking Window-based Attention for Image Restoration
- Authors: Gang Wu, Junjun Jiang, Kui Jiang, Xianming Liu, Liqiang Nie
- Abstract summary: We propose the Deformable Sliding Window Transformer (DSwinIR) as a new foundational backbone architecture for image restoration. At the heart of DSwinIR is the novel Deformable Sliding Window (DSwin) Attention. Extensive experiments show that DSwinIR sets a new state-of-the-art across a wide spectrum of image restoration tasks.
- Score: 109.38288333994407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image restoration has witnessed significant advancements with the development of deep learning models. Transformer-based models in particular, especially those leveraging window-based self-attention, have become a dominant force in image restoration. However, their performance is fundamentally constrained by the rigid, non-overlapping window partitioning scheme, which leads to two critical limitations: insufficient feature interaction across window boundaries and content-agnostic receptive fields that cannot adapt to diverse image structures. Existing methods often rely on heuristic patterns to mitigate these issues, rather than addressing the root cause. In this paper, we propose the Deformable Sliding Window Transformer (DSwinIR), a new foundational backbone architecture that systematically overcomes these limitations. At the heart of DSwinIR is the novel Deformable Sliding Window (DSwin) Attention. This mechanism introduces two fundamental innovations. First, it replaces the rigid partitioning with a token-centric sliding window paradigm, ensuring seamless cross-window information flow and effectively eliminating boundary artifacts. Second, it incorporates a content-aware deformable sampling strategy, which allows the attention mechanism to learn data-dependent offsets and dynamically shape its receptive fields to focus on the most informative image regions. This synthesis endows the model with both strong locality-aware inductive biases and powerful, adaptive long-range modeling capabilities. Extensive experiments show that DSwinIR sets a new state-of-the-art across a wide spectrum of image restoration tasks. For instance, in all-in-one restoration, DSwinIR surpasses the most recent backbone, GridFormer, by over 0.53 dB on the three-task benchmark and by a remarkable 0.86 dB on the five-task benchmark.
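As an illustration of the token-centric sliding-window idea in the abstract, here is a minimal NumPy sketch: each token attends to an overlapping k x k neighborhood centered on itself, optionally shifted by per-token offsets. This is a deliberate simplification of the paper's mechanism (the actual DSwin attention uses learned fractional offsets with bilinear sampling, multi-head projections, and trained query/key/value weights); all names and shapes here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dswin_attention(feat, offsets=None, k=3):
    """Token-centric sliding-window attention (illustrative sketch).

    feat: (H, W, C) feature map. Each token attends to a k x k
    neighborhood centered on itself, so windows overlap and there are
    no hard partition boundaries. `offsets`, shape (H, W, k*k, 2),
    shifts each sampling location per token (integer shifts here; the
    paper learns fractional, content-dependent offsets).
    """
    H, W, C = feat.shape
    r = k // 2
    out = np.zeros_like(feat)
    for i in range(H):
        for j in range(W):
            q = feat[i, j]  # query token at (i, j)
            keys = []
            for n, (di, dj) in enumerate(
                    (a, b) for a in range(-r, r + 1)
                           for b in range(-r, r + 1)):
                oi, oj = (0, 0) if offsets is None else offsets[i, j, n]
                ii = np.clip(i + di + oi, 0, H - 1)  # clamp at borders
                jj = np.clip(j + dj + oj, 0, W - 1)
                keys.append(feat[ii, jj])
            K = np.stack(keys)                     # (k*k, C) keys/values
            w = softmax(q @ K.T / np.sqrt(C))      # attention weights
            out[i, j] = w @ K                      # weighted aggregation
    return out
```

Because every token is the center of its own window, two horizontally adjacent tokens share most of their keys, which is exactly the cross-window information flow the rigid Swin-style partitioning lacks.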
Related papers
- UnfoldIR: Rethinking Deep Unfolding Network in Illumination Degradation Image Restoration [33.290565892897824]
Deep unfolding networks (DUNs) are widely employed in illumination degradation image restoration (IDIR). We propose a novel DUN-based method, UnfoldIR, for IDIR tasks. We unfold the iterative optimized solution of this model into a multistage network, with each stage comprising a reflectance-assisted illumination correction (RAIC) module and an illumination-guided reflectance enhancement (IGRE) module.
arXiv Detail & Related papers (2025-05-10T16:13:01Z) - UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior [56.35236964617809]
Image restoration aims to recover content from inputs degraded by various factors, such as adverse weather, blur, and noise.
This paper introduces UniRestore, a unified image restoration model that bridges the gap between perceptual image restoration (PIR) and task-oriented image restoration (TIR).
We propose a Complementary Feature Restoration Module (CFRM) to reconstruct degraded encoder features and a Task Feature Adapter (TFA) module to facilitate adaptive feature fusion in the decoder.
arXiv Detail & Related papers (2025-01-22T08:06:48Z) - Joint multi-dimensional dynamic attention and transformer for general image restoration [14.987034136856463]
Outdoor images often suffer from severe degradation due to rain, haze, and noise.
Current image restoration methods struggle to handle complex degradation while maintaining efficiency.
This paper introduces a novel image restoration architecture that combines multi-dimensional dynamic attention and self-attention.
arXiv Detail & Related papers (2024-11-12T15:58:09Z) - ARNet: Self-Supervised FG-SBIR with Unified Sample Feature Alignment and Multi-Scale Token Recycling [11.129453244307369]
FG-SBIR aims to minimize the distance between sketches and corresponding images in the embedding space. We propose an effective approach to narrow the gap between the two domains. It mainly facilitates unified mutual information sharing both intra- and inter-sample.
arXiv Detail & Related papers (2024-06-17T13:49:12Z) - Look-Around Before You Leap: High-Frequency Injected Transformer for Image Restoration [46.96362010335177]
In this paper, we propose HIT, a simple yet effective High-frequency Injected Transformer for image restoration.
Specifically, we design a window-wise injection module (WIM), which incorporates abundant high-frequency details into the feature map, to provide reliable references for restoring high-quality images.
In addition, we introduce a spatial enhancement unit (SEU) to preserve essential spatial relationships that may be lost due to the computations carried out across channel dimensions in the BIM.
arXiv Detail & Related papers (2024-03-30T08:05:00Z) - CFAT: Unleashing Triangular Windows for Image Super-resolution [5.130320840059732]
Transformer-based models have revolutionized the field of image super-resolution (SR).
We propose a non-overlapping triangular window technique that synchronously works with the rectangular one to mitigate boundary-level distortion.
Our proposed model shows a significant 0.7 dB performance improvement over other state-of-the-art SR architectures.
arXiv Detail & Related papers (2024-03-24T13:31:31Z) - How Powerful Potential of Attention on Image Restoration? [97.9777639562205]
We conduct an empirical study to explore the potential of attention mechanisms without using FFN.
We propose Continuous Scaling Attention (CSAttn), a method that computes attention continuously in three stages without using FFN.
Our designs provide a closer look at the attention mechanism and reveal that some simple operations can significantly affect the model performance.
arXiv Detail & Related papers (2024-03-15T14:23:12Z) - Multi-task Image Restoration Guided By Robust DINO Features [88.74005987908443]
We propose DINO-IR, a multi-task image restoration approach leveraging robust features extracted from DINOv2.
We first propose a pixel-semantic fusion (PSF) module to dynamically fuse DINOv2's shallow features.
By formulating these modules into a unified deep model, we propose a DINO perception contrastive loss to constrain the model training.
arXiv Detail & Related papers (2023-12-04T06:59:55Z) - Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z) - HAT: Hybrid Attention Transformer for Image Restoration [61.74223315807691]
Transformer-based methods have shown impressive performance in image restoration tasks, such as image super-resolution and denoising.
We propose a new Hybrid Attention Transformer (HAT) to activate more input pixels for better restoration.
Our HAT achieves state-of-the-art performance both quantitatively and qualitatively.
arXiv Detail & Related papers (2023-09-11T05:17:55Z) - Cross-Consistent Deep Unfolding Network for Adaptive All-In-One Video Restoration [78.14941737723501]
We propose a Cross-consistent Deep Unfolding Network (CDUN) for All-In-One VR.
By orchestrating two cascading procedures, CDUN achieves adaptive processing for diverse degradations.
In addition, we introduce a window-based inter-frame fusion strategy to utilize information from more adjacent frames.
arXiv Detail & Related papers (2023-09-04T14:18:00Z) - DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior [70.46245698746874]
We present DiffBIR, a general restoration pipeline that can handle different blind image restoration tasks.
DiffBIR decouples blind image restoration problem into two stages: 1) degradation removal: removing image-independent content; 2) information regeneration: generating the lost image content.
In the first stage, we use restoration modules to remove degradations and obtain high-fidelity restored results.
For the second stage, we propose IRControlNet that leverages the generative ability of latent diffusion models to generate realistic details.
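DiffBIR's two-stage decomposition can be summarized as a simple composition of functions. In this sketch, `restore_fn` and `generate_fn` are hypothetical stand-ins for the paper's restoration module and IRControlNet; the toy lambdas below are purely illustrative, not the actual models.

```python
import numpy as np

def diffbir_pipeline(degraded, restore_fn, generate_fn):
    """Sketch of a two-stage blind restoration pipeline (illustrative).

    Stage 1 (degradation removal): restore_fn strips degradations to
    produce a high-fidelity but possibly detail-poor estimate.
    Stage 2 (information regeneration): generate_fn synthesizes the
    lost content, e.g. via a latent diffusion model in the paper.
    """
    clean = restore_fn(degraded)   # stage 1: remove degradations
    return generate_fn(clean)      # stage 2: regenerate lost details

# Toy stand-ins: "restore" by clamping to a valid range,
# "regenerate" as a no-op placeholder.
restored = diffbir_pipeline(
    np.array([0.2, 1.4, -0.3]),
    restore_fn=lambda x: np.clip(x, 0.0, 1.0),
    generate_fn=lambda x: x,
)
```

The appeal of this decoupling is that each stage can be trained and swapped independently: any restoration backbone can feed the generative second stage.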
arXiv Detail & Related papers (2023-08-29T07:11:52Z) - Restormer: Efficient Transformer for High-Resolution Image Restoration [118.9617735769827]
Convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data.
Transformers have shown significant performance gains on natural language and high-level vision tasks.
Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks.
arXiv Detail & Related papers (2021-11-18T18:59:10Z) - Diverse Image Inpainting with Bidirectional and Autoregressive Transformers [55.21000775547243]
We propose BAT-Fill, an image inpainting framework with a novel bidirectional autoregressive transformer (BAT).
BAT-Fill inherits the merits of transformers and CNNs in a two-stage manner, which allows to generate high-resolution contents without being constrained by the quadratic complexity of attention in transformers.
arXiv Detail & Related papers (2021-04-26T03:52:27Z) - Deep Variational Network Toward Blind Image Restoration [60.45350399661175]
Blind image restoration is a common yet challenging problem in computer vision.
We propose a novel blind image restoration method aiming to integrate the advantages of both.
Experiments on two typical blind IR tasks, namely image denoising and super-resolution, demonstrate that the proposed method achieves superior performance over current state-of-the-arts.
arXiv Detail & Related papers (2020-08-25T03:30:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.