DiNAT-IR: Exploring Dilated Neighborhood Attention for High-Quality Image Restoration
- URL: http://arxiv.org/abs/2507.17892v1
- Date: Wed, 23 Jul 2025 19:41:49 GMT
- Title: DiNAT-IR: Exploring Dilated Neighborhood Attention for High-Quality Image Restoration
- Authors: Hanzhou Liu, Binghan Li, Chengkai Liu, Mi Lu
- Abstract summary: We introduce Dilated Neighborhood Attention (DiNA) as a promising alternative to channel-wise self-attention for image restoration. DiNA balances global context and local precision by integrating sliding-window attention with mixed dilation factors. We further introduce a channel-aware module that complements local attention, effectively integrating global context without sacrificing pixel-level precision.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers, with their self-attention mechanisms for modeling long-range dependencies, have become a dominant paradigm in image restoration tasks. However, the high computational cost of self-attention limits scalability to high-resolution images, making efficiency-quality trade-offs a key research focus. To address this, Restormer employs channel-wise self-attention, which computes attention across channels instead of spatial dimensions. While effective, this approach may overlook localized artifacts that are crucial for high-quality image restoration. To bridge this gap, we explore Dilated Neighborhood Attention (DiNA) as a promising alternative, inspired by its success in high-level vision tasks. DiNA balances global context and local precision by integrating sliding-window attention with mixed dilation factors, effectively expanding the receptive field without excessive overhead. However, our preliminary experiments indicate that directly applying this global-local design to the classic deblurring task hinders accurate visual restoration, primarily due to the constrained global context understanding within local attention. To address this, we introduce a channel-aware module that complements local attention, effectively integrating global context without sacrificing pixel-level precision. The proposed DiNAT-IR, a Transformer-based architecture specifically designed for image restoration, achieves competitive results across multiple benchmarks, offering a high-quality solution for diverse low-level computer vision problems.
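The two attention designs the abstract contrasts can be sketched in a few lines of NumPy. This is a minimal, single-head 1D illustration under simplifying assumptions (shared query/key/value, no learned projections), not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dilated_neighborhood_attention(x, window=3, dilation=1):
    """1D neighborhood attention: each token attends to `window` neighbors
    spaced `dilation` apart, so the receptive field spans roughly
    window * dilation tokens at O(n * window) cost instead of O(n^2)."""
    n, c = x.shape
    half = window // 2
    out = np.zeros_like(x)
    for i in range(n):
        # clamp neighbor indices at the sequence borders
        idx = np.clip(i + dilation * np.arange(-half, half + 1), 0, n - 1)
        k = x[idx]                                # (window, c) keys = values
        attn = softmax(x[i] @ k.T / np.sqrt(c))   # (window,) weights
        out[i] = attn @ k
    return out

def channel_wise_attention(x):
    """Restormer-style channel attention: the (c, c) attention map is
    computed across channels, so cost scales with channel count rather
    than pixel count, trading spatial locality for global context."""
    n, c = x.shape
    attn = softmax(x.T @ x / np.sqrt(n))          # (c, c) channel map
    return x @ attn                               # (n, c)
```

Alternating dilation-1 and dilation-2 blocks, as in DiNAT-style designs, interleaves local precision with a wider receptive field; the channel-aware module described above would sit alongside the local branch to supply the global context that windowed attention alone lacks.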
Related papers
- Breaking Complexity Barriers: High-Resolution Image Restoration with Rank Enhanced Linear Attention [54.42902794496325]
Linear attention, a variant of softmax attention, demonstrates promise in global context modeling. We propose Rank Enhanced Linear Attention (RELA), a simple yet effective method that enriches feature representations by integrating a lightweight depthwise convolution. Building upon RELA, we propose an efficient and effective image restoration Transformer, named LAformer.
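The mechanism this blurb describes can be illustrated with a hedged NumPy sketch: kernelized linear attention aggregates keys and values once, so cost is linear in sequence length, and a lightweight depthwise convolution on the values reinjects the feature diversity the low-rank aggregate loses. The elu+1 feature map and the averaging filter below are illustrative assumptions, not RELA's exact components.

```python
import numpy as np

def rank_enhanced_linear_attention(x, kernel=3):
    """Linear attention plus a depthwise-convolution branch (sketch).

    The (c, c) key-value aggregate is computed once, giving O(n * c^2)
    cost; the per-channel convolution on the values restores locality
    that the rank-limited aggregate cannot express on its own."""
    n, c = x.shape
    phi = lambda t: np.where(t > 0, t + 1.0, np.exp(t))  # elu(t) + 1, strictly positive
    q, k, v = phi(x), phi(x), x
    kv = k.T @ v                          # (c, c) global aggregate
    z = q @ k.sum(axis=0)                 # (n,) normalizer, positive by construction
    attn_out = (q @ kv) / z[:, None]
    w = np.full(kernel, 1.0 / kernel)     # stand-in for learned depthwise weights
    conv_out = np.stack(
        [np.convolve(v[:, j], w, mode="same") for j in range(c)], axis=1
    )
    return attn_out + conv_out
```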
arXiv Detail & Related papers (2025-05-22T02:57:23Z) - Efficient Concertormer for Image Deblurring and Beyond [87.07963453448328]
We introduce a novel Concerto Self-Attention (CSA) mechanism designed for image deblurring. By retaining partial information in additional dimensions independent from the self-attention calculations, our method effectively captures global contextual representations with complexity linear in the image size. While our primary objective is single-image motion deblurring, extensive quantitative and qualitative evaluations demonstrate that our approach performs favorably against state-of-the-art methods in other tasks.
arXiv Detail & Related papers (2024-04-09T09:02:21Z) - CascadedGaze: Efficiency in Global Context Extraction for Image Restoration [12.967835674413596]
We present CascadedGaze Network (CGNet), an encoder-decoder architecture that employs a Global Context Extractor (GCE) module.
The GCE module leverages small kernels across convolutional layers to learn global dependencies, without requiring self-attention.
arXiv Detail & Related papers (2024-01-26T22:59:51Z) - Interpreting and Improving Attention From the Perspective of Large Kernel Convolution [51.06461246235176]
We introduce Large Kernel Convolutional Attention (LKCA), a novel formulation that reinterprets attention operations as a single large-kernel convolution. LKCA achieves competitive performance across various visual tasks, particularly in data-constrained settings.
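The reinterpretation described above can be caricatured in NumPy: a single shared, large-kernel depthwise convolution plays the role of a static, input-independent attention map. The averaging kernel is a placeholder for learned weights; this is an illustrative sketch, not the paper's LKCA.

```python
import numpy as np

def large_kernel_conv_attention(x, kernel=7):
    """Mix each token with a wide neighborhood via one shared depthwise
    (per-channel) convolution; the kernel weights stand in for the
    softmax attention map of regular self-attention."""
    n, c = x.shape
    w = np.ones(kernel) / kernel          # placeholder for learned kernel weights
    return np.stack(
        [np.convolve(x[:, j], w, mode="same") for j in range(c)], axis=1
    )
```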
arXiv Detail & Related papers (2024-01-11T08:40:35Z) - Recursive Generalization Transformer for Image Super-Resolution [108.67898547357127]
We propose the Recursive Generalization Transformer (RGT) for image SR, which can capture global spatial information and is suitable for high-resolution images.
We combine the RG-SA with local self-attention to enhance the exploitation of the global context.
Our RGT outperforms recent state-of-the-art methods quantitatively and qualitatively.
arXiv Detail & Related papers (2023-03-11T10:44:44Z) - DALG: Deep Attentive Local and Global Modeling for Image Retrieval [26.773211032906854]
We propose a fully attention based framework for robust representation learning motivated by the success of Transformer.
Besides applying Transformer for global feature extraction, we devise a local branch composed of window-based multi-head attention and spatial attention.
With our Deep Attentive Local and Global modeling framework (DALG), extensive experimental results show that efficiency can be significantly improved.
arXiv Detail & Related papers (2022-07-01T09:32:15Z) - Memory-augmented Deep Unfolding Network for Guided Image Super-resolution [67.83489239124557]
Guided image super-resolution (GISR) aims to obtain a high-resolution (HR) target image by enhancing the spatial resolution of a low-resolution (LR) target image under the guidance of a HR image.
Previous model-based methods mainly take the entire image as a whole and assume a prior distribution between the HR target image and the HR guidance image.
We propose a maximum a posteriori (MAP) estimation model for GISR with two types of priors on the HR target image.
arXiv Detail & Related papers (2022-02-12T15:37:13Z) - COLA-Net: Collaborative Attention Network for Image Restoration [27.965025010397603]
We propose a novel collaborative attention network (COLA-Net) for image restoration.
Our proposed COLA-Net is able to achieve state-of-the-art performance in both peak signal-to-noise ratio and visual perception.
arXiv Detail & Related papers (2021-03-10T09:33:17Z) - Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining [66.82470461139376]
We propose the first Cross-Scale Non-Local (CS-NL) attention module with integration into a recurrent neural network.
By combining the new CS-NL prior with local and in-scale non-local priors in a powerful recurrent fusion cell, we can find more cross-scale feature correlations within a single low-resolution image.
arXiv Detail & Related papers (2020-06-02T07:08:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.