From Local Windows to Adaptive Candidates via Individualized Exploratory: Rethinking Attention for Image Super-Resolution
- URL: http://arxiv.org/abs/2601.08341v1
- Date: Tue, 13 Jan 2026 09:01:20 GMT
- Title: From Local Windows to Adaptive Candidates via Individualized Exploratory: Rethinking Attention for Image Super-Resolution
- Authors: Chunyu Meng, Wei Long, Shuhang Gu
- Abstract summary: Single Image Super-Resolution (SISR) is a fundamental computer vision task that aims to reconstruct a high-resolution (HR) image from a low-resolution (LR) input. We propose the Individualized Exploratory Transformer (IET) to enable flexible and token-adaptive attention computation.
- Score: 20.444907448992154
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Single Image Super-Resolution (SISR) is a fundamental computer vision task that aims to reconstruct a high-resolution (HR) image from a low-resolution (LR) input. Transformer-based methods have achieved remarkable performance by modeling long-range dependencies in degraded images. However, their feature-intensive attention computation incurs high computational cost. To improve efficiency, most existing approaches partition images into fixed groups and restrict attention within each group. Such group-wise attention overlooks the inherent asymmetry in token similarities, thereby failing to enable flexible and token-adaptive attention computation. To address this limitation, we propose the Individualized Exploratory Transformer (IET), which introduces a novel Individualized Exploratory Attention (IEA) mechanism that allows each token to adaptively select its own content-aware and independent attention candidates. This token-adaptive and asymmetric design enables more precise information aggregation while maintaining computational efficiency. Extensive experiments on standard SR benchmarks demonstrate that IET achieves state-of-the-art performance under comparable computational complexity.
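The core idea of the abstract — each token independently selecting its own content-aware attention candidates rather than attending within a fixed group — can be illustrated with a minimal NumPy sketch. The function name, the top-k selection rule, and all shapes are illustrative assumptions, not the paper's actual IEA design.

```python
import numpy as np

def topk_candidate_attention(q, k, v, num_candidates=4):
    """Sketch of token-adaptive attention: each query token attends only
    to its own top-k most similar key tokens (a hypothetical simplification
    of the Individualized Exploratory Attention idea)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])                 # (N, N) similarities
    # Each row (token) independently picks its own candidate set,
    # so the attention pattern is asymmetric across tokens.
    idx = np.argsort(-scores, axis=-1)[:, :num_candidates]  # (N, num_candidates)
    out = np.zeros_like(q)
    for i in range(q.shape[0]):
        s = scores[i, idx[i]]
        w = np.exp(s - s.max())
        w /= w.sum()                                        # softmax over candidates
        out[i] = w @ v[idx[i]]                              # aggregate candidate values
    return out

rng = np.random.default_rng(0)
N, d = 16, 8                                                # 16 tokens, dim 8
q = rng.standard_normal((N, d))
out = topk_candidate_attention(q, q, q)
print(out.shape)                                            # (16, 8)
```

Because each token only aggregates over `num_candidates` values instead of all `N`, the per-token cost is bounded regardless of how dissimilar the rest of the image is.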
Related papers
- Rotation Equivariant Arbitrary-scale Image Super-Resolution [62.41329042683779]
Arbitrary-scale image super-resolution (ASISR) aims to achieve arbitrary-scale high-resolution recoveries from a low-resolution input image. In this study, we construct a rotation-equivariant ASISR method.
arXiv Detail & Related papers (2025-08-07T08:51:03Z) - IM-LUT: Interpolation Mixing Look-Up Tables for Image Super-Resolution [21.982964666527646]
Look-up table (LUT)-based approaches have attracted interest due to their efficiency and performance. Existing ASISR techniques often employ implicit neural representations, which come with considerable computational cost and memory demands. We propose Interpolation Mixing LUT (IM-LUT), a novel framework that performs ASISR by learning to blend multiple functions to maximize their capacity.
arXiv Detail & Related papers (2025-07-14T05:02:57Z) - Crafting Query-Aware Selective Attention for Single Image Super-Resolution [3.133812520659661]
Single Image Super-Resolution (SISR) reconstructs high-resolution images from low-resolution inputs, enhancing image details. We propose SSCAN, which dynamically selects the most relevant key-value windows based on query similarity. Our experiments demonstrate that SSCAN outperforms existing attention-based SISR methods, achieving up to a 0.14 dB PSNR improvement on urban datasets.
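The window-selection idea summarized above can be sketched in NumPy: keys and values are partitioned into fixed windows, and each query attends only to the window(s) whose mean key is most similar to it. The function name, the window-mean selection rule, and all parameters are hypothetical illustrations, not SSCAN's actual mechanism.

```python
import numpy as np

def query_aware_window_attention(q, k, v, window=4, top_w=1):
    """Sketch of query-aware selective attention: keys/values are split
    into fixed windows; each query picks its most similar window(s) and
    attends only within them."""
    N, d = k.shape
    kw = k.reshape(N // window, window, d)          # windowed keys
    vw = v.reshape(N // window, window, d)          # windowed values
    centers = kw.mean(axis=1)                       # (num_windows, d) summaries
    # Per-query ranking of windows by similarity to the window summary.
    sel = np.argsort(-(q @ centers.T), axis=-1)[:, :top_w]
    out = np.zeros_like(q)
    for i in range(q.shape[0]):
        ks = kw[sel[i]].reshape(-1, d)              # keys of selected windows
        vs = vw[sel[i]].reshape(-1, d)
        s = ks @ q[i] / np.sqrt(d)
        w = np.exp(s - s.max())
        w /= w.sum()                                # softmax within selection
        out[i] = w @ vs
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 4))                     # 8 tokens, dim 4
out = query_aware_window_attention(q, q, q)
print(out.shape)                                    # (8, 4)
```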
arXiv Detail & Related papers (2025-04-09T07:17:29Z) - Progressive Focused Transformer for Single Image Super-Resolution [21.301520456058544]
We propose a novel and effective Progressive Focused Transformer (PFT) that links all isolated attention maps in the network through Progressive Focused Attention (PFA) to focus attention on the most important tokens. PFA not only enables the network to capture more critical similar features, but also significantly reduces the computational cost of the overall network by filtering out irrelevant features before calculating similarities.
arXiv Detail & Related papers (2025-03-26T09:02:37Z) - Efficient Visual State Space Model for Image Deblurring [99.54894198086852]
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration. We propose a simple yet effective visual state space model (EVSSM) for image deblurring. The proposed EVSSM performs favorably against state-of-the-art methods on benchmark datasets and real-world images.
arXiv Detail & Related papers (2024-05-23T09:13:36Z) - Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary [30.506135273928596]
Single Image Super-Resolution is a classic computer vision problem that involves estimating high-resolution (HR) images from low-resolution (LR) ones.
We introduce a group of auxiliary Adaptive Token Dictionaries to the SR Transformer and establish an ATD-SR method.
Our method achieves the best performance on various single image super-resolution benchmarks.
arXiv Detail & Related papers (2024-01-16T08:50:44Z) - Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach [58.57026686186709]
We introduce the Convolutional Transformer layer (ConvFormer) and propose a ConvFormer-based Super-Resolution network (CFSR).
CFSR inherits the advantages of both convolution-based and transformer-based approaches.
Experiments demonstrate that CFSR strikes an optimal balance between computational cost and performance.
arXiv Detail & Related papers (2024-01-11T03:08:00Z) - Low-Resolution Self-Attention for Semantic Segmentation [93.30597515880079]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost. Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution. We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
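The fixed-low-resolution idea described above can be sketched in a few lines of NumPy: pool the feature map to a fixed small grid, run full self-attention there, and upsample the result back. The function name, the average-pooling choice, and the nearest-neighbor upsampling are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def low_res_self_attention(x, low=4):
    """Sketch of low-resolution self-attention: attention cost depends on
    the fixed `low x low` grid, not on the input resolution H x W."""
    H, W, C = x.shape
    fh, fw = H // low, W // low
    # Average-pool to the fixed low-resolution grid.
    pooled = x.reshape(low, fh, low, fw, C).mean(axis=(1, 3))   # (low, low, C)
    t = pooled.reshape(low * low, C)                            # tokens
    s = t @ t.T / np.sqrt(C)                                    # full attention
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    attn = (w @ t).reshape(low, low, C)
    # Nearest-neighbor upsample back to the original resolution.
    return np.repeat(np.repeat(attn, fh, axis=0), fw, axis=1)

x = np.random.default_rng(1).standard_normal((16, 16, 8))
y = low_res_self_attention(x)
print(y.shape)                                                  # (16, 16, 8)
```

Doubling H and W leaves the attention matrix at `(low*low) x (low*low)`, which is what makes the cost independent of input resolution.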
arXiv Detail & Related papers (2023-10-08T06:10:09Z) - ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution [76.7408734079706]
Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation.
We propose ESSAformer, an ESSA attention-embedded Transformer network for single-HSI-SR with an iterative refining structure.
arXiv Detail & Related papers (2023-07-26T07:45:14Z) - DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation [56.514462874501675]
We propose a dynamic sparse attention based Transformer model to achieve fine-level matching with favorable efficiency.
The heart of our approach is a novel dynamic-attention unit, dedicated to handling the variation in the optimal number of tokens each position should focus on.
Experiments on three applications, pose-guided person image generation, edge-based face synthesis, and undistorted image style transfer, demonstrate that DynaST achieves superior performance in local details.
arXiv Detail & Related papers (2022-07-13T11:12:03Z) - Image-specific Convolutional Kernel Modulation for Single Image Super-resolution [85.09413241502209]
To address this issue, we propose a novel image-specific convolutional kernel modulation (IKM) method.
We exploit the global contextual information of images or features to generate an attention weight for adaptively modulating the convolutional kernels.
Experiments on single image super-resolution show that the proposed methods achieve superior performance over state-of-the-art methods.
arXiv Detail & Related papers (2021-11-16T11:05:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.