Progressive Focused Transformer for Single Image Super-Resolution
- URL: http://arxiv.org/abs/2503.20337v1
- Date: Wed, 26 Mar 2025 09:02:37 GMT
- Title: Progressive Focused Transformer for Single Image Super-Resolution
- Authors: Wei Long, Xingyu Zhou, Leheng Zhang, Shuhang Gu,
- Abstract summary: We propose a novel and effective Progressive Focused Transformer (PFT) that links all isolated attention maps in the network through Progressive Focused Attention (PFA) to focus attention on the most important tokens.<n>PFA not only enables the network to capture more critical similar features, but also significantly reduces the computational cost of the overall network by filtering out irrelevant features before calculating similarities.
- Score: 21.301520456058544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based methods have achieved remarkable results in image super-resolution tasks because they can capture non-local dependencies in low-quality input images. However, this feature-intensive modeling approach is computationally expensive because it calculates the similarities between numerous features that are irrelevant to the query features when obtaining attention weights. These unnecessary similarity calculations not only degrade the reconstruction performance but also introduce significant computational overhead. How to accurately identify the features that are important to the current query features and avoid similarity calculations between irrelevant features remains an urgent problem. To address this issue, we propose a novel and effective Progressive Focused Transformer (PFT) that links all isolated attention maps in the network through Progressive Focused Attention (PFA) to focus attention on the most important tokens. PFA not only enables the network to capture more critical similar features, but also significantly reduces the computational cost of the overall network by filtering out irrelevant features before calculating similarities. Extensive experiments demonstrate the effectiveness of the proposed method, achieving state-of-the-art performance on various single image super-resolution benchmarks.
Related papers
- Crafting Query-Aware Selective Attention for Single Image Super-Resolution [3.133812520659661]
Single Image Super-Resolution (SISR) reconstructs high-resolution images from low-resolution inputs, enhancing image details.
We propose SSCAN, which dynamically selects the most relevant key-value windows based on query similarity.
Our experiments demonstrate that SSCAN outperforms existing attention-based SISR methods, achieving up to 0.14 dB PSNR improvement on urban datasets.
arXiv Detail & Related papers (2025-04-09T07:17:29Z) - DAPE V2: Process Attention Score as Feature Map for Length Extrapolation [63.87956583202729]
We conceptualize attention as a feature map and apply the convolution operator to mimic the processing methods in computer vision.
The novel insight, which can be adapted to various attention-related models, reveals that the current Transformer architecture has the potential for further evolution.
arXiv Detail & Related papers (2024-10-07T07:21:49Z) - Affine-based Deformable Attention and Selective Fusion for Semi-dense Matching [30.272791354494373]
We introduce affine-based local attention to model cross-view deformations.
We also present selective fusion to merge local and global messages from cross attention.
arXiv Detail & Related papers (2024-05-22T17:57:37Z) - Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach [58.57026686186709]
We introduce the Convolutional Transformer layer (ConvFormer) and propose a ConvFormer-based Super-Resolution network (CFSR)
CFSR inherits the advantages of both convolution-based and transformer-based approaches.
Experiments demonstrate that CFSR strikes an optimal balance between computational cost and performance.
arXiv Detail & Related papers (2024-01-11T03:08:00Z) - PRISTA-Net: Deep Iterative Shrinkage Thresholding Network for Coded
Diffraction Patterns Phase Retrieval [6.982256124089]
Phase retrieval is a challenge nonlinear inverse problem in computational imaging and image processing.
We have developed PRISTA-Net, a deep unfolding network based on the first-order iterative threshold threshold algorithm (ISTA)
All parameters in the proposed PRISTA-Net framework, including the nonlinear transformation, threshold, and step size, are learned-to-end instead of being set.
arXiv Detail & Related papers (2023-09-08T07:37:15Z) - Learning Image Deraining Transformer Network with Dynamic Dual
Self-Attention [46.11162082219387]
This paper proposes an effective image deraining Transformer with dynamic dual self-attention (DDSA)
Specifically, we only select the most useful similarity values based on top-k approximate calculation to achieve sparse attention.
In addition, we also develop a novel spatial-enhanced feed-forward network (SEFN) to further obtain a more accurate representation for achieving high-quality derained results.
arXiv Detail & Related papers (2023-08-15T13:59:47Z) - Learning Feature Matching via Matchable Keypoint-Assisted Graph Neural
Network [52.29330138835208]
Accurately matching local features between a pair of images is a challenging computer vision task.
Previous studies typically use attention based graph neural networks (GNNs) with fully-connected graphs over keypoints within/across images.
We propose MaKeGNN, a sparse attention-based GNN architecture which bypasses non-repeatable keypoints and leverages matchable ones to guide message passing.
arXiv Detail & Related papers (2023-07-04T02:50:44Z) - Learning A Sparse Transformer Network for Effective Image Deraining [42.01684644627124]
We propose an effective DeRaining network, Sparse Transformer (DRSformer)
We develop a learnable top-k selection operator to adaptively retain the most crucial attention scores from the keys for each query for better feature aggregation.
We equip our model with mixture of experts feature compensator to present a cooperation refinement deraining scheme.
arXiv Detail & Related papers (2023-03-21T15:41:57Z) - Skip-Attention: Improving Vision Transformers by Paying Less Attention [55.47058516775423]
Vision computation transformers (ViTs) use expensive self-attention operations in every layer.
We propose SkipAt, a method to reuse self-attention from preceding layers to approximate attention at one or more subsequent layers.
We show the effectiveness of our method in image classification and self-supervised learning on ImageNet-1K, semantic segmentation on ADE20K, image denoising on SIDD, and video denoising on DAVIS.
arXiv Detail & Related papers (2023-01-05T18:59:52Z) - CiaoSR: Continuous Implicit Attention-in-Attention Network for
Arbitrary-Scale Image Super-Resolution [158.2282163651066]
This paper proposes a continuous implicit attention-in-attention network, called CiaoSR.
We explicitly design an implicit attention network to learn the ensemble weights for the nearby local features.
We embed a scale-aware attention in this implicit attention network to exploit additional non-local information.
arXiv Detail & Related papers (2022-12-08T15:57:46Z) - Hierarchical Similarity Learning for Aliasing Suppression Image
Super-Resolution [64.15915577164894]
A hierarchical image super-resolution network (HSRNet) is proposed to suppress the influence of aliasing.
HSRNet achieves better quantitative and visual performance than other works, and remits the aliasing more effectively.
arXiv Detail & Related papers (2022-06-07T14:55:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.