PromptSR: Cascade Prompting for Lightweight Image Super-Resolution
- URL: http://arxiv.org/abs/2507.04118v1
- Date: Sat, 05 Jul 2025 17:56:45 GMT
- Title: PromptSR: Cascade Prompting for Lightweight Image Super-Resolution
- Authors: Wenyang Liu, Chen Cai, Jianjun Gao, Kejun Wu, Yi Wang, Kim-Hui Yap, Lap-Pui Chau
- Abstract summary: Vision Transformer has significantly advanced image super-resolution (SR), but it faces the inherent challenge of a limited receptive field due to window-based self-attention modeling. We propose PromptSR, a novel prompt-empowered lightweight image SR method.
- Score: 20.796302187697364
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although the lightweight Vision Transformer has significantly advanced image super-resolution (SR), it faces the inherent challenge of a limited receptive field due to the window-based self-attention modeling. The quadratic computational complexity relative to window size restricts its ability to use a large window size for expanding the receptive field while maintaining low computational costs. To address this challenge, we propose PromptSR, a novel prompt-empowered lightweight image SR method. The core component is the proposed cascade prompting block (CPB), which enhances global information access and local refinement via three cascaded prompting layers: a global anchor prompting layer (GAPL) and two local prompting layers (LPLs). The GAPL leverages downscaled features as anchors to construct low-dimensional anchor prompts (APs) through cross-scale attention, significantly reducing computational costs. These APs, with enhanced global perception, are then used to provide global prompts, efficiently facilitating long-range token connections. The two LPLs subsequently combine category-based self-attention and window-based self-attention to refine the representation in a coarse-to-fine manner. They leverage attention maps from the GAPL as additional global prompts, enabling them to perceive features globally at different granularities for adaptive local refinement. In this way, the proposed CPB effectively combines global priors and local details, significantly enlarging the receptive field while maintaining the low computational costs of our PromptSR. The experimental results demonstrate the superiority of our method, which outperforms state-of-the-art lightweight SR methods in quantitative, qualitative, and complexity evaluations. Our code will be released at https://github.com/wenyang001/PromptSR.
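The cascade described in the abstract can be made concrete with a small sketch. Below is a minimal, illustrative PyTorch version of the anchor-prompt idea (downscaled anchors gather global context via cross-scale attention, and the full-resolution tokens then query the resulting anchor prompts). It is not the authors' released implementation (that will appear at the GitHub link above), and all module and parameter names here (e.g. AnchorPromptAttention, anchor_hw) are assumptions made for illustration.

```python
# Illustrative sketch (not the official PromptSR code): a simplified
# "anchor prompt" attention in the spirit of the GAPL described above.
# Downscaled features act as a small set of anchors; they first gather
# global context from all tokens, then the tokens query these anchor
# prompts, giving roughly O(N*M) cost with M << N instead of O(N^2).
import torch
import torch.nn as nn
import torch.nn.functional as F


class AnchorPromptAttention(nn.Module):
    def __init__(self, dim: int, anchor_hw: int = 8):
        super().__init__()
        self.anchor_hw = anchor_hw          # anchors form an anchor_hw x anchor_hw grid
        self.scale = dim ** -0.5
        # projections are shared between the two attention steps for brevity
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (B, N, C) tokens of an H x W feature map (N = H * W)
        b, n, c = x.shape
        # Build low-resolution anchors by average-pooling the feature map.
        feat = x.transpose(1, 2).reshape(b, c, h, w)
        anchors = F.adaptive_avg_pool2d(feat, self.anchor_hw)      # (B, C, m, m)
        anchors = anchors.flatten(2).transpose(1, 2)               # (B, M, C), M = m*m

        # Step 1: anchors attend to all tokens (cross-scale attention),
        # producing "anchor prompts" that carry global context.
        q_a = self.to_q(anchors)
        k_x, v_x = self.to_k(x), self.to_v(x)
        attn_a = (q_a @ k_x.transpose(-2, -1)) * self.scale        # (B, M, N)
        prompts = attn_a.softmax(dim=-1) @ v_x                     # (B, M, C)

        # Step 2: tokens query the anchor prompts, spreading global
        # information back to every position at low cost.
        q_x = self.to_q(x)
        k_p, v_p = self.to_k(prompts), self.to_v(prompts)
        attn_x = (q_x @ k_p.transpose(-2, -1)) * self.scale        # (B, N, M)
        out = attn_x.softmax(dim=-1) @ v_p                         # (B, N, C)
        return self.proj(out)
```

For a 64x64 feature map (N = 4096 tokens) and an 8x8 anchor grid (M = 64), both attention maps in this sketch are 4096x64, rather than the 4096x4096 map that full global self-attention would require.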
Related papers
- Breaking Complexity Barriers: High-Resolution Image Restoration with Rank Enhanced Linear Attention [54.42902794496325]
Linear attention, a variant of softmax attention, demonstrates promise in global context modeling.
We propose Rank Enhanced Linear Attention (RELA), a simple yet effective method that enriches feature representations by integrating a lightweight depthwise convolution.
Building upon RELA, we propose an efficient and effective image restoration Transformer, named LAformer.
arXiv Detail & Related papers (2025-05-22T02:57:23Z)
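As a rough sketch of the RELA idea summarized in the entry above (linear attention enriched with a lightweight depthwise convolution), the snippet below uses a standard kernel-feature linearization and adds a depthwise-convolution branch over the values. The exact placement of the convolution in RELA/LAformer is an assumption, and the class name is illustrative.

```python
# Illustrative sketch (not the official RELA/LAformer code): linear attention
# whose output is enriched by a lightweight depthwise convolution over the
# values. Cost is linear in the number of tokens because keys and values are
# aggregated before interacting with the queries.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RankEnhancedLinearAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.to_qkv = nn.Linear(dim, dim * 3)
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (B, N, C) tokens of an H x W feature map (N = H * W)
        b, n, c = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        # Kernel feature maps in place of softmax (one common linearization).
        q = F.elu(q) + 1.0
        k = F.elu(k) + 1.0
        # Linear attention: aggregate keys/values first -> O(N * C^2).
        kv = k.transpose(-2, -1) @ v                                   # (B, C, C)
        k_sum = k.sum(dim=1)                                           # (B, C)
        denom = (q * k_sum.unsqueeze(1)).sum(-1, keepdim=True) + 1e-6  # (B, N, 1)
        attn_out = (q @ kv) / denom                                    # (B, N, C)

        # Enrichment branch (placement assumed): a depthwise conv over the
        # value map, added to the often low-rank linear-attention output.
        v_map = v.transpose(1, 2).reshape(b, c, h, w)
        local = self.dwconv(v_map).flatten(2).transpose(1, 2)          # (B, N, C)
        return self.proj(attn_out + local)
```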
- CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution [42.76046559103463]
Transformer-based methods have demonstrated impressive performance in low-level visual tasks such as Image Super-Resolution (SR).
However, these methods restrict attention to content-agnostic local regions, which directly limits the ability of attention to capture long-range dependencies.
We propose a lightweight Content-Aware Token Aggregation Network (CATANet) to address these issues.
Our method achieves superior performance, with a maximum PSNR improvement of 0.33 dB and nearly double the inference speed.
arXiv Detail & Related papers (2025-03-10T04:00:27Z)
- A Lightweight and Effective Image Tampering Localization Network with Vision Mamba [5.369780585789917]
Current image tampering localization methods rely on Convolutional Neural Networks (CNNs) and Transformers.
We propose a lightweight and effective FORensic network based on vision MAmba (ForMa) for blind image tampering localization.
arXiv Detail & Related papers (2025-02-14T06:35:44Z)
- HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts [77.62320553269615]
HiPrompt is a tuning-free solution for higher-resolution image generation.
Its hierarchical prompts offer both global and local guidance.
The generated images maintain coherent local and global semantics, structures, and textures with high definition.
arXiv Detail & Related papers (2024-09-04T17:58:08Z)
- HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution [70.52256118833583]
We present a strategy to convert transformer-based SR networks into hierarchical transformers (HiT-SR).
Specifically, we first replace the commonly used fixed small windows with expanding hierarchical windows to aggregate features at different scales.
Considering the intensive computation required for large windows, we further design a spatial-channel correlation method with linear complexity with respect to window size.
arXiv Detail & Related papers (2024-07-08T12:42:10Z)
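A generic sketch of the expanding-window idea from the HiT-SR summary above: plain window self-attention whose window size grows across blocks, so deeper blocks aggregate features over larger regions. This is not the authors' implementation, their linear-complexity spatial-channel correlation is deliberately not reproduced, and the helper names are illustrative.

```python
# Illustrative sketch (not the official HiT-SR code): window self-attention
# with a window size that expands across successive blocks.
import torch
import torch.nn as nn


def window_partition(x: torch.Tensor, ws: int) -> torch.Tensor:
    # x: (B, H, W, C) -> (num_windows * B, ws*ws, C); assumes H, W divisible by ws
    b, h, w, c = x.shape
    x = x.view(b, h // ws, ws, w // ws, ws, c)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, c)


def window_reverse(wins: torch.Tensor, ws: int, h: int, w: int) -> torch.Tensor:
    # inverse of window_partition
    b = wins.shape[0] // ((h // ws) * (w // ws))
    x = wins.view(b, h // ws, w // ws, ws, ws, -1)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(b, h, w, -1)


class WindowAttention(nn.Module):
    def __init__(self, dim: int, window_size: int, heads: int = 4):
        super().__init__()
        self.ws = window_size
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C)
        _, h, w, _ = x.shape
        wins = window_partition(x, self.ws)
        out, _ = self.attn(wins, wins, wins)   # self-attention inside each window
        return window_reverse(out, self.ws, h, w)


# Expanding hierarchical windows: e.g. 8 -> 16 -> 32 across successive blocks.
blocks = nn.ModuleList([WindowAttention(dim=64, window_size=s) for s in (8, 16, 32)])
x = torch.randn(1, 64, 64, 64)                 # (B, H, W, C) toy feature map
for blk in blocks:
    x = x + blk(x)                             # residual connection
```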
- Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution [49.902047563260496]
We make the first attempt to integrate the Vision State Space Model (Mamba) into remote sensing image (RSI) super-resolution.
To achieve better SR reconstruction, building upon Mamba, we devise a Frequency-assisted Mamba framework, dubbed FMSR.
Our FMSR features a multi-level fusion architecture equipped with the Frequency Selection Module (FSM), Vision State Space Module (VSSM), and Hybrid Gate Module (HGM).
arXiv Detail & Related papers (2024-05-08T11:09:24Z)
- Low-Resolution Self-Attention for Semantic Segmentation [93.30597515880079]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost.
Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution.
We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
arXiv Detail & Related papers (2023-10-08T06:10:09Z)
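One plausible realization of the low-resolution self-attention described in the LRSA summary above: queries keep the input resolution while keys and values are pooled to a fixed low-resolution grid, so the attention cost stops growing quadratically with image size. Whether LRFormer pools only keys/values or the entire computation is an assumption here; class and parameter names are illustrative.

```python
# Illustrative sketch (not the LRFormer code): self-attention whose keys and
# values live on a fixed low-resolution grid, so the attention map is
# (H*W) x low_res^2 instead of (H*W) x (H*W). The exact LRSA design is an
# assumption; this shows one plausible realization.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LowResSelfAttention(nn.Module):
    def __init__(self, dim: int, low_res: int = 16, heads: int = 4):
        super().__init__()
        self.low_res = low_res                 # fixed low-resolution grid (low_res x low_res)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map at any resolution
        b, c, h, w = x.shape
        q = x.flatten(2).transpose(1, 2)                    # (B, H*W, C) full-resolution queries
        kv = F.adaptive_avg_pool2d(x, self.low_res)         # (B, C, low_res, low_res)
        kv = kv.flatten(2).transpose(1, 2)                  # (B, low_res^2, C)
        out, _ = self.attn(q, kv, kv)                       # cross-resolution attention
        return out.transpose(1, 2).reshape(b, c, h, w)


# Cost is O(H*W * low_res^2) rather than O((H*W)^2), independent of how large
# the input grows relative to the fixed low-resolution grid.
lrsa = LowResSelfAttention(dim=64, low_res=16)
y = lrsa(torch.randn(1, 64, 128, 96))          # works for arbitrary H, W
```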
- Recursive Generalization Transformer for Image Super-Resolution [108.67898547357127]
We propose the Recursive Generalization Transformer (RGT) for image SR, which can capture global spatial information and is suitable for high-resolution images.
We combine the recursive-generalization self-attention (RG-SA) with local self-attention to enhance the exploitation of the global context.
Our RGT outperforms recent state-of-the-art methods quantitatively and qualitatively.
arXiv Detail & Related papers (2023-03-11T10:44:44Z)