RAGSR: Regional Attention Guided Diffusion for Image Super-Resolution
- URL: http://arxiv.org/abs/2508.16158v1
- Date: Fri, 22 Aug 2025 07:28:34 GMT
- Title: RAGSR: Regional Attention Guided Diffusion for Image Super-Resolution
- Authors: Haodong He, Yancheng Bai, Rui Lan, Xu Duan, Lei Sun, Xiangxiang Chu, Gui-Song Xia
- Abstract summary: We propose a novel method to generate clear and accurate regional details in super-resolution images. The method explicitly extracts localized fine-grained information and encodes it through a novel regional attention mechanism. Experimental results on benchmark datasets demonstrate that our approach exhibits superior performance in generating perceptually authentic visual details.
- Score: 38.794214985205045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rich textual information of large vision-language models (VLMs) combined with the powerful generative prior of pre-trained text-to-image (T2I) diffusion models has achieved impressive performance in single-image super-resolution (SISR). However, existing methods still face significant challenges in generating clear and accurate regional details, particularly in scenarios involving multiple objects. This challenge primarily stems from a lack of fine-grained regional descriptions and the models' insufficient ability to capture complex prompts. To address these limitations, we propose a Regional Attention Guided Super-Resolution (RAGSR) method that explicitly extracts localized fine-grained information and effectively encodes it through a novel regional attention mechanism, enabling both enhanced detail and overall visually coherent SR results. Specifically, RAGSR localizes object regions in an image and assigns a fine-grained caption to each region; these are formatted as region-text pairs and serve as textual priors for the T2I model. A regional guided attention mechanism is then leveraged to ensure that each region-text pair is properly considered in the attention process while preventing unwanted interactions between unrelated region-text pairs. By leveraging this attention mechanism, our approach offers finer control over the integration of text and image information, thereby effectively overcoming limitations faced by traditional SISR techniques. Experimental results on benchmark datasets demonstrate that our approach exhibits superior performance in generating perceptually authentic visual details while maintaining contextual consistency compared to existing approaches.
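The abstract describes the regional guided attention only at a high level. Below is a minimal sketch of how such a mechanism could be realized, assuming a standard cross-attention layer in which image latents attend to text tokens. The helper names (`build_region_mask`, `masked_cross_attention`), the box/token-span layout, and the fallback for pixels outside every region are all assumptions for illustration, not details confirmed by the paper.

```python
# Sketch (not the authors' code): a regional attention mask that lets image
# tokens inside a region attend only to that region's caption tokens, which
# blocks interactions between unrelated region-text pairs.
import torch

def build_region_mask(
    region_boxes: torch.Tensor,           # (R, 4) boxes in [0, 1], xyxy
    token_spans: list[tuple[int, int]],   # per-region (start, end) token span
    h: int, w: int, n_text: int,
) -> torch.Tensor:
    """Boolean mask of shape (h*w, n_text): True where attention is allowed."""
    mask = torch.zeros(h * w, n_text, dtype=torch.bool)
    # Normalized center coordinates of each latent-grid cell.
    ys = (torch.arange(h).float() + 0.5) / h
    xs = (torch.arange(w).float() + 0.5) / w
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    grid_x, grid_y = grid_x.reshape(-1), grid_y.reshape(-1)
    for (x0, y0, x1, y1), (t0, t1) in zip(region_boxes, token_spans):
        inside = (grid_x >= x0) & (grid_x < x1) & (grid_y >= y0) & (grid_y < y1)
        mask[inside, t0:t1] = True
    # Assumed fallback: pixels covered by no region attend to all text tokens
    # (e.g. the global prompt) so no query row is fully masked out.
    uncovered = ~mask.any(dim=1)
    mask[uncovered] = True
    return mask

def masked_cross_attention(q, k, v, mask):
    # q: (B, Nq, D) image queries; k, v: (B, Nt, D) text keys/values.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return scores.softmax(dim=-1) @ v
```

In a framework like the one the abstract outlines, such a mask would be applied inside the cross-attention layers of the pre-trained T2I diffusion backbone at every denoising step; the global-prompt fallback for uncovered pixels is one plausible design choice, not something the abstract specifies.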
Related papers
- Cross Modal Fine-Grained Alignment via Granularity-Aware and Region-Uncertain Modeling [17.78769812974246]
Fine-grained image-text alignment is a pivotal challenge in multimodal learning. We propose a unified approach that incorporates significance-aware and region-level uncertainty modeling. Our approach achieves state-of-the-art performance across various backbone architectures.
arXiv Detail & Related papers (2025-11-11T00:28:11Z)
- Text-Aware Real-World Image Super-Resolution via Diffusion Model with Joint Segmentation Decoders [14.655107789528673]
We introduce a novel diffusion-based SR framework, namely TADiSR, which integrates text-aware attention and joint segmentation decoders. We propose a complete pipeline for synthesizing high-quality images with fine-grained full-image text masks. Our approach substantially enhances text legibility in super-resolved images, achieving state-of-the-art performance across multiple evaluation metrics.
arXiv Detail & Related papers (2025-06-05T05:23:10Z)
- Creatively Upscaling Images with Global-Regional Priors [98.24171965992916]
C-Upscale is a new recipe for tuning-free image upscaling. It pivots on global-regional priors derived from the given global prompt and estimated regional prompts. It generates ultra-high-resolution images with higher visual fidelity and more creative regional details.
arXiv Detail & Related papers (2025-05-22T17:51:50Z)
- RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection [20.630629383286262]
Open-vocabulary object detection requires solid modeling of the region-semantic relationship.
We propose RTGen to generate scalable open-vocabulary region-text pairs.
arXiv Detail & Related papers (2024-05-30T09:03:23Z)
- Region-aware Distribution Contrast: A Novel Approach to Multi-Task Partially Supervised Learning [50.88504784466931]
Multi-task dense prediction involves semantic segmentation, depth estimation, and surface normal estimation.
Existing solutions typically rely on learning global image representations for global cross-task image matching.
Our proposal models region-wise representations using Gaussian distributions.
arXiv Detail & Related papers (2024-03-15T12:41:30Z)
- RegionGPT: Towards Region Understanding Vision Language Model [88.42271128373191]
RegionGPT (RGPT for short) is a novel framework designed for complex region-level captioning and understanding.
We develop an automated region caption data generation pipeline, enriching the training set with detailed region-level captions.
We demonstrate that a universal RGPT model can be effectively applied and significantly enhances performance across a range of region-level tasks.
arXiv Detail & Related papers (2024-03-04T18:58:08Z) - R&B: Region and Boundary Aware Zero-shot Grounded Text-to-image
Generation [74.5598315066249]
We probe into zero-shot grounded T2I generation with diffusion models.
We propose a Region and Boundary (R&B) aware cross-attention guidance approach.
arXiv Detail & Related papers (2023-10-13T05:48:42Z) - Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual
Recognition [57.08108545219043]
Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision.
Existing literature addresses this challenge by employing local-based representation approaches.
This article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition.
arXiv Detail & Related papers (2023-05-12T00:13:17Z)