SeG-SR: Integrating Semantic Knowledge into Remote Sensing Image Super-Resolution via Vision-Language Model
- URL: http://arxiv.org/abs/2505.23010v1
- Date: Thu, 29 May 2025 02:38:34 GMT
- Title: SeG-SR: Integrating Semantic Knowledge into Remote Sensing Image Super-Resolution via Vision-Language Model
- Authors: Bowen Chen, Keyan Chen, Mohan Yang, Zhengxia Zou, Zhenwei Shi
- Abstract summary: High-resolution (HR) remote sensing imagery plays a vital role in a wide range of applications, including urban planning and environmental monitoring. Due to limitations in sensors and data transmission links, the images acquired in practice often suffer from resolution degradation. Remote Sensing Image Super-Resolution (RSISR) aims to reconstruct HR images from low-resolution (LR) inputs, providing a cost-effective and efficient alternative to direct HR image acquisition.
- Score: 23.383837540690823
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-resolution (HR) remote sensing imagery plays a vital role in a wide range of applications, including urban planning and environmental monitoring. However, due to limitations in sensors and data transmission links, the images acquired in practice often suffer from resolution degradation. Remote Sensing Image Super-Resolution (RSISR) aims to reconstruct HR images from low-resolution (LR) inputs, providing a cost-effective and efficient alternative to direct HR image acquisition. Existing RSISR methods primarily focus on low-level characteristics in pixel space, while neglecting the high-level understanding of remote sensing scenes. This may lead to semantically inconsistent artifacts in the reconstructed results. Motivated by this observation, our work aims to explore the role of high-level semantic knowledge in improving RSISR performance. We propose a Semantic-Guided Super-Resolution framework, SeG-SR, which leverages Vision-Language Models (VLMs) to extract semantic knowledge from input images and uses it to guide the super-resolution (SR) process. Specifically, we first design a Semantic Feature Extraction Module (SFEM) that utilizes a pretrained VLM to extract semantic knowledge from remote sensing images. Next, we propose a Semantic Localization Module (SLM), which derives a series of semantic guidance signals from the extracted semantic knowledge. Finally, we develop a Learnable Modulation Module (LMM) that uses this semantic guidance to modulate the features extracted by the SR network, effectively incorporating high-level scene understanding into the SR pipeline. We validate the effectiveness and generalizability of SeG-SR through extensive experiments: SeG-SR achieves state-of-the-art performance on two datasets and consistently delivers performance improvements across various SR architectures. Code is available at https://github.com/Mr-Bamboo/SeG-SR.
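The abstract says only that the LMM uses semantic guidance to modulate SR features; it does not spell out the operation. As a rough illustration of what "modulation" can mean, the sketch below applies a per-channel affine (scale and shift) transform predicted from a semantic vector. This is an assumed, generic FiLM-style formulation, not the authors' LMM; the function names and the weight matrices (`w_scale`, `w_shift`, and their biases) are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(vec: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Plain linear projection: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weight, bias)]


def semantic_modulate(features: Matrix, semantic: Vector,
                      w_scale: Matrix, b_scale: Vector,
                      w_shift: Matrix, b_shift: Vector) -> Matrix:
    """Scale and shift each feature channel using a semantic vector.

    features: C channels, each a flat list of spatial activations.
    semantic: guidance vector (assumed to come from a VLM-based extractor).
    The (1 + scale) form keeps the features unchanged when the predicted
    scale and shift are both zero.
    """
    scale = linear(semantic, w_scale, b_scale)  # one scale per channel
    shift = linear(semantic, w_shift, b_shift)  # one shift per channel
    return [[(1.0 + g) * v + s for v in channel]
            for channel, g, s in zip(features, scale, shift)]


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0]]           # 2 channels, 2 positions
    sem = [1.0, 0.0]                            # toy semantic vector
    out = semantic_modulate(
        feats, sem,
        w_scale=[[1.0, 0.0], [0.0, 1.0]], b_scale=[0.0, 0.0],
        w_shift=[[0.0, 0.0], [0.5, 0.0]], b_shift=[0.0, 0.0],
    )
    print(out)
```

In a trained network the projection weights would be learned jointly with the SR backbone; here they are fixed toy values so the arithmetic is easy to follow.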
Related papers
- HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation [74.1872891313184]
HRSeg is an efficient model with high-resolution fine-grained perception. It features two key innovations: High-Resolution Perception (HRP) and High-Resolution Enhancement (HRE).
arXiv Detail & Related papers (2025-07-17T08:09:31Z)
- GeoMag: A Vision-Language Model for Pixel-level Fine-Grained Remote Sensing Image Parsing [5.653111274028541]
We propose GeoMag, an end-to-end general-purpose large model framework for remote sensing. GeoMag focuses the attention scope based on prompt semantics to effectively perform remote sensing image parsing. This approach improves the model's perception of critical target regions, suppresses background redundancy, and reduces the computational cost of interpreting high-resolution RS imagery.
arXiv Detail & Related papers (2025-07-08T11:21:03Z)
- Controllable Reference-Based Real-World Remote Sensing Image Super-Resolution with Generative Diffusion Priors [13.148815217684277]
Super-resolution (SR) techniques can enhance the spatial resolution of remote sensing images by utilizing low-resolution (LR) images to reconstruct high-resolution (HR) images. Existing RefSR methods struggle with real-world complexities, such as cross-sensor resolution gap and significant land cover changes. We propose CRefDiff, a novel controllable reference-based diffusion model for real-world remote sensing image SR.
arXiv Detail & Related papers (2025-06-30T12:45:28Z)
- DiffRIS: Enhancing Referring Remote Sensing Image Segmentation with Pre-trained Text-to-Image Diffusion Models [9.109484087832058]
DiffRIS is a novel framework that harnesses the semantic understanding capabilities of pre-trained text-to-image diffusion models for RRSIS tasks. Our framework introduces two key innovations: a context perception adapter (CP-adapter) and a cross-modal reasoning decoder (PCMRD).
arXiv Detail & Related papers (2025-06-23T02:38:56Z)
- ImageRAG: Enhancing Ultra High Resolution Remote Sensing Imagery Analysis with ImageRAG [33.19843463374473]
ImageRAG is a training-free framework to address the complexities of analyzing UHR remote sensing imagery. ImageRAG's core innovation lies in its ability to selectively retrieve and focus on the most relevant portions of the UHR image as visual contexts.
arXiv Detail & Related papers (2024-11-12T10:12:12Z)
- Semantic Guided Large Scale Factor Remote Sensing Image Super-resolution with Generative Diffusion Prior [13.148815217684277]
Large scale factor super-resolution (SR) algorithms are vital for maximizing the utilization of low-resolution (LR) satellite data captured from orbit.
Existing methods confront challenges in recovering SR images with clear textures and correct ground objects.
We introduce a novel framework, the Semantic Guided Diffusion Model (SGDM), designed for large scale factor remote sensing image super-resolution.
arXiv Detail & Related papers (2024-05-11T16:06:16Z)
- RS-Mamba for Large Remote Sensing Image Dense Prediction [58.12667617617306]
We propose the Remote Sensing Mamba (RSM) for dense prediction tasks in large VHR remote sensing images.
RSM is specifically designed to capture the global context of remote sensing images with linear complexity.
Our model achieves better efficiency and accuracy than transformer-based models on large remote sensing images.
arXiv Detail & Related papers (2024-04-03T12:06:01Z)
- Bridging the Domain Gap: A Simple Domain Matching Method for Reference-based Image Super-Resolution in Remote Sensing [8.36527949191506]
Recently, reference-based image super-resolution (RefSR) has shown excellent performance in image super-resolution (SR) tasks.
We introduce a Domain Matching (DM) module that can be seamlessly integrated with existing RefSR models.
Our analysis reveals that their domain gaps often occur in different satellites, and our model effectively addresses these challenges.
arXiv Detail & Related papers (2024-01-29T08:10:00Z)
- CoSeR: Bridging Image and Language for Cognitive Super-Resolution [74.24752388179992]
We introduce the Cognitive Super-Resolution (CoSeR) framework, empowering SR models with the capacity to comprehend low-resolution images.
We achieve this by marrying image appearance and language understanding to generate a cognitive embedding.
To further improve image fidelity, we propose a novel condition injection scheme called "All-in-Attention".
arXiv Detail & Related papers (2023-11-27T16:33:29Z)
- Learning Detail-Structure Alternative Optimization for Blind Super-Resolution [69.11604249813304]
We propose an effective and kernel-free network, namely DSSR, which enables recurrent detail-structure alternative optimization without blur kernel prior incorporation for blind SR.
In our DSSR, a detail-structure modulation module (DSMM) is built to exploit the interaction and collaboration of image details and structures.
Our method achieves state-of-the-art performance compared with existing methods.
arXiv Detail & Related papers (2022-12-03T14:44:17Z)
- RRSR: Reciprocal Reference-based Image Super-Resolution with Progressive Feature Alignment and Selection [66.08293086254851]
We propose a reciprocal learning framework to reinforce the learning of a RefSR network.
The newly proposed module aligns reference-input images at multi-scale feature spaces and performs reference-aware feature selection.
We empirically show that multiple recent state-of-the-art RefSR models can be consistently improved with our reciprocal learning paradigm.
arXiv Detail & Related papers (2022-11-08T12:39:35Z)
- Memory-augmented Deep Unfolding Network for Guided Image Super-resolution [67.83489239124557]
Guided image super-resolution (GISR) aims to obtain a high-resolution (HR) target image by enhancing the spatial resolution of a low-resolution (LR) target image under the guidance of an HR image.
Previous model-based methods mainly take the entire image as a whole and assume a prior distribution between the HR target image and the HR guidance image.
We propose a maximum a posteriori (MAP) estimation model for GISR with two types of prior on the HR target image.
arXiv Detail & Related papers (2022-02-12T15:37:13Z)
- DDet: Dual-path Dynamic Enhancement Network for Real-World Image Super-Resolution [69.2432352477966]
Real image super-resolution (Real-SR) focuses on the relationship between real-world high-resolution (HR) and low-resolution (LR) images.
In this article, we propose a Dual-path Dynamic Enhancement Network (DDet) for Real-SR.
Unlike conventional methods, which stack up massive convolutional blocks for feature representation, we introduce a content-aware framework to study non-inherently aligned image pairs.
arXiv Detail & Related papers (2020-02-25T18:24:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.