Open-Vocabulary Remote Sensing Image Semantic Segmentation
- URL: http://arxiv.org/abs/2409.07683v1
- Date: Thu, 12 Sep 2024 01:16:25 GMT
- Title: Open-Vocabulary Remote Sensing Image Semantic Segmentation
- Authors: Qinglong Cao, Yuntian Chen, Chao Ma, Xiaokang Yang,
- Abstract summary: Open-vocabulary image semantic segmentation (OVS) seeks to segment images into semantic regions across an open set of categories.
These approaches are predominantly tailored to natural images and struggle with the unique characteristics of remote sensing images.
We propose the first OVS framework specifically designed for remote sensing imagery.
- Score: 48.70312596065687
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-vocabulary image semantic segmentation (OVS) seeks to segment images into semantic regions across an open set of categories. Existing OVS methods commonly depend on foundational vision-language models and utilize similarity computation to tackle OVS tasks. However, these approaches are predominantly tailored to natural images and struggle with the unique characteristics of remote sensing images, such as rapidly changing orientations and significant scale variations. These challenges complicate OVS tasks in earth vision, requiring specialized approaches. To tackle this dilemma, we propose the first OVS framework specifically designed for remote sensing imagery, drawing inspiration from the distinct remote sensing traits. Particularly, to address the varying orientations, we introduce a rotation-aggregative similarity computation module that generates orientation-adaptive similarity maps as initial semantic maps. These maps are subsequently refined at both spatial and categorical levels to produce more accurate semantic maps. Additionally, to manage significant scale changes, we integrate multi-scale image features into the upsampling process, resulting in the final scale-aware semantic masks. To advance OVS in earth vision and encourage reproducible research, we establish the first open-sourced OVS benchmark for remote sensing imagery, including four public remote sensing datasets. Extensive experiments on this benchmark demonstrate our proposed method achieves state-of-the-art performance. All codes and datasets are available at https://github.com/caoql98/OVRS.
Related papers
- AerOSeg: Harnessing SAM for Open-Vocabulary Segmentation in Remote Sensing Images [21.294581646546124]
AerOSeg is a novel Open-Vocabulary (OVS) approach for remote sensing data.
We compute robust image-text correlation features using rotated versions of the input image and domain-specific prompts.
Inspired by the success of the Segment Anything Model (SAM) in diverse domains, we leverage SAM features to guide the spatial refinement of correlation features.
We enhance the refined correlation features using a multi-scale attention-aware composition to produce the final segmentation map.
arXiv Detail & Related papers (2025-04-12T13:06:46Z) - A Recipe for Improving Remote Sensing VLM Zero Shot Generalization [0.4427533728730559]
We present two novel image-caption datasets for training of remote sensing foundation models.
The first dataset pairs aerial and satellite imagery with captions generated by Gemini using landmarks extracted from Google Maps.
The second dataset utilizes public web images and their corresponding alt-text, filtered for the remote sensing domain.
arXiv Detail & Related papers (2025-03-10T21:09:02Z) - Context and Geometry Aware Voxel Transformer for Semantic Scene Completion [7.147020285382786]
Vision-based Semantic Scene Completion (SSC) has gained much attention due to its widespread applications in various 3D perception tasks.
Existing sparse-to-dense approaches typically employ shared context-independent queries across various input images.
We introduce a neural network named CGFormer to achieve semantic scene completion.
arXiv Detail & Related papers (2024-05-22T14:16:30Z) - A Transformer-Based Adaptive Semantic Aggregation Method for UAV Visual
Geo-Localization [2.1462492411694756]
This paper addresses the task of Unmanned Aerial Vehicles (UAV) visual geo-localization.
Part matching is crucial for UAV visual geo-localization since part-level representations can capture image details and help to understand the semantic information of scenes.
We introduce a transformer-based adaptive semantic aggregation method that regards parts as the most representative semantics in an image.
arXiv Detail & Related papers (2024-01-03T06:58:52Z) - Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z) - Exploring Invariant Representation for Visible-Infrared Person
Re-Identification [77.06940947765406]
Cross-spectral person re-identification, which aims to associate identities to pedestrians across different spectra, faces a main challenge of the modality discrepancy.
In this paper, we address the problem from both image-level and feature-level in an end-to-end hybrid learning framework named robust feature mining network (RFM)
Experiment results on two standard cross-spectral person re-identification datasets, RegDB and SYSU-MM01, have demonstrated state-of-the-art performance.
arXiv Detail & Related papers (2023-02-02T05:24:50Z) - Self-Training Guided Disentangled Adaptation for Cross-Domain Remote
Sensing Image Semantic Segmentation [20.07907723950031]
We propose a self-training guided disentangled adaptation network (ST-DASegNet) for cross-domain RS image semantic segmentation task.
We first propose source student backbone and target student backbone to respectively extract the source-style and target-style feature for both source and target images.
We then propose a domain disentangled module to extract the universal feature and purify the distinct feature of source-style and target-style features.
arXiv Detail & Related papers (2023-01-13T13:11:22Z) - AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation has some unique challenges, the most critical one among which lies in foreground-background imbalance.
We propose Adaptive Focus Framework (AF$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations.
AF$ has significantly improved the accuracy on three widely used aerial benchmarks, as fast as the mainstream method.
arXiv Detail & Related papers (2022-02-18T10:14:45Z) - Mutual Affine Network for Spatially Variant Kernel Estimation in Blind
Image Super-Resolution [130.32026819172256]
Existing blind image super-resolution (SR) methods mostly assume blur kernels are spatially invariant across the whole image.
This paper proposes a mutual affine network (MANet) for spatially variant kernel estimation.
arXiv Detail & Related papers (2021-08-11T16:11:17Z) - Bidirectional Multi-scale Attention Networks for Semantic Segmentation
of Oblique UAV Imagery [30.524771772192757]
We propose the novel bidirectional multi-scale attention networks, which fuse features from multiple scales bidirectionally for more adaptive and effective feature extraction.
Our model achieved the state-of-the-art (SOTA) result with a mean intersection over union (mIoU) score of 70.80%.
arXiv Detail & Related papers (2021-02-05T11:02:15Z) - Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets new state-of-the-arts on all these settings, demonstrating well its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.