AerOSeg: Harnessing SAM for Open-Vocabulary Segmentation in Remote Sensing Images
- URL: http://arxiv.org/abs/2504.09203v1
- Date: Sat, 12 Apr 2025 13:06:46 GMT
- Title: AerOSeg: Harnessing SAM for Open-Vocabulary Segmentation in Remote Sensing Images
- Authors: Saikat Dutta, Akhil Vasim, Siddhant Gole, Hamid Rezatofighi, Biplab Banerjee,
- Abstract summary: AerOSeg is a novel Open-Vocabulary (OVS) approach for remote sensing data.<n>We compute robust image-text correlation features using rotated versions of the input image and domain-specific prompts.<n>Inspired by the success of the Segment Anything Model (SAM) in diverse domains, we leverage SAM features to guide the spatial refinement of correlation features.<n>We enhance the refined correlation features using a multi-scale attention-aware composition to produce the final segmentation map.
- Score: 21.294581646546124
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image segmentation beyond predefined categories is a key challenge in remote sensing, where novel and unseen classes often emerge during inference. Open-vocabulary image Segmentation addresses these generalization issues in traditional supervised segmentation models while reducing reliance on extensive per-pixel annotations, which are both expensive and labor-intensive to obtain. Most Open-Vocabulary Segmentation (OVS) methods are designed for natural images but struggle with remote sensing data due to scale variations, orientation changes, and complex scene compositions. This necessitates the development of OVS approaches specifically tailored for remote sensing. In this context, we propose AerOSeg, a novel OVS approach for remote sensing data. First, we compute robust image-text correlation features using multiple rotated versions of the input image and domain-specific prompts. These features are then refined through spatial and class refinement blocks. Inspired by the success of the Segment Anything Model (SAM) in diverse domains, we leverage SAM features to guide the spatial refinement of correlation features. Additionally, we introduce a semantic back-projection module and loss to ensure the seamless propagation of SAM's semantic information throughout the segmentation pipeline. Finally, we enhance the refined correlation features using a multi-scale attention-aware decoder to produce the final segmentation map. We validate our SAM-guided Open-Vocabulary Remote Sensing Segmentation model on three benchmark remote sensing datasets: iSAID, DLRSD, and OpenEarthMap. Our model outperforms state-of-the-art open-vocabulary segmentation methods, achieving an average improvement of 2.54 h-mIoU.
Related papers
- Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation [16.58381088280145]
We introduce Open-Vocabulary Remote Sensing Image Semantic (OVRSISS) to segment arbitrary semantic classes in remote sensing images.<n>To address the lack of OVRSISS datasets, we develop LandDiscover50K, a comprehensive dataset of 51,846 images covering 40 diverse semantic classes.<n>In addition, we propose a novel framework named GSNet that integrates domain priors from special remote sensing models and versatile capabilities of general vision-language models.
arXiv Detail & Related papers (2024-12-27T07:20:30Z) - Joint-Optimized Unsupervised Adversarial Domain Adaptation in Remote Sensing Segmentation with Prompted Foundation Model [32.03242732902217]
This paper addresses the challenge of adapting a model trained on source domain data to target domain samples.
We propose a joint-optimized adversarial network incorporating the "Segment Anything Model (SAM) (SAM-JOANet)"
arXiv Detail & Related papers (2024-11-08T02:15:20Z) - Open-Vocabulary Remote Sensing Image Semantic Segmentation [48.70312596065687]
Open-vocabulary image semantic segmentation (OVS) seeks to segment images into semantic regions across an open set of categories.
These approaches are predominantly tailored to natural images and struggle with the unique characteristics of remote sensing images.
We propose the first OVS framework specifically designed for remote sensing imagery.
arXiv Detail & Related papers (2024-09-12T01:16:25Z) - SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation [69.42764583465508]
We explore the potential of generative image diffusion to address the scarcity of annotated data in earth observation tasks.
To the best of our knowledge, we are the first to generate both images and corresponding masks for satellite segmentation.
arXiv Detail & Related papers (2024-03-25T10:30:22Z) - Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z) - RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation.
Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-RValModal.
We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z) - Text2Seg: Remote Sensing Image Semantic Segmentation via Text-Guided Visual Foundation Models [7.452422412106768]
We propose a novel method named Text2Seg for remote sensing semantic segmentation.
It overcomes the dependency on extensive annotations by employing an automatic prompt generation process.
We show that Text2Seg significantly improves zero-shot prediction performance compared to the vanilla SAM model.
arXiv Detail & Related papers (2023-04-20T18:39:41Z) - AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation has some unique challenges, the most critical one among which lies in foreground-background imbalance.
We propose Adaptive Focus Framework (AF$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations.
AF$ has significantly improved the accuracy on three widely used aerial benchmarks, as fast as the mainstream method.
arXiv Detail & Related papers (2022-02-18T10:14:45Z) - Remote Sensing Images Semantic Segmentation with General Remote Sensing
Vision Model via a Self-Supervised Contrastive Learning Method [13.479068312825781]
We propose Global style and Local matching Contrastive Learning Network (GLCNet) for remote sensing semantic segmentation.
Specifically, the global style contrastive module is used to learn an image-level representation better.
The local features matching contrastive module is designed to learn representations of local regions which is beneficial for semantic segmentation.
arXiv Detail & Related papers (2021-06-20T03:03:40Z) - Dual Attention GANs for Semantic Image Synthesis [101.36015877815537]
We propose a novel Dual Attention GAN (DAGAN) to synthesize photo-realistic and semantically-consistent images.
We also propose two novel modules, i.e., position-wise Spatial Attention Module (SAM) and scale-wise Channel Attention Module (CAM)
DAGAN achieves remarkably better results than state-of-the-art methods, while using fewer model parameters.
arXiv Detail & Related papers (2020-08-29T17:49:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.