OSDA: A Framework for Open-Set Discovery and Automatic Interpretation of Land-cover in Remote Sensing Imagery
- URL: http://arxiv.org/abs/2509.18693v2
- Date: Mon, 29 Sep 2025 02:22:45 GMT
- Title: OSDA: A Framework for Open-Set Discovery and Automatic Interpretation of Land-cover in Remote Sensing Imagery
- Authors: Siyi Chen, Kai Wang, Weicong Pang, Ruiming Yang, Ziru Chen, Renjun Gao, Alexis Kai Hon Lau, Dasa Gu, Chenchen Zhang, Cheng Li,
- Abstract summary: Open-set land-cover analysis in remote sensing requires the ability to achieve fine-grained spatial localization and semantically open categorization. We introduce OSDA, an integrated three-stage framework for annotation-free open-set land-cover discovery, segmentation, and description. Our work provides a scalable and interpretable solution for dynamic land-cover monitoring, showing strong potential for automated cartographic updating and large-scale earth observation analysis.
- Score: 10.196580289786414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-set land-cover analysis in remote sensing requires the ability to achieve fine-grained spatial localization and semantically open categorization. This involves not only detecting and segmenting novel objects without categorical supervision but also assigning them interpretable semantic labels through multimodal reasoning. In this study, we introduce OSDA, an integrated three-stage framework for annotation-free open-set land-cover discovery, segmentation, and description. The pipeline consists of: (1) precise discovery and mask extraction with a promptable fine-tuned segmentation model (SAM), (2) semantic attribution and contextual description via a two-phase fine-tuned multimodal large language model (MLLM), and (3) LLM-as-judge and manual scoring of the MLLM outputs. By combining pixel-level accuracy with high-level semantic understanding, OSDA addresses key challenges in open-world remote sensing interpretation. Designed to be architecture-agnostic and label-free, the framework supports robust evaluation across diverse satellite imagery without requiring manual annotation. Our work provides a scalable and interpretable solution for dynamic land-cover monitoring, showing strong potential for automated cartographic updating and large-scale earth observation analysis.
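The three-stage pipeline described in the abstract can be sketched as a simple orchestration loop. This is an illustrative skeleton only: the function names, the `Discovery` record, and the callable interfaces are assumptions for exposition, not the authors' actual API.

```python
# Hypothetical sketch of the OSDA three-stage pipeline:
# (1) promptable segmentation -> (2) MLLM attribution/description ->
# (3) LLM-as-judge scoring. All names and shapes are illustrative.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Discovery:
    mask: object            # binary mask from the promptable segmenter
    label: str              # open-set label assigned by the MLLM
    description: str        # contextual description from the MLLM
    score: float            # LLM-as-judge quality score


def osda_pipeline(image,
                  segment: Callable,    # stage 1: fine-tuned SAM
                  describe: Callable,   # stage 2: two-phase fine-tuned MLLM
                  judge: Callable       # stage 3: LLM-as-judge evaluation
                  ) -> List[Discovery]:
    results = []
    for mask in segment(image):                    # (1) mask extraction
        label, desc = describe(image, mask)        # (2) semantic attribution
        score = judge(image, mask, label, desc)    # (3) automatic scoring
        results.append(Discovery(mask, label, desc, score))
    return results
```

Because the stages are plain callables, each component can be swapped independently, which matches the abstract's claim that the framework is architecture-agnostic.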
Related papers
- Open-Text Aerial Detection: A Unified Framework For Aerial Visual Grounding And Detection [19.500762008628254]
Open-Vocabulary Aerial Detection (OVAD) and Remote Sensing Visual Grounding (RSVG) have emerged as two key paradigms for aerial scene understanding. We propose OTA-Det, the first unified framework that bridges both paradigms into a cohesive architecture.
arXiv Detail & Related papers (2026-02-08T05:29:01Z) - Semantically Aware UAV Landing Site Assessment from Remote Sensing Imagery via Multimodal Large Language Models [5.987458168544856]
Safe UAV emergency landing requires understanding complex semantic risks invisible to traditional geometric sensors. We propose a novel framework leveraging Remote Sensing (RS) imagery and Multimodal Large Language Models (MLLMs) for context-aware landing site assessment.
arXiv Detail & Related papers (2026-02-01T11:30:03Z) - SegEarth-R2: Towards Comprehensive Language-guided Segmentation for Remote Sensing Images [49.52402091341301]
Current models can parse simple, single-target commands but fail when presented with complex geospatial scenarios. We present LaSeRS, the first large-scale dataset built for comprehensive training and evaluation. We also propose SegEarth-R2, an MLLM architecture designed for comprehensive language-guided segmentation in RS.
arXiv Detail & Related papers (2025-12-23T03:10:17Z) - SegEarth-OV3: Exploring SAM 3 for Open-Vocabulary Semantic Segmentation in Remote Sensing Images [51.42466259821335]
We present a preliminary exploration of applying SAM 3 to the remote sensing OVSS task without any training. First, we implement a mask fusion strategy that combines the outputs from SAM 3's semantic segmentation head and the Transformer decoder. Second, we utilize the presence score from the presence head to filter out categories that do not exist in the scene.
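The two ideas in this summary, fusing masks from two heads and filtering absent categories by a presence score, can be sketched in a few lines of NumPy. The averaging fusion rule and the 0.5 threshold are assumptions for illustration, not the paper's exact recipe.

```python
# Illustrative sketch: fuse per-class masks from two prediction heads,
# then drop classes whose presence score falls below a threshold.
# Fusion rule and threshold are assumed, not taken from the paper.
import numpy as np


def fuse_and_filter(sem_masks, dec_masks, presence, thresh=0.5):
    """sem_masks, dec_masks: dict class -> (H, W) float mask in [0, 1];
    presence: dict class -> scalar score from the presence head."""
    fused = {}
    for cls in sem_masks:
        if presence.get(cls, 0.0) < thresh:  # class judged absent: skip
            continue
        # simple fusion: average the two heads' per-pixel scores
        fused[cls] = 0.5 * (sem_masks[cls] + dec_masks[cls])
    return fused
```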
arXiv Detail & Related papers (2025-12-09T15:42:28Z) - DescribeEarth: Describe Anything for Remote Sensing Images [56.04533626223295]
We propose Geo-DLC, a novel task of object-level fine-grained image captioning for remote sensing. To support this task, we construct DE-Dataset, a large-scale dataset with detailed descriptions of object attributes, relationships, and contexts. We also present DescribeEarth, a Multi-modal Large Language Model architecture explicitly designed for Geo-DLC.
arXiv Detail & Related papers (2025-09-30T01:53:34Z) - Annotation-Free Open-Vocabulary Segmentation for Remote-Sensing Images [51.74614065919118]
This paper introduces SegEarth-OV, the first framework for annotation-free open-vocabulary segmentation of RS images. We propose SimFeatUp, a universal upsampler that robustly restores high-resolution spatial details from coarse features. We also present a simple yet effective Global Bias Alleviation operation to subtract the inherent global context from patch features.
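The "subtract the inherent global context from patch features" operation can be sketched minimally. Using the mean patch feature as the global-context estimate, and the scaling factor `alpha`, are assumptions for illustration; the paper's actual operator may differ.

```python
# Minimal sketch of a global-bias-alleviation step: remove an estimate
# of the global context (here, the mean patch feature) from each patch.
# The mean-as-global-context choice and alpha are illustrative assumptions.
import numpy as np


def alleviate_global_bias(patch_feats, alpha=1.0):
    """patch_feats: (N, D) array of patch features.
    Returns features with alpha * global context subtracted."""
    global_ctx = patch_feats.mean(axis=0, keepdims=True)  # (1, D)
    return patch_feats - alpha * global_ctx
```

With `alpha=1.0` the returned features are centered, so any component shared by all patches (the "global bias") cancels out while per-patch differences are preserved.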
arXiv Detail & Related papers (2025-08-25T14:22:57Z) - EarthMind: Leveraging Cross-Sensor Data for Advanced Earth Observation Interpretation with a Unified Multimodal LLM [103.7537991413311]
Earth Observation (EO) data analysis is vital for monitoring environmental and human dynamics. Recent Multimodal Large Language Models (MLLMs) show potential in EO understanding but remain restricted to single-sensor inputs. We propose EarthMind, a unified vision-language framework that handles both single- and cross-sensor inputs.
arXiv Detail & Related papers (2025-06-02T13:36:05Z) - InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition [19.74617806521803]
InstructSAM is a training-free framework for instruction-driven object recognition. We present EarthInstruct, the first InstructCDS benchmark for earth observation.
arXiv Detail & Related papers (2025-05-21T17:59:56Z) - SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model [61.97017867656831]
We introduce a new task, i.e., geospatial pixel reasoning, which allows implicit querying and reasoning and generates the mask of the target region. We construct and release the first large-scale benchmark dataset called EarthReason, which comprises 5,434 manually annotated image masks with over 30,000 implicit question-answer pairs. SegEarth-R1 achieves state-of-the-art performance on both reasoning and referring segmentation tasks, significantly outperforming traditional and LLM-based segmentation methods.
arXiv Detail & Related papers (2025-04-13T16:36:47Z) - LidaRefer: Context-aware Outdoor 3D Visual Grounding for Autonomous Driving [1.0589208420411014]
3D visual grounding aims to locate objects or regions within 3D scenes guided by natural language descriptions. Large-scale outdoor LiDAR scenes are dominated by background points and contain limited foreground information. LidaRefer is a context-aware 3D VG framework for outdoor scenes.
arXiv Detail & Related papers (2024-11-07T01:12:01Z) - Weakly Supervised Open-Vocabulary Object Detection [31.605276665964787]
We propose a novel weakly supervised open-vocabulary object detection framework, namely WSOVOD, to extend traditional WSOD.
To achieve this, we explore three vital strategies, including dataset-level feature adaptation, image-level salient object localization, and region-level vision-language alignment.
arXiv Detail & Related papers (2023-12-19T18:59:53Z) - Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos [63.94040814459116]
Self-supervised methods have shown remarkable progress in learning high-level semantics and low-level temporal correspondence.
We propose a novel semantic-aware masked slot attention on top of the fused semantic features and correspondence maps.
We adopt semantic- and instance-level temporal consistency as self-supervision to encourage temporally coherent object-centric representations.
arXiv Detail & Related papers (2023-08-19T09:12:13Z) - Navya3DSeg -- Navya 3D Semantic Segmentation Dataset & split generation for autonomous vehicles [63.20765930558542]
3D semantic data are useful for core perception tasks such as obstacle detection and ego-vehicle localization.
We propose a new dataset, Navya 3D Segmentation (Navya3DSeg), with a diverse label space corresponding to a large-scale, production-grade operational domain.
It contains 23 labeled sequences and 25 supplementary sequences without labels, designed to explore self-supervised and semi-supervised semantic segmentation benchmarks on point clouds.
arXiv Detail & Related papers (2023-02-16T13:41:19Z) - S3Net: 3D LiDAR Sparse Semantic Segmentation Network [1.330528227599978]
S3Net is a novel convolutional neural network for LiDAR point cloud semantic segmentation.
It adopts an encoder-decoder backbone that consists of a Sparse Intra-channel Attention Module (SIntraAM) and a Sparse Inter-channel Attention Module (SInterAM).
arXiv Detail & Related papers (2021-03-15T22:15:24Z) - Hierarchical Context Embedding for Region-based Object Detection [40.9463003508027]
Hierarchical Context Embedding (HCE) framework can be applied as a plug-and-play component.
To advance the recognition of context-dependent object categories, we propose an image-level categorical embedding module.
Novel RoI features are generated by exploiting hierarchically embedded context information beneath both whole images and interested regions.
arXiv Detail & Related papers (2020-08-04T05:33:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.