UniGeoSeg: Towards Unified Open-World Segmentation for Geospatial Scenes
- URL: http://arxiv.org/abs/2511.23332v1
- Date: Fri, 28 Nov 2025 16:40:08 GMT
- Title: UniGeoSeg: Towards Unified Open-World Segmentation for Geospatial Scenes
- Authors: Shuo Ni, Di Wang, He Chen, Haonan Guo, Ning Zhang, Jing Zhang, et al.
- Abstract summary: We introduce GeoSeg-1M, the first million-scale dataset for remote sensing instruction-driven segmentation. GeoSeg-1M contains 590K images, 117 categories, and 1.1M image-mask-instruction triplets. We also present UniGeoSeg, a unified framework that incorporates task-aware text enhancement, latent knowledge memory, and a progressive training strategy.
- Score: 18.631940492768898
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instruction-driven segmentation in remote sensing generates masks from guidance, offering great potential for accessible and generalizable applications. However, existing methods suffer from fragmented task formulations and limited instruction data, hindering effective understanding and generalization. To address these issues, we introduce GeoSeg-1M, the first million-scale dataset for remote sensing instruction-driven segmentation, constructed via an automatic mask filtering and instruction generation pipeline that synthesizes referring, interactive, and reasoning segmentation instructions from multiple public datasets. GeoSeg-1M contains 590K images, 117 categories, and 1.1M image-mask-instruction triplets. Building upon this foundation, we further curate GeoSeg-Bench, a challenging benchmark designed to evaluate contextual understanding and reasoning capabilities across diverse instruction-driven tasks and complex geospatial scenes. Furthermore, we present UniGeoSeg, a unified framework that serves as a strong baseline, incorporating task-aware text enhancement, latent knowledge memory, and a progressive training strategy to facilitate multi-task learning. Extensive experiments demonstrate the state-of-the-art performance of UniGeoSeg across GeoSeg-Bench and diverse public benchmarks, while exhibiting strong zero-shot generalization. Datasets and source code are released at https://github.com/MiliLab/UniGeoSeg.
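The abstract describes the dataset's core unit as an image-mask-instruction triplet, with instructions falling into the three types synthesized by the pipeline (referring, interactive, and reasoning). As a minimal illustrative sketch only (the actual GeoSeg-1M schema and field names are not specified in the abstract and are assumptions here), such a record might look like:

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical record layout: field names are illustrative, not from the paper.
@dataclass
class GeoSegTriplet:
    image_path: str    # path to the remote-sensing image
    mask_path: str     # path to the corresponding segmentation mask
    instruction: str   # natural-language guidance that the mask answers
    # The three instruction types the abstract says the pipeline synthesizes:
    task: Literal["referring", "interactive", "reasoning"]

sample = GeoSegTriplet(
    image_path="images/scene_000001.png",
    mask_path="masks/scene_000001_runway.png",
    instruction="Segment the runway closest to the terminal building.",
    task="referring",
)
print(sample.task)  # -> referring
```

A flat record like this is the kind of unit a million-scale instruction dataset would be indexed by; the real release on the linked GitHub repository defines the authoritative format.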
Related papers
- SegEarth-R2: Towards Comprehensive Language-guided Segmentation for Remote Sensing Images [49.52402091341301]
Current models can parse simple, single-target commands but fail when presented with complex geospatial scenarios. We present LaSeRS, the first large-scale dataset built for comprehensive training and evaluation. We also propose SegEarth-R2, an MLLM architecture designed for comprehensive language-guided segmentation in RS.
arXiv Detail & Related papers (2025-12-23T03:10:17Z) - GeoZero: Incentivizing Reasoning from Scratch on Geospatial Scenes [84.52881742231152]
Multimodal large language models (MLLMs) have undergone rapid development in advancing geospatial scene understanding. Recent studies have sought to enhance the reasoning capabilities of remote sensing MLLMs, typically through cold-start training with elaborately curated chain-of-thought (CoT) data. We propose GeoZero, a framework that enables MLLMs to perform geospatial reasoning without any predefined CoT supervision.
arXiv Detail & Related papers (2025-11-27T17:28:09Z) - Geo-R1: Improving Few-Shot Geospatial Referring Expression Understanding with Reinforcement Fine-Tuning [37.90271368636318]
Referring expression understanding in remote sensing poses unique challenges. We propose Geo-R1, a reasoning-centric reinforcement fine-tuning (RFT) paradigm for few-shot geospatial referring.
arXiv Detail & Related papers (2025-09-26T07:01:12Z) - SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model [61.97017867656831]
We introduce a new task, i.e., geospatial pixel reasoning, which allows implicit querying and reasoning and generates the mask of the target region. We construct and release the first large-scale benchmark dataset, EarthReason, which comprises 5,434 manually annotated image masks with over 30,000 implicit question-answer pairs. SegEarth-R1 achieves state-of-the-art performance on both reasoning and referring segmentation tasks, significantly outperforming traditional and LLM-based segmentation methods.
arXiv Detail & Related papers (2025-04-13T16:36:47Z) - OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence [51.0456395687016]
Multimodal large language models (MLLMs) have opened new frontiers in artificial intelligence. We propose an MLLM (OmniGeo) tailored to geospatial applications. By combining the strengths of natural language understanding and spatial reasoning, our model enhances instruction following and the accuracy of GeoAI systems.
arXiv Detail & Related papers (2025-03-20T16:45:48Z) - Geo-Semantic-Parsing: AI-powered geoparsing by traversing semantic knowledge graphs [0.7422344184734279]
We introduce a novel geoparsing and geotagging technique called Geo-Semantic-Parsing (GSP). GSP identifies location references in free text and extracts the corresponding geographic coordinates. We evaluate GSP on a well-known reference dataset including almost 10k event-related tweets.
arXiv Detail & Related papers (2025-03-03T10:30:23Z) - Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework [59.42946541163632]
We introduce a comprehensive geolocation framework with three key components: GeoComp, a large-scale dataset; GeoCoT, a novel reasoning method; and GeoEval, an evaluation metric. We demonstrate that GeoCoT significantly boosts geolocation accuracy by up to 25% while enhancing interpretability.
arXiv Detail & Related papers (2025-02-19T14:21:25Z) - GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks [84.86699025256705]
We present GEOBench-VLM, a benchmark specifically designed to evaluate Vision-Language Models (VLMs) on geospatial tasks. Our benchmark features over 10,000 manually verified instructions spanning diverse visual conditions, object types, and scales. We evaluate several state-of-the-art VLMs to assess performance on geospatial-specific challenges.
arXiv Detail & Related papers (2024-11-28T18:59:56Z) - GeoGPT: Understanding and Processing Geospatial Tasks through An Autonomous GPT [6.618846295332767]
Decision-makers in GIS need to combine a series of spatial algorithms and operations to solve geospatial tasks.
We develop a new framework called GeoGPT that can conduct geospatial data collection, processing, and analysis in an autonomous manner.
arXiv Detail & Related papers (2023-07-16T03:03:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.