GeoSeg: Training-Free Reasoning-Driven Segmentation in Remote Sensing Imagery
- URL: http://arxiv.org/abs/2603.03983v1
- Date: Wed, 04 Mar 2026 12:24:16 GMT
- Title: GeoSeg: Training-Free Reasoning-Driven Segmentation in Remote Sensing Imagery
- Authors: Lifan Jiang, Yuhang Pei, oxi Wu, Yan Zhao, Tianrun Wu, Shulong Yu, Lihui Zhang, Deng Cai
- Abstract summary: We present GeoSeg, a zero-shot, training-free framework that bypasses the supervision bottleneck for reasoning-driven remote sensing segmentation. GeoSeg couples MLLM reasoning with precise localization via: (i) bias-aware coordinate refinement to correct systematic grounding shifts and (ii) a dual-route prompting mechanism to fuse semantic intent with fine-grained spatial cues. Experiments show that GeoSeg consistently outperforms all baselines, with extensive ablations confirming the effectiveness and necessity of each component.
- Score: 12.65874706732698
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in MLLMs are reframing segmentation from fixed-category prediction to instruction-grounded localization. While reasoning-based segmentation has progressed rapidly in natural scenes, remote sensing lacks a generalizable solution due to the prohibitive cost of reasoning-oriented data and domain-specific challenges like overhead viewpoints. We present GeoSeg, a zero-shot, training-free framework that bypasses the supervision bottleneck for reasoning-driven remote sensing segmentation. GeoSeg couples MLLM reasoning with precise localization via: (i) bias-aware coordinate refinement to correct systematic grounding shifts and (ii) a dual-route prompting mechanism to fuse semantic intent with fine-grained spatial cues. We also introduce GeoSeg-Bench, a diagnostic benchmark of 810 image--query pairs with hierarchical difficulty levels. Experiments show that GeoSeg consistently outperforms all baselines, with extensive ablations confirming the effectiveness and necessity of each component.
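The two mechanisms named in the abstract — bias-aware coordinate refinement and dual-route prompting — can be illustrated with a minimal sketch. Everything below is hypothetical: the function names, the stubbed MLLM grounding step, and the bias values are illustrative placeholders, not the paper's actual implementation.

```python
# Hypothetical sketch of a GeoSeg-style training-free pipeline.
# All names and numbers here are illustrative assumptions, not from the paper.

def mllm_ground(image_size, query):
    """Stand-in for MLLM grounding: returns a coarse box (x1, y1, x2, y2)."""
    w, h = image_size
    # Placeholder behavior: pretend the MLLM localizes the query near the center.
    return (w * 0.4, h * 0.4, w * 0.6, h * 0.6)

def refine_bias(box, dx=0.0, dy=0.0, scale=1.0):
    """Bias-aware refinement: undo a systematic shift/scale in MLLM-predicted boxes."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2 + dx, (y1 + y2) / 2 + dy
    hw, hh = (x2 - x1) / 2 * scale, (y2 - y1) / 2 * scale
    return (cx - hw, cy - hh, cx + hw, cy + hh)

def dual_route_prompt(box, query):
    """Fuse semantic intent (text) with spatial cues (box + point) for a promptable segmenter."""
    x1, y1, x2, y2 = box
    return {
        "text": query,                            # semantic route
        "box": box,                               # spatial route: refined box
        "point": ((x1 + x2) / 2, (y1 + y2) / 2),  # spatial route: box center
    }

def geoseg_pipeline(image_size, query, bias=(5.0, -3.0, 1.1)):
    coarse = mllm_ground(image_size, query)       # 1. MLLM reasoning + coarse grounding
    refined = refine_bias(coarse, *bias)          # 2. correct systematic grounding shift
    return dual_route_prompt(refined, query)      # 3. fused prompt for the segmenter

prompt = geoseg_pipeline((1024, 1024), "the storage tank nearest the road")
```

The returned prompt dictionary would then be handed to a promptable segmentation model; no training is involved, consistent with the zero-shot claim.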
Related papers
- RSGround-R1: Rethinking Remote Sensing Visual Grounding through Spatial Reasoning [61.84363374647606]
Remote Sensing Visual Grounding (RSVG) aims to localize target objects in large-scale aerial imagery based on natural language descriptions. These descriptions often rely heavily on positional cues, posing unique challenges for Multimodal Large Language Models (MLLMs) in spatial reasoning. We propose a reasoning-guided, position-aware post-training framework, dubbed RSGround-R1, to progressively enhance spatial understanding.
arXiv Detail & Related papers (2026-01-29T12:35:57Z) - SegEarth-R2: Towards Comprehensive Language-guided Segmentation for Remote Sensing Images [49.52402091341301]
Current models can parse simple, single-target commands but fail when presented with complex geospatial scenarios. We present LaSeRS, the first large-scale dataset built for comprehensive training and evaluation. We also propose SegEarth-R2, an MLLM architecture designed for comprehensive language-guided segmentation in RS.
arXiv Detail & Related papers (2025-12-23T03:10:17Z) - Bridging Semantics and Geometry: A Decoupled LVLM-SAM Framework for Reasoning Segmentation in Remote Sensing [8.731693840957716]
Think2Seg-RS is a framework that trains an LVLM prompter to control a frozen Segment Anything Model (SAM) via structured geometric prompts. The framework achieves state-of-the-art performance on the EarthReason dataset. It is found that compact segmenters outperform larger ones under semantic-level supervision, and that negative prompts are ineffective in heterogeneous aerial backgrounds.
arXiv Detail & Related papers (2025-12-22T11:46:42Z) - GeoZero: Incentivizing Reasoning from Scratch on Geospatial Scenes [84.52881742231152]
Multimodal large language models (MLLMs) have undergone rapid development in advancing geospatial scene understanding. Recent studies have sought to enhance the reasoning capabilities of remote sensing MLLMs, typically through cold-start training with elaborately curated chain-of-thought (CoT) data. We propose GeoZero, a framework that enables MLLMs to perform geospatial reasoning without any predefined CoT supervision.
arXiv Detail & Related papers (2025-11-27T17:28:09Z) - Geo-R1: Improving Few-Shot Geospatial Referring Expression Understanding with Reinforcement Fine-Tuning [37.90271368636318]
Referring expression understanding in remote sensing poses unique challenges. We propose Geo-R1, a reasoning-centric reinforcement fine-tuning (RFT) paradigm for few-shot geospatial referring.
arXiv Detail & Related papers (2025-09-26T07:01:12Z) - Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models [47.98900725310249]
A new pipeline constructs a reasoning-oriented geo-localization dataset, MP16-Reason, using diverse social media images. GLOBE incorporates task-specific rewards that jointly enhance localizability assessment, visual-cue reasoning, and geolocation accuracy. Results demonstrate that GLOBE outperforms state-of-the-art open-source LVLMs on geo-localization tasks.
arXiv Detail & Related papers (2025-06-17T16:07:58Z) - SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model [61.97017867656831]
We introduce a new task, i.e., geospatial pixel reasoning, which allows implicit querying and reasoning and generates the mask of the target region. We construct and release the first large-scale benchmark dataset called EarthReason, which comprises 5,434 manually annotated image masks with over 30,000 implicit question-answer pairs. SegEarth-R1 achieves state-of-the-art performance on both reasoning and referring segmentation tasks, significantly outperforming traditional and LLM-based segmentation methods.
arXiv Detail & Related papers (2025-04-13T16:36:47Z) - SegDesicNet: Lightweight Semantic Segmentation in Remote Sensing with Geo-Coordinate Embeddings for Domain Adaptation [0.5461938536945723]
We propose a novel unsupervised domain adaptation technique for remote sensing semantic segmentation. Our proposed SegDesicNet module regresses the GRID positional encoding of the geo-coordinates projected over the unit sphere to obtain the domain loss. Our algorithm seeks to reduce the modeling disparity between artificial neural networks and human comprehension of the physical world.
arXiv Detail & Related papers (2025-03-11T11:01:18Z) - CLIP the Gap: A Single Domain Generalization Approach for Object
Detection [60.20931827772482]
Single Domain Generalization tackles the problem of training a model on a single source domain so that it generalizes to any unseen target domain.
We propose to leverage a pre-trained vision-language model to introduce semantic domain concepts via textual prompts.
We achieve this via a semantic augmentation strategy acting on the features extracted by the detector backbone, as well as a text-based classification loss.
arXiv Detail & Related papers (2023-01-13T12:01:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.