Related papers: OpenEarthAgent: A Unified Framework for Tool-Augmented Geospatial Agents

OpenEarthAgent: A Unified Framework for Tool-Augmented Geospatial Agents

URL: http://arxiv.org/abs/2602.17665v2
Date: Mon, 23 Feb 2026 18:59:54 GMT
Title: OpenEarthAgent: A Unified Framework for Tool-Augmented Geospatial Agents
Authors: Akashah Shabbir, Muhammad Umer Sheikh, Muhammad Akhtar Munir, Hiyam Debary, Mustansar Fiaz, Muhammad Zaigham Zaheer, Paolo Fraccaro, Fahad Shahbaz Khan, Muhammad Haris Khan, Xiao Xiang Zhu, Salman Khan,
Abstract summary: We introduce a unified framework for developing tool-augmented geospatial agents trained on satellite imagery, natural-language queries, and detailed reasoning traces.<n>The training pipeline relies on supervised fine-tuning over structured reasoning trajectories, aligning the model with verified multistep tool interactions.<n>The accompanying corpus comprises 14,538 training and 1,169 evaluation instances, with more than 100K reasoning steps in the training split and over 7K reasoning steps in the evaluation split.
Score: 68.85365034738534
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent progress in multimodal reasoning has enabled agents that can interpret imagery, connect it with language, and perform structured analytical tasks. Extending such capabilities to the remote sensing domain remains challenging, as models must reason over spatial scale, geographic structures, and multispectral indices while maintaining coherent multi-step logic. To bridge this gap, OpenEarthAgent introduces a unified framework for developing tool-augmented geospatial agents trained on satellite imagery, natural-language queries, and detailed reasoning traces. The training pipeline relies on supervised fine-tuning over structured reasoning trajectories, aligning the model with verified multistep tool interactions across diverse analytical contexts. The accompanying corpus comprises 14,538 training and 1,169 evaluation instances, with more than 100K reasoning steps in the training split and over 7K reasoning steps in the evaluation split. It spans urban, environmental, disaster, and infrastructure domains, and incorporates GIS-based operations alongside index analyses such as NDVI, NBR, and NDBI. Grounded in explicit reasoning traces, the learned agent demonstrates structured reasoning, stable spatial understanding, and interpretable behaviour through tool-driven geospatial interactions across diverse conditions. We report consistent improvements over a strong baseline and competitive performance relative to recent open and closed-source models.

Related papers

Enhancing Geometric Perception in VLMs via Translator-Guided Reinforcement Learning [52.075928878249066]
Vision-guided models (VLMs) often struggle with geometric reasoning due to their limited perception of fundamental diagram elements.<n>We introduce GeoPerceive, a benchmark comprising diagram instances paired with domain-specific language representations.<n>We propose GeoDPO, a translator reinforcement learning framework.
arXiv Detail & Related papers (2026-02-26T07:28:04Z)
AI Agent for Reverse-Engineering Legacy Finite-Difference Code and Translating to Devito [0.0]
This study develops an integrated AI framework to facilitate the transformation of legacy finite difference implementations into the Devito environment.<n>Retrieval-Augmented Generation (RAG) and open-source Large Language Models are combined through multi-stage iterative in the system's hybrid LangGraph architecture.
arXiv Detail & Related papers (2026-01-26T11:31:00Z)
Agentic AI in Remote Sensing: Foundations, Taxonomy, and Emerging Systems [9.388162021920206]
This survey presents the first comprehensive review of agentic AI in remote sensing.<n>We introduce a unified taxonomy distinguishing between single-agent copilots and multi-agent systems.<n>We review emerging benchmarks that move the evaluation from pixel-level accuracy to trajectory-aware reasoning correctness.
arXiv Detail & Related papers (2026-01-05T08:34:17Z)
Designing Domain-Specific Agents via Hierarchical Task Abstraction Mechanism [61.01709143437043]
We introduce a novel agent design framework centered on a Hierarchical Task Abstraction Mechanism (HTAM)<n>Specifically, HTAM moves beyond emulating social roles, instead structuring multi-agent systems into a logical hierarchy that mirrors the intrinsic task-dependency graph of a given domain.<n>We instantiate this framework as EarthAgent, a multi-agent system tailored for complex geospatial analysis.
arXiv Detail & Related papers (2025-11-21T12:25:47Z)
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization [53.080882980294795]
Current research on agentic visual reasoning enables deep multimodal understanding but primarily focuses on image manipulation tools.<n>In this work, we revisit the geolocalization task, which requires not only nuanced visual grounding but also web search to confirm or refine hypotheses.<n>Since existing geolocalization benchmarks fail to meet the need for high-resolution imagery and the localization challenge for deep agentic reasoning, we curate GeoBench.<n>We propose GeoVista, an agentic model that seamlessly integrates tool invocation within the reasoning loop, including an image-zoom-in tool to magnify regions of interest and a web-search tool to retrieve related
arXiv Detail & Related papers (2025-11-19T18:59:22Z)
ARCTraj: A Dataset and Benchmark of Human Reasoning Trajectories for Abstract Problem Solving [5.7688835899861]
We present ARCTraj, a dataset and methodological framework for modeling human reasoning through complex visual tasks.<n>ARCTraj addresses a gap by recording temporally ordered, object-level actions that capture how humans iteratively transform inputs into outputs.<n>It further defines a unified reasoning pipeline encompassing data collection, action abstraction, Markov decision process (MDP) formulation, and downstream learning.
arXiv Detail & Related papers (2025-11-14T08:52:53Z)
Geological Everything Model 3D: A Promptable Foundation Model for Unified and Zero-Shot Subsurface Understanding [9.766922279347547]
Geological Everything Model 3D (GEM) is a unified generative architecture that reformulates tasks as prompt-conditioned inference.<n>GEM achieves zero-shot generalization across tasks with heterogeneous prompt types, without retraining for new tasks or data sources.<n>GEM demonstrates broad applicability across surveys and tasks, including Martian radar stratigraphy analysis, structural interpretation in subduction zones, full seismic stratigraphic interpretation, geobody segmentation, and property modeling.
arXiv Detail & Related papers (2025-07-01T04:14:13Z)
ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks [64.86209459039313]
ThinkGeo is an agentic benchmark designed to evaluate tool-augmented agents on remote sensing tasks via structured tool use and multi-step planning.<n>We implement a ReAct-style interaction loop and evaluate both open and closed-source LLMs on 486 structured agentic tasks with 1,773 expert-verified reasoning steps.<n>Our analysis reveals notable disparities in tool accuracy and planning consistency across models.
arXiv Detail & Related papers (2025-05-29T17:59:38Z)
A General Purpose Neural Architecture for Geospatial Systems [142.43454584836812]
We present a roadmap towards the construction of a general-purpose neural architecture (GPNA) with a geospatial inductive bias. We envision how such a model may facilitate cooperation between members of the community.
arXiv Detail & Related papers (2022-11-04T09:58:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.