VANGUARD: Vehicle-Anchored Ground Sample Distance Estimation for UAVs in GPS-Denied Environments
- URL: http://arxiv.org/abs/2603.04277v1
- Date: Wed, 04 Mar 2026 16:59:08 GMT
- Title: VANGUARD: Vehicle-Anchored Ground Sample Distance Estimation for UAVs in GPS-Denied Environments
- Authors: Yifei Chen, Xupeng Chen, Feng Wang, Niangang Jiao, Jiayin Liu,
- Abstract summary: VANGUARD is a lightweight, deterministic Geometric Perception Skill designed as a callable tool for aerial robots. On the DOTA v1.5 benchmark, VANGUARD achieves 6.87% median GSD error on 306 images. Integrated with SAM-based segmentation for downstream area measurement, the pipeline yields 19.7% median error on a 100-entry benchmark.
- Score: 7.390183878674011
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous aerial robots operating in GPS-denied or communication-degraded environments frequently lose access to camera metadata and telemetry, leaving onboard perception systems unable to recover the absolute metric scale of the scene. As LLM/VLM-based planners are increasingly adopted as high-level agents for embodied systems, their ability to reason about physical dimensions becomes safety-critical -- yet our experiments show that five state-of-the-art VLMs suffer from spatial scale hallucinations, with median area estimation errors exceeding 50%. We propose VANGUARD, a lightweight, deterministic Geometric Perception Skill designed as a callable tool that any LLM-based agent can invoke to recover Ground Sample Distance (GSD) from ubiquitous environmental anchors: small vehicles detected via oriented bounding boxes, whose modal pixel length is robustly estimated through kernel density estimation and converted to GSD using a pre-calibrated reference length. The tool returns both a GSD estimate and a composite confidence score, enabling the calling agent to autonomously decide whether to trust the measurement or fall back to alternative strategies. On the DOTA v1.5 benchmark, VANGUARD achieves 6.87% median GSD error on 306 images. Integrated with SAM-based segmentation for downstream area measurement, the pipeline yields 19.7% median error on a 100-entry benchmark -- with 2.6x lower category dependence and 4x fewer catastrophic failures than the best VLM baseline -- demonstrating that equipping agents with deterministic geometric tools is essential for safe autonomous spatial reasoning.
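The scale-recovery idea in the abstract reduces to a short pipeline: run a kernel density estimate over the pixel lengths of detected vehicles, take the mode, and divide a pre-calibrated reference length by it to get metres per pixel. A minimal sketch of that pipeline, assuming plain NumPy, a Silverman-rule bandwidth, and a hypothetical 4.5 m reference vehicle length (the function names and constants are illustrative, not the paper's exact implementation):

```python
import numpy as np

def modal_length_px(lengths_px, grid_size=512):
    """Mode of the vehicle pixel-length distribution via a Gaussian KDE."""
    x = np.asarray(lengths_px, dtype=float)
    # Silverman's rule-of-thumb bandwidth, floored to stay positive.
    bw = max(1.06 * x.std() * len(x) ** (-1 / 5), 1e-3)
    grid = np.linspace(x.min() - 3 * bw, x.max() + 3 * bw, grid_size)
    # Unnormalised Gaussian KDE evaluated on the grid (argmax is unaffected
    # by the missing normalisation constant).
    density = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / bw) ** 2).sum(axis=1)
    return grid[np.argmax(density)]

def estimate_gsd(lengths_px, reference_length_m=4.5):
    """Ground Sample Distance (metres/pixel) from oriented-bounding-box
    vehicle lengths in pixels; reference_length_m is an assumed calibration."""
    return reference_length_m / modal_length_px(lengths_px)
```

Using the mode rather than the mean is what buys robustness: for lengths clustered near 30 px with a single 45 px outlier (say, a truck among cars), the outlier barely moves the KDE peak, so the estimate stays near 4.5 / 30 = 0.15 m/px.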
Related papers
- Autonomous Navigation of Cloud-Controlled Quadcopters in Confined Spaces Using Multi-Modal Perception and LLM-Driven High Semantic Reasoning [0.0]
This paper introduces an advanced AI-driven perception system for autonomous navigation in GPS-denied indoor environments. The system integrates YOLOv11 for object detection, Depth Anything V2 for monocular depth estimation, and a PCB equipped with Time-of-Flight (ToF) sensors and an Inertial Measurement Unit (IMU). Experimental results in an indoor testbed demonstrate strong performance, with object detection achieving a mean Average Precision (mAP50) of 0.6, depth estimation Mean Absolute Error (MAE) of 7.2 cm, and end-to-end system latency below 1 second.
arXiv Detail & Related papers (2025-08-11T12:00:03Z) - NOVA: Navigation via Object-Centric Visual Autonomy for High-Speed Target Tracking in Unstructured GPS-Denied Environments [56.35569661650558]
We introduce NOVA, a fully onboard, object-centric framework that enables robust target tracking and collision-aware navigation. Rather than constructing a global map, NOVA formulates perception, estimation, and control entirely in the target's reference frame. We validate NOVA across challenging real-world scenarios, including urban mazes, forest trails, and repeated transitions through buildings with intermittent GPS loss.
arXiv Detail & Related papers (2025-06-23T14:28:30Z) - ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks [64.86209459039313]
ThinkGeo is an agentic benchmark designed to evaluate tool-augmented agents on remote sensing tasks via structured tool use and multi-step planning. We implement a ReAct-style interaction loop and evaluate both open and closed-source LLMs on 486 structured agentic tasks with 1,773 expert-verified reasoning steps. Our analysis reveals notable disparities in tool accuracy and planning consistency across models.
arXiv Detail & Related papers (2025-05-29T17:59:38Z) - Toward Onboard AI-Enabled Solutions to Space Object Detection for Space Sustainability [29.817805350971366]
This paper investigates the feasibility and effectiveness of employing vision sensors for space object detection. It introduces models based on the Squeeze-and-Excitation (SE) layer, Vision Transformer (ViT), and the Generalized Efficient Layer Aggregation Network (GELAN). Experimental results show that the proposed models achieve mean average precision at intersection over union threshold 0.5 (mAP50) scores of up to 0.751 and mean average precision averaged over intersection over union thresholds from 0.5 to 0.95 (mAP50:95) scores of up to 0.280.
arXiv Detail & Related papers (2025-05-03T01:56:52Z) - Segmenting Objectiveness and Task-awareness Unknown Region for Autonomous Driving [46.70405993442064]
We propose a novel framework termed Segmenting Objectiveness and Task-Awareness (SOTA) for autonomous driving scenes. SOTA enhances the segmentation of objectiveness through a Semantic Fusion Block (SFB) and filters anomalies irrelevant to road navigation tasks.
arXiv Detail & Related papers (2025-04-27T10:08:54Z) - SURDS: Benchmarking Spatial Understanding and Reasoning in Driving Scenarios with Vision Language Models [15.50826328938879]
We introduce SURDS, a benchmark designed to evaluate the spatial reasoning capabilities of vision language models (VLMs). Built on the nuScenes dataset, SURDS comprises 41,080 vision-question-answer training instances and 9,250 evaluation samples. We propose a reinforcement learning-based alignment scheme leveraging spatially grounded reward signals.
arXiv Detail & Related papers (2024-11-20T08:14:01Z) - Secure Navigation using Landmark-based Localization in a GPS-denied Environment [1.19658449368018]
This paper proposes a novel framework that integrates landmark-based localization (LanBLoc) with an Extended Kalman Filter (EKF) to predict the future state of moving entities along the battlefield.
We present a simulated battlefield scenario for two different approaches that guide a moving entity along an obstacle- and hazard-free path.
arXiv Detail & Related papers (2024-02-22T04:41:56Z) - AdvGPS: Adversarial GPS for Multi-Agent Perception Attack [47.59938285740803]
This study investigates whether specific GPS signals can easily mislead the multi-agent perception system.
We introduce AdvGPS, a method capable of generating adversarial GPS signals which are also stealthy for individual agents within the system.
Our experiments on the OPV2V dataset demonstrate that these attacks substantially undermine the performance of state-of-the-art methods.
arXiv Detail & Related papers (2024-01-30T23:13:41Z) - Ultra-low Power Deep Learning-based Monocular Relative Localization Onboard Nano-quadrotors [64.68349896377629]
This work presents a novel autonomous end-to-end system that addresses the monocular relative localization, through deep neural networks (DNNs), of two peer nano-drones.
To cope with the ultra-constrained nano-drone platform, we propose a vertically-integrated framework, including dataset augmentation, quantization, and system optimizations.
Experimental results show that our DNN can precisely localize a 10 cm-size target nano-drone using only low-resolution monochrome images, at up to 2 m distance.
arXiv Detail & Related papers (2023-03-03T14:14:08Z) - Learned Risk Metric Maps for Kinodynamic Systems [54.49871675894546]
We present Learned Risk Metric Maps for real-time estimation of coherent risk metrics of high dimensional dynamical systems.
LRMM models are simple to design and train, requiring only procedural generation of obstacle sets, state and control sampling, and supervised training of a function approximator.
arXiv Detail & Related papers (2023-02-28T17:51:43Z) - Inertial Hallucinations -- When Wearable Inertial Devices Start Seeing Things [82.15959827765325]
We propose a novel approach to multimodal sensor fusion for Ambient Assisted Living (AAL).
We address two major shortcomings of standard multimodal approaches: limited area coverage and reduced reliability.
Our new framework fuses the concept of modality hallucination with triplet learning to train a model with different modalities to handle missing sensors at inference time.
arXiv Detail & Related papers (2022-07-14T10:04:18Z) - Distributed Variable-Baseline Stereo SLAM from two UAVs [17.513645771137178]
In this article, we employ two UAVs equipped with one monocular camera and one IMU each, to exploit their view overlap and relative distance measurements.
In order to control the UAV agents autonomously, we propose a decentralized collaborative estimation scheme.
We demonstrate the effectiveness of the approach at high-altitude flights of up to 160 m, going significantly beyond the capabilities of state-of-the-art VIO methods.
arXiv Detail & Related papers (2020-09-10T12:16:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.