RAVEN: Resilient Aerial Navigation via Open-Set Semantic Memory and Behavior Adaptation
- URL: http://arxiv.org/abs/2509.23563v1
- Date: Sun, 28 Sep 2025 01:43:25 GMT
- Title: RAVEN: Resilient Aerial Navigation via Open-Set Semantic Memory and Behavior Adaptation
- Authors: Seungchan Kim, Omar Alama, Dmytro Kurdydyk, John Keller, Nikhil Keetha, Wenshan Wang, Yonatan Bisk, Sebastian Scherer,
- Abstract summary: RAVEN is a 3D memory-based, behavior-tree framework for aerial semantic navigation in unstructured outdoor environments. It uses a spatially consistent semantic voxel-ray map as persistent memory, enabling long-horizon planning and avoiding purely reactive behaviors. RAVEN outperforms baselines by 85.25% in simulation and demonstrates its real-world applicability through deployment on an aerial robot in outdoor field tests.
- Score: 20.730528223747967
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Aerial outdoor semantic navigation requires robots to explore large, unstructured environments to locate target objects. Recent advances in semantic navigation have demonstrated open-set object-goal navigation in indoor settings, but these methods remain limited by constrained spatial ranges and structured layouts, making them unsuitable for long-range outdoor search. While outdoor semantic navigation approaches exist, they either rely on reactive policies based on current observations, which tend to produce short-sighted behaviors, or precompute scene graphs offline for navigation, limiting adaptability to online deployment. We present RAVEN, a 3D memory-based, behavior-tree framework for aerial semantic navigation in unstructured outdoor environments. It (1) uses a spatially consistent semantic voxel-ray map as persistent memory, enabling long-horizon planning and avoiding purely reactive behaviors, (2) combines short-range voxel search and long-range ray search to scale to large environments, and (3) leverages a large vision-language model to suggest auxiliary cues, mitigating the sparsity of outdoor targets. These components are coordinated by a behavior tree, which adaptively switches behaviors for robust operation. We evaluate RAVEN in 10 photorealistic outdoor simulation environments over 100 semantic tasks, encompassing single-object search, multi-class multi-instance navigation, and sequential task changes. Results show RAVEN outperforms baselines by 85.25% in simulation, and we demonstrate its real-world applicability through deployment on an aerial robot in outdoor field tests.
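To make the coordination idea concrete, here is a minimal, illustrative behavior-tree sketch in the spirit of the abstract: a selector ticks a short-range voxel search, falls back to a long-range ray search, and finally to a VLM-suggested auxiliary cue. The node names (`voxel_search`, `ray_search`, `vlm_cue_search`), the toy state dictionary, and the tick semantics are assumptions for illustration, not the paper's actual implementation.

```python
from enum import Enum

class Status(Enum):
    SUCCESS = 1
    FAILURE = 2
    RUNNING = 3

class Selector:
    """Ticks children in order; returns the first non-FAILURE status."""
    def __init__(self, *children):
        self.children = children
    def tick(self, state):
        for child in self.children:
            status = child.tick(state)
            if status != Status.FAILURE:
                return status
        return Status.FAILURE

class Action:
    """Leaf node wrapping a behavior function."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
    def tick(self, state):
        return self.fn(state)

# Toy persistent memory: voxel hits are confirmed nearby detections;
# ray hits record unresolved long-range detections to revisit later.
def voxel_search(state):
    if state["voxel_hits"]:
        state["goal"] = state["voxel_hits"].pop(0)
        return Status.SUCCESS
    return Status.FAILURE

def ray_search(state):
    if state["ray_hits"]:
        state["goal"] = state["ray_hits"].pop(0)
        return Status.SUCCESS
    return Status.FAILURE

def vlm_cue_search(state):
    # Fall back to an auxiliary cue (e.g. "road" when searching for "car").
    state["goal"] = state.get("aux_cue", "explore_frontier")
    return Status.SUCCESS

tree = Selector(
    Action("voxel_search", voxel_search),
    Action("ray_search", ray_search),
    Action("vlm_cue_search", vlm_cue_search),
)

state = {"voxel_hits": [], "ray_hits": [(120.0, 40.0)], "aux_cue": "road"}
print(tree.tick(state), state["goal"])  # with no voxel hits, the ray hit is chosen
```

The selector encodes the adaptive switching described in the abstract: as memory fills or empties, successive ticks naturally hand control between short-range, long-range, and cue-driven behaviors without any explicit mode logic.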
Related papers
- OpenFrontier: General Navigation with Visual-Language Grounded Frontiers [54.661157616245966]
Open-world navigation requires robots to make decisions in complex everyday environments. Recent advances in vision-language navigation (VLN) and vision-language-action (VLA) models enable end-to-end policies conditioned on natural language. We propose OpenFrontier, a training-free navigation framework that seamlessly integrates diverse vision-language prior models.
arXiv Detail & Related papers (2026-03-05T17:02:22Z) - WildOS: Open-Vocabulary Object Search in the Wild [12.098091049832965]
This work presents WildOS, a unified system for long-range, open-vocabulary object search. We use a foundation-model-based vision module, ExploRFM, to score frontier nodes of the graph. We also introduce a particle-filter-based method for coarse localization of the open-vocabulary target query.
arXiv Detail & Related papers (2026-02-22T19:14:00Z) - Nav-$R^2$ Dual-Relation Reasoning for Generalizable Open-Vocabulary Object-Goal Navigation [67.68165784193556]
Nav-$R^2$ is a framework that explicitly models two types of relationships: target-environment modeling and environment-action planning. Our SA-Mem preserves the most target-relevant and current observation-relevant features from both temporal and semantic perspectives. Nav-$R^2$ achieves state-of-the-art performance in localizing unseen objects through a streamlined and efficient pipeline.
arXiv Detail & Related papers (2025-12-02T04:21:02Z) - FOM-Nav: Frontier-Object Maps for Object Goal Navigation [65.76906445210112]
FOM-Nav is a framework that enhances exploration efficiency through Frontier-Object Maps and vision-language models. To train FOM-Nav, we automatically construct large-scale navigation datasets from real-world scanned environments. FOM-Nav achieves state-of-the-art performance on the MP3D and HM3D benchmarks, particularly in the navigation-efficiency metric SPL.
arXiv Detail & Related papers (2025-11-30T18:16:09Z) - TANGO: Traversability-Aware Navigation with Local Metric Control for Topological Goals [10.69725316052444]
We present a novel RGB-only, object-level topometric navigation pipeline that enables zero-shot, long-horizon robot navigation. Our approach integrates global topological path planning with local metric trajectory control, allowing the robot to navigate towards object-level sub-goals while avoiding obstacles. We demonstrate the effectiveness of our method in both simulated environments and real-world tests, highlighting its robustness and deployability.
arXiv Detail & Related papers (2025-09-10T15:43:32Z) - NOVA: Navigation via Object-Centric Visual Autonomy for High-Speed Target Tracking in Unstructured GPS-Denied Environments [56.35569661650558]
We introduce NOVA, a fully onboard, object-centric framework that enables robust target tracking and collision-aware navigation. Rather than constructing a global map, NOVA formulates perception, estimation, and control entirely in the target's reference frame. We validate NOVA across challenging real-world scenarios, including urban mazes, forest trails, and repeated transitions through buildings with intermittent GPS loss.
arXiv Detail & Related papers (2025-06-23T14:28:30Z) - Adaptive Interactive Navigation of Quadruped Robots using Large Language Models [14.14967096139099]
We present a primitive tree for task planning with large language models (LLMs). We adopt reinforcement learning to pre-train a comprehensive skill library containing versatile locomotion and interaction behaviors for motion planning. Integrated with the tree structure, the replanning mechanism allows for convenient node addition and pruning.
arXiv Detail & Related papers (2025-03-29T02:17:52Z) - OMEGA: Efficient Occlusion-Aware Navigation for Air-Ground Robot in Dynamic Environments via State Space Model [12.096387853748938]
Air-ground robots (AGRs) are widely used in surveillance and disaster response. Current AGR navigation systems perform well in static environments. However, these systems face challenges in dynamic, severe-occlusion scenes. We propose OccMamba with an Efficient AGR-Planner to address these problems.
arXiv Detail & Related papers (2024-08-20T07:50:29Z) - ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object
Navigation [75.13546386761153]
We present a novel zero-shot object navigation method, Exploration with Soft Commonsense constraints (ESC).
ESC transfers commonsense knowledge in pre-trained models to open-world object navigation without any navigation experience.
Experiments on MP3D, HM3D, and RoboTHOR benchmarks show that our ESC method improves significantly over baselines.
arXiv Detail & Related papers (2023-01-30T18:37:32Z) - A Contextual Bandit Approach for Learning to Plan in Environments with Probabilistic Goal Configurations [20.15854546504947]
We propose a modular framework for object-nav that is able to efficiently search indoor environments for not just static objects but also movable objects.
Our contextual-bandit agent efficiently explores the environment by showing optimism in the face of uncertainty.
We evaluate our algorithms in two simulated environments and a real-world setting, to demonstrate high sample efficiency and reliability.
arXiv Detail & Related papers (2022-11-29T15:48:54Z) - Structured Scene Memory for Vision-Language Navigation [155.63025602722712]
We propose a crucial architecture for vision-language navigation (VLN).
It is compartmentalized enough to accurately memorize the percepts during navigation.
It also serves as a structured scene representation, which captures and disentangles visual and geometric cues in the environment.
arXiv Detail & Related papers (2021-03-05T03:41:00Z) - Occupancy Anticipation for Efficient Exploration and Navigation [97.17517060585875]
We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions.
By exploiting context in both the egocentric views and top-down maps, our model successfully anticipates a broader map of the environment.
Our approach is the winning entry in the 2020 Habitat PointNav Challenge.
arXiv Detail & Related papers (2020-08-21T03:16:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.