WildOS: Open-Vocabulary Object Search in the Wild
- URL: http://arxiv.org/abs/2602.19308v1
- Date: Sun, 22 Feb 2026 19:14:00 GMT
- Title: WildOS: Open-Vocabulary Object Search in the Wild
- Authors: Hardik Shah, Erica Tevere, Deegan Atha, Marcel Kaufmann, Shehryar Khattak, Manthan Patel, Marco Hutter, Jonas Frey, Patrick Spieler,
- Abstract summary: This work presents WildOS, a unified system for long-range, open-vocabulary object search. We use a foundation-model-based vision module, ExploRFM, to score frontier nodes of the graph. We also introduce a particle-filter-based method for coarse localization of the open-vocabulary target query.
- Score: 12.098091049832965
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Autonomous navigation in complex, unstructured outdoor environments requires robots to operate over long ranges without prior maps and with limited depth sensing. In such settings, relying solely on geometric frontiers for exploration is often insufficient; the ability to reason semantically about where to go and what is safe to traverse is crucial for robust, efficient exploration. This work presents WildOS, a unified system for long-range, open-vocabulary object search that combines safe geometric exploration with semantic visual reasoning. WildOS builds a sparse navigation graph to maintain spatial memory, while utilizing a foundation-model-based vision module, ExploRFM, to score frontier nodes of the graph. ExploRFM simultaneously predicts traversability, visual frontiers, and object similarity in image space, enabling real-time, onboard semantic navigation tasks. The resulting vision-scored graph enables the robot to explore semantically meaningful directions while ensuring geometric safety. Furthermore, we introduce a particle-filter-based method for coarse localization of the open-vocabulary target query that estimates candidate goal positions beyond the robot's immediate depth horizon, enabling effective planning toward distant goals. Extensive closed-loop field experiments across diverse off-road and urban terrains demonstrate that WildOS enables robust navigation, significantly outperforming purely geometric and purely vision-based baselines in both efficiency and autonomy. Our results highlight the potential of vision foundation models to drive open-world robotic behaviors that are both semantically informed and geometrically grounded. Project Page: https://leggedrobotics.github.io/wildos/
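The abstract's particle-filter idea can be illustrated with a generic sketch: candidate goal positions are weighted by the open-vocabulary similarity observed along their bearing from the robot, then resampled. This is an illustrative reconstruction only; the bearing quantization, weighting, and multinomial resampling below are assumptions, not the paper's implementation.

```python
import math
import random

def pf_update(particles, robot_xy, bearing_scores, eps=1e-6):
    """One measurement update for coarse goal localization.

    particles:      list of (x, y) candidate goal positions
    robot_xy:       current robot position (x, y)
    bearing_scores: dict mapping a quantized bearing (radians, rounded to
                    one decimal) to an object-similarity score in [0, 1]
    Returns a resampled particle set; bearings with higher similarity
    keep more of the probability mass.
    """
    rx, ry = robot_xy
    # Weight each particle by the similarity score seen in its direction.
    weights = []
    for px, py in particles:
        bearing = round(math.atan2(py - ry, px - rx), 1)
        weights.append(eps + bearing_scores.get(bearing, 0.0))
    # Multinomial resampling proportional to weight.
    total = sum(weights)
    return [particles[_pick(weights, total)] for _ in particles]

def _pick(weights, total):
    """Draw one index with probability proportional to its weight."""
    r = random.uniform(0.0, total)
    cum = 0.0
    for i, w in enumerate(weights):
        cum += w
        if r <= cum:
            return i
    return len(weights) - 1
```

In a full system the surviving particle cluster would serve as a coarse goal estimate beyond the depth horizon, refined as new observations arrive.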
Related papers
- RAVEN: Resilient Aerial Navigation via Open-Set Semantic Memory and Behavior Adaptation [20.730528223747967]
RAVEN is a 3D memory-based, behavior tree framework for aerial semantic navigation in unstructured outdoor environments. It uses a spatially consistent semantic voxel-ray map as persistent memory, enabling long-horizon planning and avoiding purely reactive behaviors. RAVEN outperforms baselines by 85.25% in simulation, and we demonstrate its real-world applicability through deployment on an aerial robot in outdoor field tests.
arXiv Detail & Related papers (2025-09-28T01:43:25Z)
- TANGO: Traversability-Aware Navigation with Local Metric Control for Topological Goals [10.69725316052444]
We present a novel RGB-only, object-level topometric navigation pipeline that enables zero-shot, long-horizon robot navigation. Our approach integrates global topological path planning with local metric trajectory control, allowing the robot to navigate towards object-level sub-goals while avoiding obstacles. We demonstrate the effectiveness of our method in both simulated environments and real-world tests, highlighting its robustness and deployability.
arXiv Detail & Related papers (2025-09-10T15:43:32Z)
- Semantic Exploration and Dense Mapping of Complex Environments using Ground Robot with Panoramic LiDAR-Camera Fusion [10.438142938687326]
This paper presents a system for autonomous semantic exploration and dense semantic target mapping of a complex unknown environment using a ground robot equipped with a LiDAR-panoramic camera suite. We first redefine the task as completing both geometric coverage and semantic viewpoint observation. We then manage semantic and geometric viewpoints separately and propose a novel Priority-driven Decoupled Local Sampler to generate local viewpoint sets. In addition, we propose a Safe Aggressive Exploration State Machine, which allows aggressive exploration behavior while ensuring the robot's safety.
arXiv Detail & Related papers (2025-05-28T21:27:32Z)
- ForesightNav: Learning Scene Imagination for Efficient Exploration [57.49417653636244]
We propose ForesightNav, a novel exploration strategy inspired by human imagination and reasoning. Our approach equips robotic agents with the capability to predict contextual information, such as occupancy and semantic details, for unexplored regions. We validate our imagination-based approach using the Structured3D dataset, demonstrating accurate occupancy prediction and superior performance in anticipating unseen scene geometry.
arXiv Detail & Related papers (2025-04-22T17:38:38Z)
- FrontierNet: Learning Visual Cues to Explore [54.8265603996238]
This work aims at leveraging 2D visual cues for efficient autonomous exploration, addressing the limitations of extracting goal poses from a 3D map. We propose a visual-only frontier-based exploration system, with FrontierNet as its core component. Our approach provides an alternative to existing 3D-dependent goal-extraction approaches, achieving a 15% improvement in early-stage exploration efficiency.
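For context, frontier-based exploration of this kind ultimately reduces to picking the frontier with the best utility. The sketch below shows a generic greedy selection rule trading predicted gain against travel cost; FrontierNet's actual scores are learned from images, and the weights and gain values here are purely illustrative assumptions.

```python
import math

def select_frontier(frontiers, robot_xy, w_gain=1.0, w_cost=0.2):
    """Pick the frontier maximizing (weighted gain - weighted travel cost).

    frontiers: list of ((x, y), gain) pairs, where gain is a predicted
               information-gain score (e.g. from a learned visual model)
    robot_xy:  current robot position (x, y)
    Returns the (x, y) of the best-scoring frontier, or None if empty.
    """
    best, best_utility = None, -math.inf
    for (x, y), gain in frontiers:
        # Straight-line distance as a simple travel-cost proxy.
        cost = math.hypot(x - robot_xy[0], y - robot_xy[1])
        utility = w_gain * gain - w_cost * cost
        if utility > best_utility:
            best, best_utility = (x, y), utility
    return best
```

A real system would replace the straight-line cost with a path-planner estimate and feed in per-frontier scores from the vision model.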
arXiv Detail & Related papers (2025-01-08T16:25:32Z)
- IPPON: Common Sense Guided Informative Path Planning for Object Goal Navigation [33.979481250363584]
This paper introduces a novel informative path planning and 3D object probability mapping approach.
The mapping module computes the probability of the object of interest through semantic segmentation and a Bayes filter.
Although our planner follows a zero-shot approach, it achieves state-of-the-art performance as measured by the Success weighted by Path Length (SPL) and Soft SPL in the Habitat ObjectNav Challenge 2023.
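The Bayes-filter mapping step described above can be sketched as a per-voxel binary update: the object probability is raised or lowered depending on whether the segmentation detects the target in that voxel. This is a minimal generic sketch, assuming a simple detection sensor model; the probabilities are illustrative, not IPPON's calibrated values.

```python
def bayes_update(prior, p_det_obj, p_det_noobj, detected):
    """Binary Bayes filter update for a voxel's object probability.

    prior:       current P(object in voxel)
    p_det_obj:   P(detection | object present)   -- sensor true-positive rate
    p_det_noobj: P(detection | object absent)    -- sensor false-positive rate
    detected:    whether segmentation fired on this voxel this frame
    Returns the posterior P(object in voxel | observation).
    """
    if detected:
        num = p_det_obj * prior
        den = num + p_det_noobj * (1.0 - prior)
    else:
        num = (1.0 - p_det_obj) * prior
        den = num + (1.0 - p_det_noobj) * (1.0 - prior)
    return num / den
```

Repeated detections drive the voxel probability toward 1, while repeated misses drive it toward 0, giving the planner a consolidated object-probability map.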
arXiv Detail & Related papers (2024-10-25T17:11:33Z)
- OMEGA: Efficient Occlusion-Aware Navigation for Air-Ground Robot in Dynamic Environments via State Space Model [12.096387853748938]
Air-ground robots (AGRs) are widely used in surveillance and disaster response. Current AGR navigation systems perform well in static environments. However, these systems face challenges in dynamic scenes with severe occlusion. We propose OccMamba with an Efficient AGR-Planner to address these problems.
arXiv Detail & Related papers (2024-08-20T07:50:29Z)
- ViKiNG: Vision-Based Kilometer-Scale Navigation with Geographic Hints [94.60414567852536]
Long-range navigation requires both planning and reasoning about local traversability.
We propose an approach that integrates learning and planning.
ViKiNG can leverage its image-based learned controller and goal-directed heuristic to navigate to goals up to 3 kilometers away.
arXiv Detail & Related papers (2022-02-23T02:14:23Z)
- Occupancy Anticipation for Efficient Exploration and Navigation [97.17517060585875]
We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions.
By exploiting context in both the egocentric views and top-down maps our model successfully anticipates a broader map of the environment.
Our approach is the winning entry in the 2020 Habitat PointNav Challenge.
arXiv Detail & Related papers (2020-08-21T03:16:51Z)
- BADGR: An Autonomous Self-Supervised Learning-Based Navigation System [158.6392333480079]
BADGR is an end-to-end learning-based mobile robot navigation system.
It can be trained with self-supervised off-policy data gathered in real-world environments.
BADGR can navigate in real-world urban and off-road environments with geometrically distracting obstacles.
arXiv Detail & Related papers (2020-02-13T18:40:21Z)
- Learning to Move with Affordance Maps [57.198806691838364]
The ability to autonomously explore and navigate a physical space is a fundamental requirement for virtually any mobile autonomous agent.
Traditional SLAM-based approaches for exploration and navigation largely focus on leveraging scene geometry.
We show that learned affordance maps can be used to augment traditional approaches for both exploration and navigation, providing significant improvements in performance.
arXiv Detail & Related papers (2020-01-08T04:05:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed above and is not responsible for any consequences of its use.