Related papers: CREStE: Scalable Mapless Navigation with Internet Scale Priors and Counterfactual Guidance

CREStE: Scalable Mapless Navigation with Internet Scale Priors and Counterfactual Guidance

URL: http://arxiv.org/abs/2503.03921v2
Date: Thu, 26 Jun 2025 06:42:04 GMT
Title: CREStE: Scalable Mapless Navigation with Internet Scale Priors and Counterfactual Guidance
Authors: Arthur Zhang, Harshit Sikchi, Amy Zhang, Joydeep Biswas,
Abstract summary: CREStE is a scalable learning-based mapless navigation framework.<n>It addresses the open-world generalization and robustness challenges of outdoor urban navigation.<n>We evaluate CREStE on the task of kilometer-scale mapless navigation in a variety of city, offroad, and residential environments.
Score: 13.922655150502365
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: We introduce CREStE, a scalable learning-based mapless navigation framework to address the open-world generalization and robustness challenges of outdoor urban navigation. Key to achieving this is learning perceptual representations that generalize to open-set factors (e.g. novel semantic classes, terrains, dynamic entities) and inferring expert-aligned navigation costs from limited demonstrations. CREStE addresses both these issues, introducing 1) a visual foundation model (VFM) distillation objective for learning open-set structured bird's-eye-view perceptual representations, and 2) counterfactual inverse reinforcement learning (IRL), a novel active learning formulation that uses counterfactual trajectory demonstrations to reason about the most important cues when inferring navigation costs. We evaluate CREStE on the task of kilometer-scale mapless navigation in a variety of city, offroad, and residential environments and find that it outperforms all state-of-the-art approaches with 70% fewer human interventions, including a 2-kilometer mission in an unseen environment with just 1 intervention; showcasing its robustness and effectiveness for long-horizon mapless navigation. Videos and additional materials can be found on the project page: https://amrl.cs.utexas.edu/creste

Related papers

History-Enhanced Two-Stage Transformer for Aerial Vision-and-Language Navigation [64.51891404034164]
Aerial Vision-and-Language Navigation (AVLN) requires Unmanned Aerial Vehicle (UAV) agents to localize targets in large-scale urban environments.<n>Existing UAV agents typically adopt mono-granularity frameworks that struggle to balance these two aspects.<n>This work proposes a History-Enhanced Two-Stage Transformer (HETT) framework, which integrates the two aspects through a coarse-to-fine navigation pipeline.
arXiv Detail & Related papers (2025-12-16T09:16:07Z)
From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning [59.88543114325153]
We introduce the Seeing-to-Experiencing framework to scale the capability of navigation foundation models with reinforcement learning.<n>S2E combines the strengths of pre-training on videos and post-training through RL.<n>We establish a comprehensive end-to-end evaluation benchmark, NavBench-GS, built on photorealistic 3DGS reconstructions of real-world scenes.
arXiv Detail & Related papers (2025-07-29T17:26:10Z)
PIG-Nav: Key Insights for Pretrained Image Goal Navigation Models [16.820485795257195]
PIG-Nav (Pretrained Image-Goal Navigation) is a new approach that further investigates pretraining strategies for vision-based navigation models.<n>We identify two critical design choices that consistently improve the performance of pretrained navigation models.<n>Our model achieves an average improvement of 22.6% in zero-shot settings and a 37.5% improvement in fine-tuning settings over existing visual navigation foundation models.
arXiv Detail & Related papers (2025-07-23T05:34:20Z)
NavigateDiff: Visual Predictors are Zero-Shot Navigation Assistants [24.689242976554482]
Navigating unfamiliar environments presents significant challenges for household robots.<n>Existing reinforcement learning methods cannot be directly transferred to new environments.<n>We try to transfer the logical knowledge and the generalization ability of pre-trained foundation models to zero-shot navigation.
arXiv Detail & Related papers (2025-02-19T17:27:47Z)
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos [11.912608309403359]
We propose a scalable, data-driven approach for human-like urban navigation.<n>We train agents on thousands of hours of in-the-wild city walking and driving videos sourced from the web.<n>Our model learns sophisticated navigation policies to handle diverse challenges and critical scenarios.
arXiv Detail & Related papers (2024-11-26T19:02:20Z)
Two-Stage Depth Enhanced Learning with Obstacle Map For Object Navigation [11.667940255053582]
This paper uses the RGB and depth information of the training scene to pretrain the feature extractor, which improves navigation efficiency. We evaluated our method on AI2-Thor and RoboTHOR and demonstrated that it significantly outperforms state-of-the-art (SOTA) methods on success rate and navigation efficiency.
arXiv Detail & Related papers (2024-06-20T08:35:10Z)
E(2)-Equivariant Graph Planning for Navigation [26.016209191573605]
We exploit Euclidean symmetry in planning for 2D navigation. To address the challenges of unstructured environments, we formulate the navigation problem as planning on a geometric graph.
arXiv Detail & Related papers (2023-09-22T17:59:48Z)
Learning Navigational Visual Representations with Semantic Map Supervision [85.91625020847358]
We propose a navigational-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps. Ego$2$-Map learning transfers the compact and rich information from a map, such as objects, structure and transition, to the agent's egocentric representations for navigation.
arXiv Detail & Related papers (2023-07-23T14:01:05Z)
ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments [56.194988818341976]
Vision-language navigation is a task that requires an agent to follow instructions to navigate in environments. We propose ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) the ability of obstacle-avoiding control in continuous environments. ETPNav yields more than 10% and 20% improvements over prior state-of-the-art on R2R-CE and RxR-CE datasets.
arXiv Detail & Related papers (2023-04-06T13:07:17Z)
ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation [75.13546386761153]
We present a novel zero-shot object navigation method, Exploration with Soft Commonsense constraints (ESC) ESC transfers commonsense knowledge in pre-trained models to open-world object navigation without any navigation experience. Experiments on MP3D, HM3D, and RoboTHOR benchmarks show that our ESC method improves significantly over baselines.
arXiv Detail & Related papers (2023-01-30T18:37:32Z)
Offline Reinforcement Learning for Visual Navigation [66.88830049694457]
ReViND is the first offline RL system for robotic navigation that can leverage previously collected data to optimize user-specified reward functions in the real-world. We show that ReViND can navigate to distant goals using only offline training from this dataset, and exhibit behaviors that qualitatively differ based on the user-specified reward function.
arXiv Detail & Related papers (2022-12-16T02:23:50Z)
Augmented reality navigation system for visual prosthesis [67.09251544230744]
We propose an augmented reality navigation system for visual prosthesis that incorporates a software of reactive navigation and path planning. It consists on four steps: locating the subject on a map, planning the subject trajectory, showing it to the subject and re-planning without obstacles. Results show how our augmented navigation system help navigation performance by reducing the time and distance to reach the goals, even significantly reducing the number of obstacles collisions.
arXiv Detail & Related papers (2021-09-30T09:41:40Z)
Deep Learning for Embodied Vision Navigation: A Survey [108.13766213265069]
"Embodied visual navigation" problem requires an agent to navigate in a 3D environment mainly rely on its first-person observation. This paper attempts to establish an outline of the current works in the field of embodied visual navigation by providing a comprehensive literature survey.
arXiv Detail & Related papers (2021-07-07T12:09:04Z)
Rapid Exploration for Open-World Navigation with Latent Goal Models [78.45339342966196]
We describe a robotic learning system for autonomous exploration and navigation in diverse, open-world environments. At the core of our method is a learned latent variable model of distances and actions, along with a non-parametric topological memory of images. We use an information bottleneck to regularize the learned policy, giving us (i) a compact visual representation of goals, (ii) improved generalization capabilities, and (iii) a mechanism for sampling feasible goals for exploration.
arXiv Detail & Related papers (2021-04-12T23:14:41Z)
Occupancy Anticipation for Efficient Exploration and Navigation [97.17517060585875]
We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions. By exploiting context in both the egocentric views and top-down maps our model successfully anticipates a broader map of the environment. Our approach is the winning entry in the 2020 Habitat PointNav Challenge.
arXiv Detail & Related papers (2020-08-21T03:16:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.