Related papers: FeudalNav: A Simple Framework for Visual Navigation

FeudalNav: A Simple Framework for Visual Navigation

URL: http://arxiv.org/abs/2602.06974v1
Date: Thu, 15 Jan 2026 22:10:29 GMT
Title: FeudalNav: A Simple Framework for Visual Navigation
Authors: Faith Johnson, Bryan Bo Cao, Shubham Jain, Ashwin Ashok, Kristin Dana,
Abstract summary: We develop a hierarchical framework that decomposes the navigation decision-making process into multiple levels.<n>Our method learns to select subgoals through a simple, transferable waypoint selection network.<n>We show competitive results with a suite of SOTA methods in Habitat AI environments without using any odometry in training or inference.
Score: 7.136542835931238
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Visual navigation for robotics is inspired by the human ability to navigate environments using visual cues and memory, eliminating the need for detailed maps. In unseen, unmapped, or GPS-denied settings, traditional metric map-based methods fall short, prompting a shift toward learning-based approaches with minimal exploration. In this work, we develop a hierarchical framework that decomposes the navigation decision-making process into multiple levels. Our method learns to select subgoals through a simple, transferable waypoint selection network. A key component of the approach is a latent-space memory module organized solely by visual similarity, as a proxy for distance. This alternative to graph-based topological representations proves sufficient for navigation tasks, providing a compact, light-weight, simple-to-train navigator that can find its way to the goal in novel locations. We show competitive results with a suite of SOTA methods in Habitat AI environments without using any odometry in training or inference. An additional contribution leverages the interpretablility of the framework for interactive navigation. We consider the question: how much direction intervention/interaction is needed to achieve success in all trials? We demonstrate that even minimal human involvement can significantly enhance overall navigation performance.

Related papers

ReasonNavi: Human-Inspired Global Map Reasoning for Zero-Shot Embodied Navigation [53.95797153529148]
Embodied agents often struggle with efficient navigation because they rely primarily on partial egocentric observations.<n>We introduce ReasonNavi, a human-inspired framework that operationalizes this reason-then-act paradigm by coupling Multimodal Large Language Models (MLLMs) with deterministic planners.
arXiv Detail & Related papers (2026-01-26T19:09:20Z)
Memory Proxy Maps for Visual Navigation [6.1190419149081245]
Visual navigation takes inspiration from humans, who navigate in previously unseen environments using vision without detailed environment maps.<n>Inspired by this, we introduce a novel no-RL, no-graph, no-odometry approach to visual navigation using feudal learning to build a three tiered agent.
arXiv Detail & Related papers (2024-11-15T02:37:14Z)
Hierarchical end-to-end autonomous navigation through few-shot waypoint detection [0.0]
Human navigation is facilitated through the association of actions with landmarks. Current autonomous navigation schemes rely on accurate positioning devices and algorithms as well as extensive streams of sensory data collected from the environment. We propose a hierarchical end-to-end meta-learning scheme that enables a mobile robot to navigate in a previously unknown environment.
arXiv Detail & Related papers (2024-09-23T00:03:39Z)
Feudal Networks for Visual Navigation [6.1190419149081245]
We introduce a new approach to visual navigation using feudal learning.<n>Agents at each level see a different aspect of the task and operate at different spatial and temporal scales.<n>The resulting feudal navigation network achieves near SOTA performance.
arXiv Detail & Related papers (2024-02-19T20:05:41Z)
Interactive Semantic Map Representation for Skill-based Visual Object Navigation [43.71312386938849]
This paper introduces a new representation of a scene semantic map formed during the embodied agent interaction with the indoor environment. We have implemented this representation into a full-fledged navigation approach called SkillTron. The proposed approach makes it possible to form both intermediate goals for robot exploration and the final goal for object navigation.
arXiv Detail & Related papers (2023-11-07T16:30:12Z)
NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration [57.15811390835294]
This paper describes how we can train a single unified diffusion policy to handle both goal-directed navigation and goal-agnostic exploration. We show that this unified policy results in better overall performance when navigating to visually indicated goals in novel environments. Our experiments, conducted on a real-world mobile robot platform, show effective navigation in unseen environments in comparison with five alternative methods.
arXiv Detail & Related papers (2023-10-11T21:07:14Z)
Object Goal Navigation with Recursive Implicit Maps [92.6347010295396]
We propose an implicit spatial map for object goal navigation. Our method significantly outperforms the state of the art on the challenging MP3D dataset. We deploy our model on a real robot and achieve encouraging object goal navigation results in real scenes.
arXiv Detail & Related papers (2023-08-10T14:21:33Z)
Learning Navigational Visual Representations with Semantic Map Supervision [85.91625020847358]
We propose a navigational-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps. Ego$2$-Map learning transfers the compact and rich information from a map, such as objects, structure and transition, to the agent's egocentric representations for navigation.
arXiv Detail & Related papers (2023-07-23T14:01:05Z)
ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments [56.194988818341976]
Vision-language navigation is a task that requires an agent to follow instructions to navigate in environments. We propose ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) the ability of obstacle-avoiding control in continuous environments. ETPNav yields more than 10% and 20% improvements over prior state-of-the-art on R2R-CE and RxR-CE datasets.
arXiv Detail & Related papers (2023-04-06T13:07:17Z)
ViKiNG: Vision-Based Kilometer-Scale Navigation with Geographic Hints [94.60414567852536]
Long-range navigation requires both planning and reasoning about local traversability. We propose a learning-based approach that integrates learning and planning. ViKiNG can leverage its image-based learned controller and goal-directed to navigate to goals up to 3 kilometers away.
arXiv Detail & Related papers (2022-02-23T02:14:23Z)
Lifelong Topological Visual Navigation [16.41858724205884]
We propose a learning-based visual navigation method with graph update strategies that improve lifelong navigation performance over time. We take inspiration from sampling-based planning algorithms to build image-based topological graphs, resulting in sparser graphs yet with higher navigation performance compared to baseline methods. Unlike controllers that learn from fixed training environments, we show that our model can be finetuned using a relatively small dataset from the real-world environment where the robot is deployed.
arXiv Detail & Related papers (2021-10-16T06:16:14Z)
ViNG: Learning Open-World Navigation with Visual Goals [82.84193221280216]
We propose a learning-based navigation system for reaching visually indicated goals. We show that our system, which we call ViNG, outperforms previously-proposed methods for goal-conditioned reinforcement learning. We demonstrate ViNG on a number of real-world applications, such as last-mile delivery and warehouse inspection.
arXiv Detail & Related papers (2020-12-17T18:22:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.