HyPerNav: Hybrid Perception for Object-Oriented Navigation in Unknown Environment
- URL: http://arxiv.org/abs/2510.22917v2
- Date: Tue, 28 Oct 2025 02:49:09 GMT
- Title: HyPerNav: Hybrid Perception for Object-Oriented Navigation in Unknown Environment
- Authors: Zecheng Yin, Hao Zhao, Zhen Li
- Abstract summary: We propose Hybrid Perception Navigation (HyPerNav) to enhance the effectiveness and intelligence of navigation in unknown environments. Our method captures richer cues and finds target objects more effectively by simultaneously leveraging egocentric observations and the top-down map.
- Score: 9.835605248219586
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Object-oriented navigation (ObjNav) enables a robot to navigate directly and autonomously to a target object in an unknown environment. Effective perception is critical for autonomous navigation in such environments. While egocentric observations from RGB-D sensors provide abundant local information, real-time top-down maps offer valuable global context for ObjNav. Nevertheless, the majority of existing studies focus on a single source and seldom integrate these two complementary perceptual modalities, despite the fact that humans naturally attend to both. Building on the rapid advancement of Vision-Language Models (VLMs), we propose Hybrid Perception Navigation (HyPerNav), which leverages VLMs' strong reasoning and vision-language understanding capabilities to jointly perceive local and global information, enhancing the effectiveness and intelligence of navigation in unknown environments. In both large-scale simulation evaluation and real-world validation, our method achieves state-of-the-art performance against popular baselines. Benefiting from the hybrid perception approach, our method captures richer cues and finds target objects more effectively by simultaneously leveraging egocentric observations and the top-down map. Our ablation study further shows that each of the two perception sources contributes to navigation performance.
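The sketch below is a loose illustration of the hybrid-perception idea described in the abstract, not HyPerNav's published implementation: a single VLM query receives both the egocentric view and the rendered top-down map, then selects the next frontier to explore. The helper names (query_vlm, Frontier, choose_subgoal) are hypothetical, and the model call is stubbed out.

```python
# Hypothetical sketch: fuse local (egocentric) and global (top-down map)
# perception in one VLM query, in the spirit of the abstract above.
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class Frontier:
    """A candidate exploration point, in top-down map cell coordinates."""
    x: int
    y: int


def query_vlm(prompt: str, images: List[np.ndarray]) -> int:
    """Placeholder for a vision-language model call (assumed interface).

    A real system would send the prompt plus both images to a multimodal
    model and parse the index of the frontier it picks.
    """
    return 0  # stub: always pick the first candidate


def choose_subgoal(rgb: np.ndarray,
                   depth: np.ndarray,
                   topdown_map: np.ndarray,
                   frontiers: List[Frontier],
                   target: str) -> Frontier:
    """Ask the VLM to pick a frontier given hybrid local/global context.

    `depth` is accepted for interface completeness; in a full pipeline it
    would drive map building rather than enter the prompt directly.
    """
    options = "\n".join(
        f"{i}: frontier at map cell ({f.x}, {f.y})"
        for i, f in enumerate(frontiers)
    )
    prompt = (
        f"You are searching for a '{target}'. Image 1 is your egocentric "
        f"view; image 2 is the explored top-down map. Candidate frontiers:\n"
        f"{options}\nReply with the index of the most promising frontier."
    )
    idx = query_vlm(prompt, [rgb, topdown_map])
    return frontiers[idx]


if __name__ == "__main__":
    rgb = np.zeros((480, 640, 3), dtype=np.uint8)
    depth = np.zeros((480, 640), dtype=np.float32)
    topdown = np.zeros((256, 256), dtype=np.uint8)
    goal = choose_subgoal(rgb, depth, topdown,
                          [Frontier(40, 120), Frontier(200, 88)], "chair")
    print(f"Next subgoal: ({goal.x}, {goal.y})")
```

Frontier selection is only one plausible way to act on a VLM's hybrid judgment; the paper's actual action interface may differ.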
Related papers
- DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation [55.888688171010365]
DORAEMON is a cognitive-inspired framework consisting of Ventral and Dorsal Streams that mimics human navigation capabilities. We evaluate DORAEMON on the HM3D, MP3D and GOAT datasets, where it achieves state-of-the-art performance on both the success rate (SR) and success weighted by path length (SPL) metrics (both defined after this list).
arXiv Detail & Related papers (2025-05-28T04:46:13Z) - Vi-LAD: Vision-Language Attention Distillation for Socially-Aware Robot Navigation in Dynamic Environments [41.75629159747654]
We introduce Vision-Language Attention Distillation (Vi-LAD), a novel approach for distilling socially compliant navigation knowledge. Vi-LAD fine-tunes a transformer-based model using intermediate attention maps extracted from a pre-trained vision-action model. We validate our approach through real-world experiments on a Husky wheeled robot, demonstrating significant improvements over state-of-the-art navigation methods.
arXiv Detail & Related papers (2025-03-12T20:38:23Z) - Mem2Ego: Empowering Vision-Language Models with Global-to-Ego Memory for Long-Horizon Embodied Navigation [35.71602601385161]
We present a novel vision-language model (VLM)-based navigation framework. Our approach enhances spatial reasoning and decision-making in long-horizon tasks. Experimental results demonstrate that the proposed method surpasses previous state-of-the-art approaches in object navigation tasks.
arXiv Detail & Related papers (2025-02-20T04:41:40Z) - NavigateDiff: Visual Predictors are Zero-Shot Navigation Assistants [24.689242976554482]
Navigating unfamiliar environments presents significant challenges for household robots. Existing reinforcement learning methods cannot be directly transferred to new environments. NavigateDiff transfers the logical knowledge and generalization ability of pre-trained foundation models to zero-shot navigation.
arXiv Detail & Related papers (2025-02-19T17:27:47Z) - CogNav: Cognitive Process Modeling for Object Goal Navigation with LLMs [39.657663028904814]
Object goal navigation (ObjectNav) is a fundamental task in embodied AI, requiring an agent to locate a target object in previously unseen environments. We propose CogNav, a framework designed to mimic the cognitive process using large language models. CogNav improves the success rate of ObjectNav by at least 14% relative to the state of the art.
arXiv Detail & Related papers (2024-12-11T09:50:35Z) - Learning Navigational Visual Representations with Semantic Map Supervision [85.91625020847358]
We propose a navigation-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps.
Ego$^2$-Map learning transfers compact and rich information from a map, such as objects, structure, and transitions, to the agent's egocentric representations for navigation.
arXiv Detail & Related papers (2023-07-23T14:01:05Z) - KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation [61.08389704326803]
Vision-and-language navigation (VLN) is the task of enabling an embodied agent to navigate to a remote location by following natural language instructions in real scenes.
Most previous approaches utilize whole-image features or object-centric features to represent navigable candidates.
We propose a Knowledge Enhanced Reasoning Model (KERM) to leverage knowledge to improve agent navigation ability.
arXiv Detail & Related papers (2023-03-28T08:00:46Z) - ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation [75.13546386761153]
We present a novel zero-shot object navigation method, Exploration with Soft Commonsense constraints (ESC).
ESC transfers commonsense knowledge in pre-trained models to open-world object navigation without any navigation experience.
Experiments on MP3D, HM3D, and RoboTHOR benchmarks show that our ESC method improves significantly over baselines.
arXiv Detail & Related papers (2023-01-30T18:37:32Z) - Towards self-attention based visual navigation in the real world [0.0]
Vision-guided navigation requires processing complex visual information to inform task-oriented decisions.
Deep Reinforcement Learning agents trained in simulation often exhibit unsatisfactory results when deployed in the real-world.
This is the first demonstration of a self-attention based agent successfully trained to navigate in a 3D action space using fewer than 4,000 parameters.
arXiv Detail & Related papers (2022-09-15T04:51:42Z) - UAS Navigation in the Real World Using Visual Observation [0.4297070083645048]
This paper presents a novel end-to-end Unmanned Aerial System (UAS) navigation approach for long-range visual navigation in the real world.
Our system combines reinforcement learning (RL) and image matching approaches.
We demonstrate that the UAS can learn to navigate to a destination hundreds of meters from the starting point along the shortest path in a real-world scenario.
arXiv Detail & Related papers (2022-08-25T14:40:53Z) - Pushing it out of the Way: Interactive Visual Navigation [62.296686176988125]
We study the problem of interactive navigation where agents learn to change the environment to navigate more efficiently to their goals.
We introduce the Neural Interaction Engine (NIE) to explicitly predict the change in the environment caused by the agent's actions.
By modeling the changes while planning, we find that agents exhibit significant improvements in their navigational capabilities.
arXiv Detail & Related papers (2021-04-28T22:46:41Z) - Occupancy Anticipation for Efficient Exploration and Navigation [97.17517060585875]
We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions.
By exploiting context in both the egocentric views and top-down maps, our model successfully anticipates a broader map of the environment.
Our approach is the winning entry in the 2020 Habitat PointNav Challenge.
arXiv Detail & Related papers (2020-08-21T03:16:51Z)
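Several entries above, including DORAEMON, report success rate (SR) and success weighted by path length (SPL). For reference, these standard embodied-navigation metrics (as defined by Anderson et al., arXiv:1807.06757) are computed over N evaluation episodes as:

```latex
% S_i = 1 if episode i succeeds and 0 otherwise; l_i is the shortest-path
% distance from start to goal; p_i is the length of the agent's actual path.
\[
\mathrm{SR} = \frac{1}{N}\sum_{i=1}^{N} S_i,
\qquad
\mathrm{SPL} = \frac{1}{N}\sum_{i=1}^{N} S_i \,\frac{l_i}{\max(p_i,\, l_i)}
\]
```

SPL discounts each success by how much longer the agent's path was than the shortest possible one, so SPL never exceeds SR.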