Endowing Embodied Agents with Spatial Reasoning Capabilities for Vision-and-Language Navigation
- URL: http://arxiv.org/abs/2504.08806v1
- Date: Wed, 09 Apr 2025 02:19:22 GMT
- Title: Endowing Embodied Agents with Spatial Reasoning Capabilities for Vision-and-Language Navigation
- Authors: Luo Ling, Bai Qianqian
- Abstract summary: We propose BrainNav, a spatial cognitive navigation framework inspired by biological spatial cognition theories and cognitive map theory. BrainNav integrates dual-map (coordinate map and topological map) and dual-orientation (relative orientation and absolute orientation) strategies, enabling real-time navigation through dynamic scene capture and path planning. Its five core modules (Hippocampal Memory Hub, Visual Cortex Perception Engine, Parietal Spatial Constructor, Prefrontal Decision Center, and Cerebellar Motion Execution Unit) mimic biological cognitive functions to reduce spatial hallucinations and enhance adaptability.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Enhancing the spatial perception capabilities of mobile robots is crucial for achieving embodied Vision-and-Language Navigation (VLN). Although significant progress has been made in simulated environments, directly transferring these capabilities to real-world scenarios often results in severe hallucination phenomena, causing robots to lose effective spatial awareness. To address this issue, we propose BrainNav, a spatial cognitive navigation framework inspired by biological spatial cognition theories and cognitive map theory. BrainNav integrates dual-map (coordinate map and topological map) and dual-orientation (relative orientation and absolute orientation) strategies, enabling real-time navigation through dynamic scene capture and path planning. Its five core modules (Hippocampal Memory Hub, Visual Cortex Perception Engine, Parietal Spatial Constructor, Prefrontal Decision Center, and Cerebellar Motion Execution Unit) mimic biological cognitive functions to reduce spatial hallucinations and enhance adaptability. Validated zero-shot in a real-world lab environment on the Limo Pro robot, BrainNav paired with GPT-4 outperforms existing state-of-the-art (SOTA) Vision-and-Language Navigation in Continuous Environments (VLN-CE) methods without fine-tuning.
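No reference implementation accompanies this listing, so the dual-map and dual-orientation idea is illustrated below with a minimal, hypothetical sketch: a coordinate map holds absolute poses, a topological map holds place connectivity, and a bearing helper converts absolute goal directions into agent-relative ones. All class and method names are illustrative assumptions, not the authors' API.

```python
import math
from collections import defaultdict, deque

class DualMapMemory:
    """Hypothetical sketch of a dual-map memory: a coordinate map
    (absolute poses) plus a topological map (place connectivity)."""

    def __init__(self):
        self.poses = {}                # place_id -> (x, y, heading) in world frame
        self.edges = defaultdict(set)  # place_id -> set of adjacent place_ids

    def add_place(self, place_id, x, y, heading, neighbor=None):
        self.poses[place_id] = (x, y, heading)
        if neighbor is not None:       # record topological adjacency both ways
            self.edges[place_id].add(neighbor)
            self.edges[neighbor].add(place_id)

    def relative_bearing(self, from_id, to_id):
        """Dual orientation: absolute goal direction from the coordinate map,
        re-expressed relative to the agent's current heading, in [-pi, pi)."""
        x0, y0, heading = self.poses[from_id]
        x1, y1, _ = self.poses[to_id]
        absolute = math.atan2(y1 - y0, x1 - x0)
        return (absolute - heading + math.pi) % (2 * math.pi) - math.pi

    def plan(self, start, goal):
        """Breadth-first search over the topological map for a place-level route."""
        parents, queue = {start: None}, deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                path = []
                while node is not None:
                    path.append(node)
                    node = parents[node]
                return path[::-1]
            for nxt in self.edges[node]:
                if nxt not in parents:
                    parents[nxt] = node
                    queue.append(nxt)
        return None
```

A planner like this only returns a place-level route; BrainNav's actual modules add perception, memory management, and motion execution on top.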
Related papers
- EDEN: Entorhinal Driven Egocentric Navigation Toward Robotic Deployment [1.5190286092106713]
EDEN is a biologically inspired navigation framework that integrates learned entorhinal-like grid cell representations and reinforcement learning to enable autonomous navigation. Inspired by the mammalian entorhinal-hippocampal system, EDEN allows agents to perform path integration and vector-based navigation using visual and motion sensor data.
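Path integration, the computation grid cells are thought to support, reduces to dead reckoning over motion signals. The function below is a simplification for illustration only, not EDEN's learned grid-cell model:

```python
import math

def integrate_path(start_xy, start_heading, motions):
    """Dead-reckoning sketch of path integration: each motion is a
    (turn_radians, forward_distance) pair from the agent's motion sensors."""
    (x, y), heading = start_xy, start_heading
    for turn, dist in motions:
        heading += turn                  # update egocentric heading
        x += dist * math.cos(heading)    # accumulate allocentric position
        y += dist * math.sin(heading)
    return (x, y), heading

# Example: forward, then two 90-degree left turns, with unit steps
pos, heading = integrate_path((0.0, 0.0), 0.0,
                              [(0.0, 1.0), (math.pi / 2, 1.0), (math.pi / 2, 1.0)])
# pos is approximately (0.0, 1.0); the vector back to the start (the
# "homing vector" used in vector-based navigation) is simply -pos.
```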
arXiv Detail & Related papers (2025-06-03T16:28:33Z) - Cog-GA: A Large Language Models-based Generative Agent for Vision-Language Navigation in Continuous Environments [19.818370526976974]
Vision Language Navigation in Continuous Environments (VLN-CE) represents a frontier in embodied AI.
We introduce Cog-GA, a generative agent founded on large language models (LLMs) tailored for VLN-CE tasks.
Cog-GA employs a dual-pronged strategy to emulate human-like cognitive processes.
arXiv Detail & Related papers (2024-09-04T08:30:03Z) - IN-Sight: Interactive Navigation through Sight [20.184155117341497]
IN-Sight is a novel approach to self-supervised path planning.
It calculates traversability scores and incorporates them into a semantic map.
To precisely navigate around obstacles, IN-Sight employs a local planner.
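As a rough sketch of "traversability scores incorporated into a semantic map" (the update rule and names below are assumptions, not the paper's method), per-cell predictions can be folded into a persistent grid:

```python
import numpy as np

def update_traversability(grid, cells, scores, lr=0.3):
    """Fold per-cell traversability predictions (0 = blocked, 1 = free),
    e.g. from an image-based model, into a persistent 2D map with an
    exponential moving average, so repeated observations sharpen estimates."""
    for (i, j), s in zip(cells, scores):
        grid[i, j] = (1 - lr) * grid[i, j] + lr * s
    return grid

grid = np.full((8, 8), 0.5)  # unknown cells start at an uninformative 0.5
grid = update_traversability(grid, [(2, 3), (2, 4)], [0.9, 0.1])
```

A local planner can then treat low-score cells as obstacles when steering around them.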
arXiv Detail & Related papers (2024-08-01T07:27:54Z) - Vision and Language Navigation in the Real World via Online Visual Language Mapping [18.769171505280127]
Vision-and-language navigation (VLN) methods are mainly evaluated in simulation.
We propose a novel framework to address the VLN task in the real world.
We evaluate the proposed pipeline on an Interbotix LoCoBot WX250 in an unseen lab environment.
arXiv Detail & Related papers (2023-10-16T20:44:09Z) - Learning Navigational Visual Representations with Semantic Map Supervision [85.91625020847358]
We propose a navigation-specific visual representation learning method that contrasts the agent's egocentric views with semantic maps.
Ego²-Map learning transfers the compact and rich information from a map, such as objects, structure, and transitions, to the agent's egocentric representations for navigation.
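The contrastive objective can be illustrated with a standard InfoNCE loss between paired egocentric-view and map embeddings; this is a generic sketch under that assumption, not the authors' exact loss:

```python
import numpy as np

def info_nce(view_emb, map_emb, temperature=0.07):
    """Each egocentric-view embedding should score highest against its own
    semantic-map embedding; the other maps in the batch act as negatives."""
    v = view_emb / np.linalg.norm(view_emb, axis=1, keepdims=True)
    m = map_emb / np.linalg.norm(map_emb, axis=1, keepdims=True)
    logits = (v @ m.T) / temperature               # batch x batch similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # cross-entropy on matched pairs

rng = np.random.default_rng(0)
loss = info_nce(rng.normal(size=(4, 16)), rng.normal(size=(4, 16)))
```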
arXiv Detail & Related papers (2023-07-23T14:01:05Z) - CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation [73.78984332354636]
CorNav is a novel zero-shot framework for vision-and-language navigation.
It incorporates environmental feedback for refining future plans and adjusting its actions.
It consistently outperforms all baselines in a zero-shot multi-task setting.
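The key loop, acting on a plan and revising it from environmental feedback, can be shown with a toy grid world. Everything below (the grid, the BFS planner, all names) is an illustrative assumption; CorNav itself uses an LLM-based planner and open-vocabulary perception:

```python
from collections import deque

def bfs(blocked, start, goal, size=5):
    """Shortest path on a 4-connected grid, avoiding cells believed blocked."""
    parents, queue = {start: None}, deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nxt[0] < size and 0 <= nxt[1] < size
                    and nxt not in blocked and nxt not in parents):
                parents[nxt] = cell
                queue.append(nxt)
    return None

def navigate(true_obstacles, start, goal):
    """Toy self-correcting loop: plan with current beliefs, and when a step
    collides with an unexpected obstacle, fold that feedback into the map
    and replan."""
    believed_blocked, pos = set(), start
    while pos != goal:
        path = bfs(believed_blocked, pos, goal)
        if path is None:
            return None                      # goal unreachable under beliefs
        step = path[1]
        if step in true_obstacles:           # environmental feedback
            believed_blocked.add(step)       # correct the belief, then replan
        else:
            pos = step
    return pos

print(navigate({(1, 0), (1, 1)}, (0, 0), (3, 0)))  # reaches (3, 0) after replanning
```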
arXiv Detail & Related papers (2023-06-17T11:44:04Z) - Bio-inspired spike-based Hippocampus and Posterior Parietal Cortex models for robot navigation and environment pseudo-mapping [52.77024349608834]
This work proposes a spike-based robotic navigation and environment pseudo-mapping system.
The hippocampus is in charge of maintaining a representation of the environment's state map, while the PPC is in charge of local decision-making.
This is the first implementation of an environment pseudo-mapping system with dynamic learning based on a bio-inspired hippocampal memory.
arXiv Detail & Related papers (2023-05-22T10:20:34Z) - ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments [56.194988818341976]
Vision-language navigation is a task that requires an agent to follow instructions to navigate in environments.
We propose ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) obstacle-avoiding control in continuous environments.
ETPNav yields more than 10% and 20% improvements over the prior state of the art on the R2R-CE and RxR-CE datasets, respectively.
arXiv Detail & Related papers (2023-04-06T13:07:17Z) - Emergence of Maps in the Memories of Blind Navigation Agents [68.41901534985575]
Animal navigation research posits that organisms build and maintain internal spatial representations, or maps, of their environment.
We ask whether machines, specifically artificial intelligence (AI) navigation agents, also build implicit (or 'mental') maps.
Unlike in animal navigation studies, we can judiciously design the agent's perceptual system and control the learning paradigm to nullify alternative navigation mechanisms.
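The probing methodology can be sketched as follows: if an agent's recurrent memory implicitly encodes a map, a simple linear readout should recover spatial quantities such as position from it. The synthetic "memory" below merely stands in for real hidden states:

```python
import numpy as np

# Hypothetical linear probe: decode agent position from memory vectors.
rng = np.random.default_rng(0)
positions = rng.uniform(0, 10, size=(500, 2))              # ground-truth (x, y)
mixing = rng.normal(size=(2, 64))
memory = positions @ mixing + 0.1 * rng.normal(size=(500, 64))  # stand-in hidden states

train, test = slice(0, 400), slice(400, 500)
readout, *_ = np.linalg.lstsq(memory[train], positions[train], rcond=None)
error = np.linalg.norm(memory[test] @ readout - positions[test], axis=1).mean()
print(f"mean decoding error: {error:.3f}")  # low error = position is linearly decodable
```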
arXiv Detail & Related papers (2023-01-30T20:09:39Z) - Learning Synthetic to Real Transfer for Localization and Navigational Tasks [7.019683407682642]
Navigation sits at the crossroads of multiple disciplines: it combines notions from computer vision, robotics, and control.
This work aims to create, in simulation, a navigation pipeline whose transfer to the real world requires as little effort as possible.
Designing the navigation pipeline raises four main challenges: environment, localization, navigation, and planning.
arXiv Detail & Related papers (2020-11-20T08:37:03Z) - Visual Navigation Among Humans with Optimal Control as a Supervisor [72.5188978268463]
We propose an approach that combines learning-based perception with model-based optimal control to navigate among humans.
Our approach is enabled by our novel data-generation tool, HumANav.
We demonstrate that the learned navigation policies can anticipate and react to humans without explicitly predicting future human motion.
arXiv Detail & Related papers (2020-03-20T16:13:47Z)