From reactive to cognitive: brain-inspired spatial intelligence for embodied agents
- URL: http://arxiv.org/abs/2508.17198v1
- Date: Sun, 24 Aug 2025 03:20:48 GMT
- Title: From reactive to cognitive: brain-inspired spatial intelligence for embodied agents
- Authors: Shouwei Ruan, Liyuan Wang, Caixin Kang, Qihui Zhu, Songming Liu, Xingxing Wei, Hang Su
- Abstract summary: Brain-inspired Spatial Cognition for Navigation (BSC-Nav) is a unified framework for constructing and leveraging structured spatial memory in embodied agents. BSC-Nav builds allocentric cognitive maps from egocentric trajectories and contextual cues, and dynamically retrieves spatial knowledge aligned with semantic goals.
- Score: 50.99942960312313
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spatial cognition enables adaptive goal-directed behavior by constructing internal models of space. Robust biological systems consolidate spatial knowledge into three interconnected forms: landmarks for salient cues, route knowledge for movement trajectories, and survey knowledge for map-like representations. While recent advances in multi-modal large language models (MLLMs) have enabled visual-language reasoning in embodied agents, these efforts lack structured spatial memory and instead operate reactively, limiting their generalization and adaptability in complex real-world environments. Here we present Brain-inspired Spatial Cognition for Navigation (BSC-Nav), a unified framework for constructing and leveraging structured spatial memory in embodied agents. BSC-Nav builds allocentric cognitive maps from egocentric trajectories and contextual cues, and dynamically retrieves spatial knowledge aligned with semantic goals. Integrated with powerful MLLMs, BSC-Nav achieves state-of-the-art efficacy and efficiency across diverse navigation tasks, demonstrates strong zero-shot generalization, and supports versatile embodied behaviors in the real physical world, offering a scalable and biologically grounded path toward general-purpose spatial intelligence.
Related papers
- ACE-Brain-0: Spatial Intelligence as a Shared Scaffold for Universal Embodiments [134.95780765985515]
We introduce ACE-Brain-0, a generalist foundation brain that unifies spatial reasoning, autonomous driving, and embodied manipulation. Our key insight is that spatial intelligence serves as a universal scaffold across diverse physical embodiments. We propose the Scaffold-Specialize-Reconcile (SSR) paradigm, which first establishes a shared spatial foundation, then cultivates domain-specialized experts, and finally harmonizes them through data-free model merging.
arXiv Detail & Related papers (2026-03-03T17:53:45Z) - Thinking with Geometry: Active Geometry Integration for Spatial Reasoning [68.59084007360615]
We propose GeoThinker, a framework that shifts the paradigm from passive fusion to active perception. Instead of feature mixing, GeoThinker enables the model to selectively retrieve geometric evidence conditioned on its internal reasoning demands. Our results indicate that the ability to actively integrate spatial structures is essential for next-generation spatial intelligence.
arXiv Detail & Related papers (2026-02-05T18:59:32Z) - Mind Meets Space: Rethinking Agentic Spatial Intelligence from a Neuroscience-inspired Perspective [53.556348738917166]
Recent advances in agentic AI have led to systems capable of autonomous task execution and language-based reasoning. Human spatial intelligence, rooted in integrated multisensory perception, spatial memory, and cognitive maps, enables flexible, context-aware decision-making in unstructured environments.
arXiv Detail & Related papers (2025-09-11T05:23:22Z) - Can LLMs Learn to Map the World from Local Descriptions? [50.490593949836146]
This study investigates whether Large Language Models (LLMs) can construct coherent global spatial cognition. Experiments conducted in a simulated urban environment demonstrate that LLMs exhibit latent representations aligned with real-world spatial distributions.
arXiv Detail & Related papers (2025-05-27T08:22:58Z) - Endowing Embodied Agents with Spatial Reasoning Capabilities for Vision-and-Language Navigation [0.0]
We propose BrainNav, a bio-inspired spatial cognitive navigation framework based on biological spatial cognition theories and cognitive map theory. BrainNav integrates dual-map (coordinate map and topological map) and dual-orientation (relative orientation and absolute orientation) strategies, enabling real-time navigation through dynamic scene capture and path planning. Its five core modules (Hippocampal Memory Hub, Visual Cortex Perception Engine, Parietal Spatial Constructor, Prefrontal Decision Center, and Cerebellar Motion Execution Unit) mimic biological cognitive functions to reduce spatial hallucinations and enhance adaptability.
arXiv Detail & Related papers (2025-04-09T02:19:22Z) - EmbodiedVSR: Dynamic Scene Graph-Guided Chain-of-Thought Reasoning for Visual Spatial Tasks [24.41705039390567]
EmbodiedVSR (Embodied Visual Spatial Reasoning) is a novel framework that integrates dynamic scene graph-guided Chain-of-Thought (CoT) reasoning. Our method enables zero-shot spatial reasoning without task-specific fine-tuning. Experiments demonstrate that our framework significantly outperforms existing MLLM-based methods in accuracy and reasoning coherence.
arXiv Detail & Related papers (2025-03-14T05:06:07Z) - Mem2Ego: Empowering Vision-Language Models with Global-to-Ego Memory for Long-Horizon Embodied Navigation [35.71602601385161]
We present a novel vision-language model (VLM)-based navigation framework. Our approach enhances spatial reasoning and decision-making in long-horizon tasks. Experimental results demonstrate that the proposed method surpasses previous state-of-the-art approaches in object navigation tasks.
arXiv Detail & Related papers (2025-02-20T04:41:40Z) - Cog-GA: A Large Language Models-based Generative Agent for Vision-Language Navigation in Continuous Environments [19.818370526976974]
Vision Language Navigation in Continuous Environments (VLN-CE) represents a frontier in embodied AI.
We introduce Cog-GA, a generative agent founded on large language models (LLMs) tailored for VLN-CE tasks.
Cog-GA employs a dual-pronged strategy to emulate human-like cognitive processes.
arXiv Detail & Related papers (2024-09-04T08:30:03Z) - Learning Navigational Visual Representations with Semantic Map Supervision [85.91625020847358]
We propose a navigational-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps.
Ego$^2$-Map learning transfers the compact and rich information from a map, such as objects, structure and transition, to the agent's egocentric representations for navigation.
arXiv Detail & Related papers (2023-07-23T14:01:05Z) - Structured Scene Memory for Vision-Language Navigation [155.63025602722712]
We propose a crucial architecture for vision-language navigation (VLN).
It is compartmentalized enough to accurately memorize the percepts during navigation.
It also serves as a structured scene representation, which captures and disentangles visual and geometric cues in the environment.
arXiv Detail & Related papers (2021-03-05T03:41:00Z) - Occupancy Anticipation for Efficient Exploration and Navigation [97.17517060585875]
We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions.
By exploiting context in both the egocentric views and top-down maps our model successfully anticipates a broader map of the environment.
Our approach is the winning entry in the 2020 Habitat PointNav Challenge.
arXiv Detail & Related papers (2020-08-21T03:16:51Z) - Learning to Move with Affordance Maps [57.198806691838364]
The ability to autonomously explore and navigate a physical space is a fundamental requirement for virtually any mobile autonomous agent.
Traditional SLAM-based approaches for exploration and navigation largely focus on leveraging scene geometry.
We show that learned affordance maps can be used to augment traditional approaches for both exploration and navigation, providing significant improvements in performance.
arXiv Detail & Related papers (2020-01-08T04:05:11Z)