LagMemo: Language 3D Gaussian Splatting Memory for Multi-modal Open-vocabulary Multi-goal Visual Navigation
- URL: http://arxiv.org/abs/2510.24118v1
- Date: Tue, 28 Oct 2025 06:42:21 GMT
- Title: LagMemo: Language 3D Gaussian Splatting Memory for Multi-modal Open-vocabulary Multi-goal Visual Navigation
- Authors: Haotian Zhou, Xiaole Wang, He Li, Fusheng Sun, Shengyu Guo, Guolei Qi, Jianghuan Xu, Huijing Zhao
- Abstract summary: LagMemo is a navigation system for multi-modal, open-vocabulary goal queries and multi-goal visual navigation. During exploration, LagMemo constructs a unified 3D language memory. With incoming task goals, the system queries the memory, predicts candidate goal locations, and integrates a local perception-based verification mechanism.
- Score: 8.948489682917732
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Navigating to a designated goal using visual information is a fundamental capability for intelligent robots. Most classical visual navigation methods are restricted to single-goal, single-modality, and closed set goal settings. To address the practical demands of multi-modal, open-vocabulary goal queries and multi-goal visual navigation, we propose LagMemo, a navigation system that leverages a language 3D Gaussian Splatting memory. During exploration, LagMemo constructs a unified 3D language memory. With incoming task goals, the system queries the memory, predicts candidate goal locations, and integrates a local perception-based verification mechanism to dynamically match and validate goals during navigation. For fair and rigorous evaluation, we curate GOAT-Core, a high-quality core split distilled from GOAT-Bench tailored to multi-modal open-vocabulary multi-goal visual navigation. Experimental results show that LagMemo's memory module enables effective multi-modal open-vocabulary goal localization, and that LagMemo outperforms state-of-the-art methods in multi-goal visual navigation. Project page: https://weekgoodday.github.io/lagmemo
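The abstract outlines a query-and-verify pipeline: language-aligned features are distilled into the 3D Gaussian memory during exploration, each incoming goal query (text or image, embedded in the same feature space) is matched against the memory to predict candidate locations, and a local perception check validates the goal on arrival. Below is a minimal, self-contained sketch of that loop under stated assumptions: toy random embeddings stand in for the paper's 3DGS language features, and every name (MemoryEntry, query_memory, verify_goal) is hypothetical illustration, not LagMemo's actual API.

```python
# Hypothetical sketch of a query-and-verify navigation loop over a language
# 3D memory. Toy embeddings replace real 3DGS features; names are illustrative.
from dataclasses import dataclass
import numpy as np

@dataclass
class MemoryEntry:
    position: np.ndarray   # 3D location of a mapped region in the scene
    feature: np.ndarray    # language-aligned feature stored during exploration

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def query_memory(memory: list[MemoryEntry], goal_feat: np.ndarray, k: int = 3):
    """Rank stored locations by feature similarity to the goal query.
    goal_feat could come from a text or image encoder in a shared space."""
    return sorted(memory, key=lambda e: cosine(e.feature, goal_feat), reverse=True)[:k]

def verify_goal(entry: MemoryEntry, goal_feat: np.ndarray, thresh: float = 0.8) -> bool:
    """Stand-in for the local perception-based check at a candidate location."""
    return cosine(entry.feature, goal_feat) >= thresh

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    memory = [MemoryEntry(rng.uniform(-5, 5, 3), rng.normal(size=512))
              for _ in range(100)]
    # Simulate a goal query whose embedding is close to one stored entry.
    goal = memory[42].feature + 0.05 * rng.normal(size=512)
    for cand in query_memory(memory, goal):
        if verify_goal(cand, goal):
            print("navigate to", np.round(cand.position, 2))
            break
```

In a real system the verification step would re-observe the candidate location with the robot's camera rather than re-score the stored feature; the sketch only illustrates the memory-query control flow.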
Related papers
- OpenFrontier: General Navigation with Visual-Language Grounded Frontiers [54.661157616245966]
Open-world navigation requires robots to make decisions in complex everyday environments. Recent advances in vision-language navigation (VLN) and vision-language-action (VLA) models enable end-to-end policies conditioned on natural language. We propose OpenFrontier, a training-free navigation framework that seamlessly integrates diverse vision-language prior models.
arXiv Detail & Related papers (2026-03-05T17:02:22Z)
- RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies [54.23445842621374]
Memory is critical for long-horizon and history-dependent robotic manipulation. Recent vision-language-action (VLA) models have begun to incorporate memory mechanisms. We introduce RoboMME: a large-scale standardized benchmark for evaluating and advancing VLA models.
arXiv Detail & Related papers (2026-03-04T21:59:32Z)
- 3DGSNav: Enhancing Vision-Language Model Reasoning for Object Navigation via Active 3D Gaussian Splatting [12.057873540714098]
3DGSNav is a novel framework that embeds 3D Gaussian Splatting (3DGS) as persistent memory for vision-language models (VLMs) to enhance spatial reasoning. 3DGSNav incrementally constructs a 3DGS representation of the environment, enabling trajectory-guided free-viewpoint rendering of frontier-aware first-person views. During navigation, a real-time object detector filters potential targets, while VLM-driven active viewpoint switching performs target re-verification.
arXiv Detail & Related papers (2026-02-12T16:41:26Z)
- MetaMem: Evolving Meta-Memory for Knowledge Utilization through Self-Reflective Symbolic Optimization [57.17751568928966]
We propose MetaMem, a framework that augments memory systems with a self-evolving meta-memory. During meta-memory optimization, MetaMem iteratively distills transferable knowledge utilization experiences across different tasks. Extensive experiments demonstrate the effectiveness of MetaMem, which significantly outperforms strong baselines by over 3.6%.
arXiv Detail & Related papers (2026-01-27T04:46:23Z)
- JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation [22.956416709470503]
Vision-and-Language Navigation requires an embodied agent to navigate through unseen environments, guided by natural language instructions and a continuous video stream. Recent advances in VLN have been driven by the powerful semantic understanding of Multimodal Large Language Models. We propose JanusVLN, a novel VLN framework featuring a dual implicit neural memory that models spatial-geometric and visual-semantic memory as separate, compact, and fixed-size neural representations.
arXiv Detail & Related papers (2025-09-26T16:29:37Z)
- MLFM: Multi-Layered Feature Maps for Richer Language Understanding in Zero-Shot Semantic Navigation [25.63797039823049]
LangNav is an open-vocabulary multi-object navigation dataset with natural language goal descriptions. MLFM builds a queryable, multi-layered semantic map from pretrained vision-language features. Experiments on LangNav show that MLFM outperforms state-of-the-art zero-shot mapping-based navigation baselines.
arXiv Detail & Related papers (2025-07-09T21:46:43Z)
- GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation [65.71524410114797]
GOAT-Bench is a benchmark for the universal navigation task GO to AnyThing (GOAT).
In GOAT, the agent is directed to navigate to a sequence of targets specified by the category name, language description, or image.
We benchmark monolithic RL and modular methods on the GOAT task, analyzing their performance across modalities.
arXiv Detail & Related papers (2024-04-09T20:40:00Z)
- MemoNav: Working Memory Model for Visual Navigation [47.011190883888446]
Image-goal navigation is a challenging task that requires an agent to navigate to a goal indicated by an image in unfamiliar environments.
Existing methods utilizing diverse scene memories suffer from inefficient exploration since they use all historical observations for decision-making.
We present MemoNav, a novel memory model for image-goal navigation, which utilizes a working memory-inspired pipeline to improve navigation performance.
arXiv Detail & Related papers (2024-02-29T13:45:13Z)
- CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation [73.78984332354636]
CorNav is a novel zero-shot framework for vision-and-language navigation.
It incorporates environmental feedback for refining future plans and adjusting its actions.
It consistently outperforms all baselines in a zero-shot multi-task setting.
arXiv Detail & Related papers (2023-06-17T11:44:04Z)
- Vision-Dialog Navigation by Exploring Cross-modal Memory [107.13970721435571]
Vision-dialog navigation is posed as a new holy-grail task in the vision-language field.
We propose the Cross-modal Memory Network (CMN) for remembering and understanding the rich information relevant to historical navigation actions.
Our CMN outperforms the previous state-of-the-art model by a significant margin on both seen and unseen environments.
arXiv Detail & Related papers (2020-03-15T03:08:06Z)