MemoNav: Working Memory Model for Visual Navigation
- URL: http://arxiv.org/abs/2402.19161v2
- Date: Thu, 28 Mar 2024 04:07:57 GMT
- Title: MemoNav: Working Memory Model for Visual Navigation
- Authors: Hongxin Li, Zeyu Wang, Xu Yang, Yuran Yang, Shuqi Mei, Zhaoxiang Zhang
- Abstract summary: Image-goal navigation is a challenging task that requires an agent to navigate to a goal indicated by an image in unfamiliar environments.
Existing methods utilizing diverse scene memories suffer from inefficient exploration since they use all historical observations for decision-making.
We present MemoNav, a novel memory model for image-goal navigation, which utilizes a working memory-inspired pipeline to improve navigation performance.
- Score: 47.011190883888446
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image-goal navigation is a challenging task that requires an agent to navigate to a goal indicated by an image in unfamiliar environments. Existing methods utilizing diverse scene memories suffer from inefficient exploration since they use all historical observations for decision-making without considering the goal-relevant fraction. To address this limitation, we present MemoNav, a novel memory model for image-goal navigation, which utilizes a working memory-inspired pipeline to improve navigation performance. Specifically, we employ three types of navigation memory. The node features on a map are stored in the short-term memory (STM), as these features are dynamically updated. A forgetting module then retains the informative STM fraction to increase efficiency. We also introduce long-term memory (LTM) to learn global scene representations by progressively aggregating STM features. Subsequently, a graph attention module encodes the retained STM and the LTM to generate working memory (WM) which contains the scene features essential for efficient navigation. The synergy among these three memory types boosts navigation performance by enabling the agent to learn and leverage goal-relevant scene features within a topological map. Our evaluation on multi-goal tasks demonstrates that MemoNav significantly outperforms previous methods across all difficulty levels in both Gibson and Matterport3D scenes. Qualitative results further illustrate that MemoNav plans more efficient routes.
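The three-stage pipeline in the abstract (a forgetting module that retains the goal-relevant STM fraction, progressive aggregation of STM into LTM, and attention over both to produce the working memory) can be sketched in simplified form. The following is a minimal illustration, not the authors' implementation: the top-k dot-product forgetting rule, the averaging LTM update, and the single-head attention are all simplifying assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forget(stm, goal, k):
    """Forgetting module (simplified): keep the k STM node features
    most similar to the goal feature."""
    ranked = sorted(stm, key=lambda f: dot(f, goal), reverse=True)
    return ranked[:k]

def update_ltm(ltm, stm):
    """LTM update (simplified): progressively aggregate STM features
    into a global scene representation by pooling and blending."""
    pooled = [sum(col) / len(stm) for col in zip(*stm)]
    return [0.5 * (l + p) for l, p in zip(ltm, pooled)]

def working_memory(retained_stm, ltm, goal):
    """WM generation (simplified): single-head attention over the
    retained STM plus the LTM, with the goal feature as the query."""
    keys = retained_stm + [ltm]
    weights = softmax([dot(goal, key) for key in keys])
    dim = len(goal)
    return [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(dim)]

# Toy run: four map-node features, a goal feature, keep the top 2 in STM.
stm = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
goal = [1.0, 0.0]
ltm = [0.0, 0.0]

retained = forget(stm, goal, k=2)
ltm = update_ltm(ltm, stm)
wm = working_memory(retained, ltm, goal)
```

The point of the sketch is the division of labor: forgetting prunes the map features before decision-making, while the LTM keeps a compressed summary of everything seen, so the attention step operates on a small, goal-biased set.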
Related papers
- Memory Proxy Maps for Visual Navigation [6.1190419149081245]
Visual navigation takes inspiration from humans, who navigate in previously unseen environments using vision without detailed environment maps.
Inspired by this, we introduce a novel no-RL, no-graph, no-odometry approach to visual navigation using feudal learning to build a three-tiered agent.
arXiv Detail & Related papers (2024-11-15T02:37:14Z)
- GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation [65.71524410114797]
GOAT-Bench is a benchmark for the universal navigation task GO to AnyThing (GOAT).
In GOAT, the agent is directed to navigate to a sequence of targets specified by the category name, language description, or image.
We benchmark monolithic RL and modular methods on the GOAT task, analyzing their performance across modalities.
arXiv Detail & Related papers (2024-04-09T20:40:00Z)
- GaussNav: Gaussian Splatting for Visual Navigation [92.13664084464514]
Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment.
Our framework constructs a novel map representation based on 3D Gaussian Splatting (3DGS).
Our framework demonstrates a significant leap in performance, evidenced by an increase in Success weighted by Path Length (SPL) from 0.252 to 0.578 on the challenging Habitat-Matterport 3D (HM3D) dataset.
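SPL (Success weighted by Path Length), the metric cited above, weights each episode's success indicator by the ratio of the shortest-path length to the length of the path the agent actually took, then averages over episodes. A minimal sketch of the standard definition; the episode values below are made up for illustration:

```python
def spl(episodes):
    """Success weighted by Path Length:
    SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i),
    where S_i is 1 on success and 0 otherwise, l_i is the
    shortest-path length, and p_i is the agent's actual path length.
    Each episode is a (success, shortest, taken) tuple."""
    total = 0.0
    for success, shortest, taken in episodes:
        total += success * shortest / max(taken, shortest)
    return total / len(episodes)

# Toy episodes: (success flag, shortest-path length, path length taken).
episodes = [(1, 5.0, 5.0), (1, 5.0, 10.0), (0, 5.0, 7.0), (1, 8.0, 8.0)]
print(spl(episodes))  # (1.0 + 0.5 + 0.0 + 1.0) / 4 = 0.625
```

A jump from 0.252 to 0.578 therefore reflects both more episodes succeeding and the successful routes being closer to optimal.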
arXiv Detail & Related papers (2024-03-18T09:56:48Z)
- ESceme: Vision-and-Language Navigation with Episodic Scene Memory [72.69189330588539]
Vision-and-language navigation (VLN) simulates a visual agent that follows natural-language navigation instructions in real-world scenes.
We introduce a mechanism of Episodic Scene memory (ESceme) for VLN that wakes an agent's memories of past visits when it enters the current scene.
arXiv Detail & Related papers (2023-03-02T07:42:07Z)
- MemoNav: Selecting Informative Memories for Visual Navigation [43.185016165039116]
We present the MemoNav, a novel memory mechanism for image-goal navigation.
The MemoNav retains the agent's informative short-term memory and long-term memory to improve the navigation performance.
We evaluate our model on a new multi-goal navigation dataset.
arXiv Detail & Related papers (2022-08-20T05:57:21Z)
- Object Memory Transformer for Object Goal Navigation [10.359616364592075]
This paper presents a reinforcement learning method for object goal navigation.
An agent navigates in 3D indoor environments to reach a target object based on long-term observations of objects and scenes.
To the best of our knowledge, this is the first work that uses a long-term memory of object semantics in a goal-oriented navigation task.
arXiv Detail & Related papers (2022-03-24T09:16:56Z)
- Structured Scene Memory for Vision-Language Navigation [155.63025602722712]
We propose a structured scene memory architecture for vision-language navigation (VLN).
It is expressive enough to accurately memorize the percepts during navigation.
It also serves as a structured scene representation, which captures and disentangles visual and geometric cues in the environment.
arXiv Detail & Related papers (2021-03-05T03:41:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.