MemoNav: Selecting Informative Memories for Visual Navigation
- URL: http://arxiv.org/abs/2208.09610v1
- Date: Sat, 20 Aug 2022 05:57:21 GMT
- Title: MemoNav: Selecting Informative Memories for Visual Navigation
- Authors: Hongxin Li, Xu Yang, Yuran Yang, Shuqi Mei, Zhaoxiang Zhang
- Abstract summary: We present the MemoNav, a novel memory mechanism for image-goal navigation.
The MemoNav retains the agent's informative short-term memory and long-term memory to improve navigation performance.
We evaluate our model on a new multi-goal navigation dataset.
- Score: 43.185016165039116
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image-goal navigation is a challenging task, as it requires the agent to
navigate to a target indicated by an image in a previously unseen scene.
Current methods introduce diverse memory mechanisms which save navigation
history to solve this task. However, these methods use all observations in the
memory for generating navigation actions without considering which fraction of
this memory is informative. To address this limitation, we present the MemoNav,
a novel memory mechanism for image-goal navigation, which retains the agent's
informative short-term memory and long-term memory to improve the navigation
performance on a multi-goal task. The node features on the agent's topological
map are stored in the short-term memory, as these features are dynamically
updated. To aid the short-term memory, we also generate long-term memory by
continuously aggregating the short-term memory via a graph attention module.
The MemoNav retains the informative fraction of the short-term memory via a
forgetting module based on a Transformer decoder and then incorporates this
retained short-term memory and the long-term memory into working memory.
Lastly, the agent uses the working memory for action generation. We evaluate
our model on a new multi-goal navigation dataset. The experimental results show
that the MemoNav outperforms the SoTA methods by a large margin with a smaller
fraction of navigation history. The results also empirically show that our
model is less likely to be trapped in a deadlock, which further validates that
the MemoNav improves the agent's navigation efficiency by reducing redundant
steps.
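To make the pipeline concrete, the following is a minimal PyTorch sketch of the flow the abstract describes: topological-map node features act as short-term memory (STM), a graph-attention-style module aggregates them into long-term memory (LTM), a Transformer-decoder-based forgetting step retains only the informative STM nodes, and the fused working memory drives action selection. Every concrete choice here (module classes, dimensions, top-k retention scored by goal similarity, a four-action space) is an illustrative assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MemoNavSketch(nn.Module):
    """Minimal sketch of the MemoNav pipeline described in the abstract.

    All module names, sizes, and the top-k retention rule are illustrative
    assumptions, not the authors' code.
    """

    def __init__(self, dim: int = 128, num_heads: int = 4, keep_k: int = 8):
        super().__init__()
        self.keep_k = keep_k
        # Graph-attention stand-in: aggregates the short-term memory (STM,
        # the topological-map node features) into a long-term memory vector.
        self.ltm_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ltm_query = nn.Parameter(torch.randn(1, 1, dim))
        # Forgetting module stand-in: a Transformer decoder layer that refines
        # the retained STM and the LTM, conditioned on the goal embedding.
        self.forget = nn.TransformerDecoderLayer(dim, num_heads, batch_first=True)
        # Policy head mapping the working memory to action logits.
        self.policy = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 4))

    def forward(self, stm: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
        # stm: (B, N, dim) node features; goal: (B, dim) goal-image embedding.
        ltm, _ = self.ltm_attn(self.ltm_query.expand(stm.size(0), -1, -1), stm, stm)
        # Score STM nodes by goal similarity and keep only the top-k ("forgetting").
        scores = torch.einsum("bnd,bd->bn", stm, goal)
        idx = scores.topk(min(self.keep_k, stm.size(1)), dim=1).indices
        kept = torch.gather(stm, 1, idx.unsqueeze(-1).expand(-1, -1, stm.size(-1)))
        # Working memory = retained STM + LTM, refined by the decoder layer.
        wm = self.forget(torch.cat([kept, ltm], dim=1), goal.unsqueeze(1))
        return self.policy(wm.mean(dim=1))  # (B, 4) action logits
```

As a smoke test, `MemoNavSketch()(torch.randn(2, 16, 128), torch.randn(2, 128))` returns a (2, 4) tensor of action logits.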
Related papers
- KARMA: Augmenting Embodied AI Agents with Long-and-short Term Memory Systems [12.461941212597877]
Embodied AI agents often face difficulties with in-context memory, leading to inefficiencies and errors in task execution.
We introduce KARMA, an innovative memory system that integrates long-term and short-term memory modules.
This dual-memory structure allows agents to retrieve relevant past scene experiences, thereby improving the accuracy and efficiency of task planning.
arXiv Detail & Related papers (2024-09-23T11:02:46Z)
- MemoNav: Working Memory Model for Visual Navigation [47.011190883888446]
Image-goal navigation is a challenging task that requires an agent to navigate to a goal indicated by an image in unfamiliar environments.
Existing methods utilizing diverse scene memories suffer from inefficient exploration since they use all historical observations for decision-making.
We present MemoNav, a novel memory model for image-goal navigation, which utilizes a working memory-inspired pipeline to improve navigation performance.
arXiv Detail & Related papers (2024-02-29T13:45:13Z)
- Evaluating Long-Term Memory in 3D Mazes [10.224858246626171]
Memory Maze is a 3D domain of randomized mazes designed for evaluating long-term memory in agents.
Unlike existing benchmarks, Memory Maze measures long-term memory separate from confounding agent abilities.
We find that current algorithms benefit from training with truncated backpropagation through time (a generic sketch follows this entry) and succeed on small mazes, but fall short of human performance on large mazes.
arXiv Detail & Related papers (2022-10-24T16:32:28Z)
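Since this entry credits truncated backpropagation through time (TBPTT) for current results, a generic sketch of that training pattern may be useful; the GRU policy, shapes, labels, and window length are stand-ins, not the benchmark's code.

```python
import torch
import torch.nn as nn

# Generic truncated-BPTT loop (illustrative only). Gradients flow within each
# `trunc`-step window; the hidden state is detached between windows so memory
# use stays bounded on long episodes.
rnn = nn.GRU(input_size=8, hidden_size=32, batch_first=True)
head = nn.Linear(32, 4)
opt = torch.optim.Adam(list(rnn.parameters()) + list(head.parameters()), lr=1e-3)

obs = torch.randn(1, 1000, 8)             # stand-in observation sequence
targets = torch.randint(0, 4, (1, 1000))  # stand-in action labels
trunc, hidden = 50, None

for t in range(0, obs.size(1), trunc):
    chunk = obs[:, t:t + trunc]
    out, hidden = rnn(chunk, hidden)
    loss = nn.functional.cross_entropy(
        head(out).flatten(0, 1), targets[:, t:t + trunc].flatten())
    opt.zero_grad()
    loss.backward()
    opt.step()
    hidden = hidden.detach()  # truncate the graph at the window boundary
```

Detaching the hidden state at each window boundary is what bounds memory and compute while still letting the agent carry information across the whole episode.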
- XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model [137.50614198301733]
We present XMem, a video object segmentation architecture for long videos with unified feature memory stores.
We develop an architecture that incorporates multiple independent yet deeply-connected feature memory stores.
XMem greatly exceeds state-of-the-art performance on long-video datasets.
arXiv Detail & Related papers (2022-07-14T17:59:37Z)
- LaMemo: Language Modeling with Look-Ahead Memory [50.6248714811912]
We propose Look-Ahead Memory (LaMemo), which enhances the recurrence memory by incrementally attending to the right-side tokens (see the sketch after this entry).
LaMemo embraces bi-directional attention and segment recurrence with an additional overhead only linearly proportional to the memory length.
Experiments on widely used language modeling benchmarks demonstrate its superiority over the baselines equipped with different types of memory.
arXiv Detail & Related papers (2022-04-15T06:11:25Z)
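A rough sketch of the look-ahead idea: cached memory states attend to the newly revealed right-side tokens and are interpolated with their old values. The sigmoid gate and all shapes are illustrative assumptions, not LaMemo's exact update rule.

```python
import torch
import torch.nn as nn

# Illustrative look-ahead memory refresh: memory states from earlier segments
# attend to the current ("right-side") segment and are gated against their
# old values, so the memory is updated incrementally as text is revealed.
dim, heads = 64, 4
refresh = nn.MultiheadAttention(dim, heads, batch_first=True)
gate = nn.Linear(2 * dim, 1)

memory = torch.randn(1, 16, dim)   # cached states from earlier segments
segment = torch.randn(1, 32, dim)  # hidden states of the current segment

lookahead, _ = refresh(memory, segment, segment)   # memory attends rightward
alpha = torch.sigmoid(gate(torch.cat([memory, lookahead], dim=-1)))
memory = alpha * lookahead + (1 - alpha) * memory  # incremental update
```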
- Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation [79.1669476932147]
Vision-and-Language Navigation (VLN) is a task in which an agent is required to follow a language instruction to navigate to a goal position.
Recent Transformer-based VLN methods have made great progress benefiting from the direct connections between visual observations and the language instruction.
We introduce Multimodal Transformer with Variable-length Memory (MTVM) for visually-grounded natural language navigation.
arXiv Detail & Related papers (2021-11-10T16:04:49Z)
- Sequential Recommender via Time-aware Attentive Memory Network [67.26862011527986]
We propose a temporal gating methodology to improve the attention mechanism and recurrent units.
We also propose a Multi-hop Time-aware Attentive Memory network to integrate long-term and short-term preferences.
Our approach is scalable for candidate retrieval tasks and can be viewed as a non-linear generalization of latent factorization for dot-product based Top-K recommendation.
arXiv Detail & Related papers (2020-05-18T11:29:38Z)
- Transfer between long-term and short-term memory using Conceptors [0.0]
We introduce a recurrent neural network model of working memory combining short-term and long-term components.
We show how standard operations on conceptors allow long-term memories to be combined and describe their effect on short-term memory; a minimal sketch of the conceptor computation follows this entry.
arXiv Detail & Related papers (2020-03-11T09:13:58Z)
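For the conceptors entry above: the standard conceptor of a driven reservoir is C = R(R + α⁻²I)⁻¹, where R is the correlation matrix of the reservoir states and α is the aperture. Below is a minimal NumPy sketch with stand-in states; this is the textbook definition from the conceptor literature, not code from the paper.

```python
import numpy as np

# Minimal conceptor computation: C = R (R + alpha^-2 I)^-1, where R is the
# state correlation matrix of a reservoir driven by one pattern. Boolean-style
# AND/OR combinations of memories act on these conceptor matrices.
states = np.random.randn(500, 64)          # stand-in reservoir states over time
R = states.T @ states / states.shape[0]    # state correlation matrix
alpha = 10.0                               # aperture (illustrative value)
C = R @ np.linalg.inv(R + alpha**-2 * np.eye(64))
```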
This list is automatically generated from the titles and abstracts of the papers on this site.