KARMA: Augmenting Embodied AI Agents with Long-and-short Term Memory Systems
- URL: http://arxiv.org/abs/2409.14908v1
- Date: Mon, 23 Sep 2024 11:02:46 GMT
- Title: KARMA: Augmenting Embodied AI Agents with Long-and-short Term Memory Systems
- Authors: Zixuan Wang, Bo Yu, Junzhe Zhao, Wenhao Sun, Sai Hou, Shuai Liang, Xing Hu, Yinhe Han, Yiming Gan
- Abstract summary: Embodied AI agents often face difficulties with in-context memory, leading to inefficiencies and errors in task execution.
We introduce KARMA, an innovative memory system that integrates long-term and short-term memory modules.
This dual-memory structure allows agents to retrieve relevant past scene experiences, thereby improving the accuracy and efficiency of task planning.
- Score: 12.461941212597877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embodied AI agents responsible for executing interconnected, long-sequence household tasks often face difficulties with in-context memory, leading to inefficiencies and errors in task execution. To address this issue, we introduce KARMA, an innovative memory system that integrates long-term and short-term memory modules, enhancing large language models (LLMs) for planning in embodied agents through memory-augmented prompting. KARMA distinguishes between long-term and short-term memory, with long-term memory capturing comprehensive 3D scene graphs as representations of the environment, while short-term memory dynamically records changes in objects' positions and states. This dual-memory structure allows agents to retrieve relevant past scene experiences, thereby improving the accuracy and efficiency of task planning. Short-term memory employs strategies for effective and adaptive memory replacement, ensuring the retention of critical information while discarding less pertinent data. Compared to state-of-the-art embodied agents enhanced with memory, our memory-augmented embodied AI agent improves success rates by 1.3x and 2.3x in Composite Tasks and Complex Tasks within the AI2-THOR simulator, respectively, and enhances task execution efficiency by 3.4x and 62.7x. Furthermore, we demonstrate that KARMA's plug-and-play capability allows for seamless deployment on real-world robotic systems, such as mobile manipulation platforms. Through this plug-and-play memory system, KARMA significantly enhances the ability of embodied agents to generate coherent and contextually appropriate plans, making the execution of complex household tasks more efficient. The experimental videos from the work can be found at https://youtu.be/4BT7fnw9ehs.
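The abstract describes the dual-memory structure but not its code; below is a minimal Python sketch of that split, assuming illustrative class names, an LRU-style eviction policy, and a caller-supplied similarity function, none of which are taken from KARMA's implementation.

```python
# Hypothetical sketch of the long-/short-term split described in the abstract;
# class names, the LRU eviction policy, and the similarity hook are
# illustrative assumptions, not KARMA's implementation.
from collections import OrderedDict

class LongTermMemory:
    """Stores 3D scene-graph snapshots keyed by the task they came from."""
    def __init__(self):
        self.episodes = []  # list of (task_description, scene_graph) pairs

    def add(self, task, scene_graph):
        self.episodes.append((task, scene_graph))

    def retrieve(self, query, similarity):
        # Return the past scene graph most relevant to the current task.
        return max(self.episodes, key=lambda e: similarity(query, e[0]))[1]

class ShortTermMemory:
    """Bounded record of recent object position/state changes."""
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = OrderedDict()  # object_id -> (position, state)

    def update(self, object_id, position, state):
        self.entries[object_id] = (position, state)
        self.entries.move_to_end(object_id)    # mark as most recently used
        if len(self.entries) > self.capacity:  # evict the least recently used
            self.entries.popitem(last=False)
```

Memory-augmented prompting, as the abstract describes it, would then serialize the retrieved scene graph together with the short-term entries into the LLM planner's context.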
Related papers
- Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning [64.93848182403116]
Current deep-learning memory models struggle in reinforcement learning environments that are partially observable and involve long-term dependencies.
We introduce the Stable Hadamard Memory, a novel memory model for reinforcement learning agents.
Our approach significantly outperforms state-of-the-art memory-based methods on challenging partially observable benchmarks.
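The summary gives no equations; as a hedged illustration of what a Hadamard-product memory update can look like, the sketch below applies an elementwise gate to the old memory and adds a write term. The variable names and the random gate are placeholders, not the paper's parameterization.

```python
# Hedged illustration of a Hadamard-product memory update; names and the
# random gate below are placeholders, not the paper's parameterization.
import numpy as np

def hadamard_memory_step(M, gate, write):
    """M_new = M * gate + write, with '*' the elementwise (Hadamard) product.

    gate in [0, 1] calibrates what to keep per memory cell; write adds new
    content. Elementwise gating keeps each update cheap and easy to stack.
    """
    return M * gate + write

rng = np.random.default_rng(0)
M = np.zeros((4, 8))
for _ in range(10):  # in the real model, gate and write are learned functions
    M = hadamard_memory_step(M,
                             rng.uniform(0.0, 1.0, size=M.shape),
                             rng.normal(0.0, 0.1, size=M.shape))
```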
arXiv Detail & Related papers (2024-10-14T03:50:17Z)
- MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD [27.472705540825316]
This paper is on long-term video understanding, where the goal is to recognise human actions over long temporal windows (up to minutes long).
We propose an alternative to attention-based schemes which is based on a low-rank approximation of the memory obtained using Singular Value Decomposition.
Our scheme has two advantages: (a) it reduces complexity by more than an order of magnitude, and (b) it is amenable to an efficient implementation for the calculation of the memory bases.
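A hedged sketch of the low-rank idea: summarize a growing (T, d) memory with its top singular directions so reads touch r rows instead of T. A batch SVD stands in here for the paper's incremental update, and all names are illustrative.

```python
# Hedged sketch of the low-rank memory idea; a batch SVD stands in for the
# paper's incremental update, and all names are illustrative.
import numpy as np

def compress_memory(memory, rank):
    """Summarize a (T, d) memory with its top-`rank` singular directions.

    Downstream reads then touch `rank` rows instead of T, which is where
    the order-of-magnitude complexity reduction comes from.
    """
    _, S, Vt = np.linalg.svd(memory, full_matrices=False)
    return S[:rank, None] * Vt[:rank]  # (rank, d) scaled memory bases

memory = np.random.default_rng(1).normal(size=(1000, 64))  # 1000 past frames
bases = compress_memory(memory, rank=8)
print(bases.shape)  # (8, 64)
```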
arXiv Detail & Related papers (2024-06-11T12:03:57Z)
- Mastering Memory Tasks with World Models [12.99255437732525]
Current model-based reinforcement learning (MBRL) agents struggle with long-term dependencies.
We present a new method, Recall to Imagine (R2I), to improve temporal coherence.
R2I establishes a new state-of-the-art for challenging memory and credit assignment RL tasks.
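The blurb above names no mechanism; for context, R2I builds on structured state-space models, whose core building block is the linear recurrence sketched below. The matrices here are random placeholders rather than the paper's learned parameterization.

```python
# The matrices here are random placeholders; R2I's actual world model uses
# learned structured state-space layers, of which this recurrence is the core.
import numpy as np

def ssm_scan(A, B, C, inputs):
    """Linear state-space recurrence: x_t = A @ x_{t-1} + B @ u_t, y_t = C @ x_t.

    With a well-conditioned A, the state x carries information across very
    long horizons, which is what suits such models to memory tasks.
    """
    x = np.zeros(A.shape[0])
    outputs = []
    for u in inputs:
        x = A @ x + B @ u
        outputs.append(C @ x)
    return np.stack(outputs)

rng = np.random.default_rng(2)
A = 0.99 * np.eye(16)  # near-identity: state decays slowly, memory persists
y = ssm_scan(A, rng.normal(size=(16, 4)), rng.normal(size=(2, 16)),
             rng.normal(size=(100, 4)))
```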
arXiv Detail & Related papers (2024-03-07T06:35:59Z)
- Think Before You Act: Decision Transformers with Working Memory [44.18926449252084]
Decision Transformer-based decision-making agents have shown the ability to generalize across multiple tasks, but their performance relies on massive data and computation.
We argue that this inefficiency stems from the forgetting phenomenon, in which a model memorizes its behaviors in parameters throughout training.
We propose a working memory module to store, blend, and retrieve information for different downstream tasks.
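As a hedged sketch of a store/blend/retrieve interface: the module below keeps a fixed set of slots, blends new values in according to attention weights, and reads back with the same weights. The design is illustrative, not the paper's module.

```python
# Illustrative store/blend/retrieve module; the interface and blending rule
# are assumptions, not the paper's architecture.
import numpy as np

class WorkingMemory:
    def __init__(self, n_slots=8, dim=16, seed=3):
        self.slots = np.random.default_rng(seed).normal(size=(n_slots, dim))

    def _weights(self, query):
        scores = self.slots @ query          # relevance of each slot
        e = np.exp(scores - scores.max())
        return e / e.sum()

    def write(self, value, blend=0.1):
        # Blend new information into each slot in proportion to its relevance,
        # instead of memorizing it in the network weights.
        w = self._weights(value)[:, None]
        self.slots = (1.0 - blend * w) * self.slots + blend * w * value

    def read(self, query):
        return self._weights(query) @ self.slots  # soft retrieval
```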
arXiv Detail & Related papers (2023-05-24T01:20:22Z)
- Enhancing Large Language Model with Self-Controlled Memory Framework [56.38025154501917]
Large Language Models (LLMs) are constrained by their inability to process lengthy inputs, resulting in the loss of critical historical information.
We propose the Self-Controlled Memory (SCM) framework to enhance the ability of LLMs to maintain long-term memory and recall relevant information.
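A minimal sketch of the self-controlled idea: keep a stream of past turns and inject only the most relevant ones into the prompt. Both `build_prompt` and the toy lexical score below are hypothetical stand-ins; the framework's actual controller and scorer differ.

```python
# Minimal sketch of relevance-gated recall; `lexical_relevance` is a toy
# stand-in for the framework's memory controller, and build_prompt is a
# hypothetical helper, not the SCM API.
def lexical_relevance(query, memory):
    q, m = set(query.lower().split()), set(memory.lower().split())
    return len(q & m) / max(len(q), 1)

def build_prompt(user_input, memory_stream, top_k=3, threshold=0.2):
    """Inject only the most relevant past turns into the LLM's context."""
    scored = sorted(memory_stream,
                    key=lambda m: lexical_relevance(user_input, m),
                    reverse=True)
    recalled = [m for m in scored[:top_k]
                if lexical_relevance(user_input, m) > threshold]
    return "Relevant history:\n" + "\n".join(recalled) + f"\n\nUser: {user_input}"
```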
arXiv Detail & Related papers (2023-04-26T07:25:31Z)
- Evaluating Long-Term Memory in 3D Mazes [10.224858246626171]
Memory Maze is a 3D domain of randomized mazes designed for evaluating long-term memory in agents.
Unlike existing benchmarks, Memory Maze measures long-term memory separately from confounding agent abilities.
We find that current algorithms benefit from training with truncated backpropagation through time and succeed on small mazes, but fall short of human performance on the large mazes.
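The finding above credits truncated backpropagation through time (TBPTT); a minimal PyTorch sketch of the technique, assuming a GRU and a linear head as placeholders for whatever recurrent agent is trained, is:

```python
# Minimal truncated-BPTT loop, assuming a GRU and a linear head as
# placeholders for whatever recurrent agent is being trained.
import torch

def tbptt_train(rnn, head, inputs, targets, k=32, lr=1e-3):
    """Backpropagate through at most k timesteps per update.

    The hidden state is carried across chunks but detached at chunk
    boundaries, so no gradient flows further back than k steps.
    """
    opt = torch.optim.Adam(list(rnn.parameters()) + list(head.parameters()), lr=lr)
    h = None
    for t in range(0, inputs.size(1), k):
        out, h = rnn(inputs[:, t:t + k], h)  # reuse state from the last chunk
        loss = torch.nn.functional.mse_loss(head(out), targets[:, t:t + k])
        opt.zero_grad()
        loss.backward()
        opt.step()
        h = h.detach()  # truncation point: cut the gradient path here

rnn = torch.nn.GRU(4, 32, batch_first=True)
head = torch.nn.Linear(32, 2)
tbptt_train(rnn, head, torch.randn(8, 256, 4), torch.randn(8, 256, 2))
```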
arXiv Detail & Related papers (2022-10-24T16:32:28Z)
- Memory-Guided Semantic Learning Network for Temporal Sentence Grounding [55.31041933103645]
We propose a memory-augmented network that learns and memorizes the rarely appearing content in temporal sentence grounding (TSG) tasks.
MGSL-Net consists of three main parts: a cross-modal interaction module, a memory augmentation module, and a heterogeneous attention module.
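The blurb names the modules but not their mechanics; purely as an assumption, a generic memory-augmentation read, attention over persistent learned slots added residually, could look like the following; it is not MGSL-Net's definition.

```python
# Pure assumption: a generic memory-augmentation read (attention over
# persistent learned slots, added residually); not MGSL-Net's definition.
import numpy as np

def memory_augment(features, memory_bank):
    """features: (n, d) inputs; memory_bank: (m, d) slots that persist across
    training, letting the model recall content too rare to stay in a batch."""
    scores = features @ memory_bank.T / np.sqrt(features.shape[1])
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return features + w @ memory_bank  # residual read from memory
```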
arXiv Detail & Related papers (2022-01-03T02:32:06Z)
- Not All Memories are Created Equal: Learning to Forget by Expiring [49.053569908417636]
We propose Expire-Span, a method that learns to retain the most important information and expire the irrelevant information.
This forgetting of memories enables Transformers to scale to attend over tens of thousands of previous timesteps efficiently.
We show that Expire-Span can scale to memories that are tens of thousands in size, setting a new state of the art on incredibly long context tasks.
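A sketch of a retention mask in the spirit of Expire-Span: each memory gets a learned span, and entries fade linearly over a short ramp once their span elapses, so fully expired entries can be freed. The ramp length R and all names are illustrative.

```python
# Retention mask in the spirit of Expire-Span; the ramp length R and all
# names are illustrative.
import numpy as np

def expire_mask(ages, spans, R=16.0):
    """ages: steps since each memory was written; spans: learned expiration
    times. Memories fade linearly over R steps once their span elapses;
    entries whose mask reaches 0 can be dropped, which is what lets
    attention scale to tens of thousands of timesteps."""
    return np.clip(1.0 + (spans - ages) / R, 0.0, 1.0)

mask = expire_mask(ages=np.array([5.0, 40.0, 400.0]),
                   spans=np.array([100.0, 30.0, 350.0]))
print(mask)  # [1.0, 0.375, 0.0]: the third memory can be freed
```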
arXiv Detail & Related papers (2021-05-13T20:50:13Z)
- Temporal Memory Relation Network for Workflow Recognition from Surgical Video [53.20825496640025]
We propose a novel end-to-end temporal memory relation network (TMNet) for relating long-range and multi-scale temporal patterns.
We have extensively validated our approach on two benchmark surgical video datasets.
arXiv Detail & Related papers (2021-03-30T13:20:26Z)
- Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling.
Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
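As a hedged single-head sketch of the read/write recurrence behind such constant-memory designs: tokens read from a fixed set of memory slots, then the slots are rewritten from the processed segment, so per-segment cost does not grow with sequence length. This is illustrative, not Memformer's exact architecture.

```python
# Single-head sketch of the read/write memory recurrence behind such
# constant-memory designs; illustrative, not Memformer's exact architecture.
import numpy as np

def attend(q, k, v):
    s = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ v

def process_segment(tokens, memory):
    """tokens: (n, d) one segment; memory: (m, d) fixed-size slots.

    Tokens read from memory, then memory is rewritten from the segment,
    so per-segment cost is constant in total sequence length.
    """
    read = tokens + attend(tokens, memory, memory)  # memory reading
    new_memory = attend(memory, read, read)         # memory writing
    return read, new_memory
```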
arXiv Detail & Related papers (2020-10-14T09:03:36Z)
- Augmented Replay Memory in Reinforcement Learning With Continuous Control [1.6752182911522522]
Online reinforcement learning agents are currently able to process an increasing amount of data by converting it into higher-order value functions.
This expansion increases the agent's state space, enabling it to scale up to more complex problems, but it also increases the risk of forgetting caused by learning on redundant or conflicting data.
To better approximate this large amount of data, a random mini-batch of past experiences stored in the replay memory buffer is replayed at each learning step.
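A minimal version of the uniform replay described above; the paper's proposed augmentation of the stored experiences is not shown.

```python
# Minimal version of the uniform replay described above; the paper's
# proposed augmentation of the stored experiences is not shown.
import random
from collections import deque

class ReplayMemory:
    """Fixed-capacity experience buffer with uniform mini-batch sampling."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences fall off

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random mini-batch; sampling out of order breaks the
        # temporal correlation between consecutive experiences.
        return random.sample(self.buffer, batch_size)
```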
arXiv Detail & Related papers (2019-12-29T20:07:18Z)