STMA: A Spatio-Temporal Memory Agent for Long-Horizon Embodied Task Planning
- URL: http://arxiv.org/abs/2502.10177v1
- Date: Fri, 14 Feb 2025 14:12:09 GMT
- Title: STMA: A Spatio-Temporal Memory Agent for Long-Horizon Embodied Task Planning
- Authors: Mingcong Lei, Yiming Zhao, Ge Wang, Zhixin Mai, Shuguang Cui, Yatong Han, Jinke Ren,
- Abstract summary: We propose the Spatio-Temporal Memory Agent (STMA), a framework designed to enhance planning and execution by integrating spatio-temporal memory.
We evaluate STMA in the TextWorld environment on 32 tasks involving multi-step planning and exploration under varying levels of complexity.
Experimental results demonstrate that STMA achieves a 31.25% improvement in success rate and a 24.7% increase in average score compared to the state-of-the-art model.
- Score: 36.70014527951141
- License:
- Abstract: A key objective of embodied intelligence is enabling agents to perform long-horizon tasks in dynamic environments while maintaining robust decision-making and adaptability. To achieve this goal, we propose the Spatio-Temporal Memory Agent (STMA), a novel framework designed to enhance task planning and execution by integrating spatio-temporal memory. STMA is built upon three critical components: (1) a spatio-temporal memory module that captures historical and environmental changes in real time, (2) a dynamic knowledge graph that facilitates adaptive spatial reasoning, and (3) a planner-critic mechanism that iteratively refines task strategies. We evaluate STMA in the TextWorld environment on 32 tasks, involving multi-step planning and exploration under varying levels of complexity. Experimental results demonstrate that STMA achieves a 31.25% improvement in success rate and a 24.7% increase in average score compared to the state-of-the-art model. The results highlight the effectiveness of spatio-temporal memory in advancing the memory capabilities of embodied agents.
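The abstract describes three components: a spatio-temporal memory, a dynamic knowledge graph for spatial reasoning, and a planner-critic refinement loop. The sketch below is a minimal, hypothetical illustration of how such pieces could fit together; all names (SpatioTemporalMemory, planner, critic, env.execute) are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only (not the paper's code): a planner-critic loop over a
# spatio-temporal memory made of (a) a time-stamped observation history and
# (b) a dynamic knowledge graph of spatial relations.
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SpatioTemporalMemory:
    """Temporal history plus a dynamic spatial knowledge graph (assumed design)."""
    history: deque = field(default_factory=lambda: deque(maxlen=50))
    graph: dict = field(default_factory=dict)  # place -> set of adjacent places

    def update(self, step: int, observation: str, relations: list[tuple[str, str]]) -> None:
        # Record what was observed and when (temporal memory).
        self.history.append((step, observation))
        # Update spatial relations as the environment changes (spatial memory).
        for a, b in relations:
            self.graph.setdefault(a, set()).add(b)
            self.graph.setdefault(b, set()).add(a)

    def summary(self) -> str:
        recent = "; ".join(obs for _, obs in list(self.history)[-5:])
        layout = ", ".join(f"{k} <-> {sorted(v)}" for k, v in self.graph.items())
        return f"Recent events: {recent}\nKnown layout: {layout}"


def planner(goal: str, memory: SpatioTemporalMemory, feedback: str = "") -> list[str]:
    # Stand-in for an LLM call that drafts a plan from the goal, the memory
    # summary, and any critic feedback from the previous round.
    _ = memory.summary(), feedback
    return [f"explore toward {goal}", f"interact with {goal}"]


def critic(plan: list[str], memory: SpatioTemporalMemory) -> tuple[bool, str]:
    # Stand-in for an LLM call that checks the plan against current memory.
    ok = len(plan) > 0
    return ok, "" if ok else "plan is empty; re-plan with more exploration"


def run_episode(goal: str, env, max_steps: int = 20) -> bool:
    """One episode; `env` is any object with a hypothetical
    execute(action) -> (observation, relations, done) method."""
    memory = SpatioTemporalMemory()
    feedback = ""
    for step in range(max_steps):
        plan = planner(goal, memory, feedback)
        ok, feedback = critic(plan, memory)
        if not ok:
            continue  # critic rejected the plan; re-plan with its feedback
        obs, relations, done = env.execute(plan[0])
        memory.update(step, obs, relations)
        if done:
            return True
    return False
```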
Related papers
- Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning [41.94295877935867]
We introduce MIKASA (Memory-Intensive Skills Assessment Suite for Agents), a comprehensive benchmark for memory RL.
We also develop MIKASA-Robo, a benchmark of 32 carefully designed memory-intensive tasks that assess memory capabilities in tabletop robotic manipulation.
Our contributions establish a unified framework for advancing memory RL research, driving the development of more reliable systems for real-world applications.
arXiv Detail & Related papers (2025-02-14T20:46:19Z) - On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability [59.72892401927283]
We evaluate the planning capabilities of OpenAI's o1 models across a variety of benchmark tasks.
Our results reveal that o1-preview outperforms GPT-4 in adhering to task constraints.
arXiv Detail & Related papers (2024-09-30T03:58:43Z) - Spatial Reasoning and Planning for Deep Embodied Agents [2.7195102129095003]
This thesis explores the development of data-driven techniques for spatial reasoning and planning tasks.
It focuses on enhancing learning efficiency, interpretability, and transferability across novel scenarios.
arXiv Detail & Related papers (2024-09-28T23:05:56Z) - KARMA: Augmenting Embodied AI Agents with Long-and-short Term Memory Systems [12.461941212597877]
Embodied AI agents often face difficulties with in-context memory, leading to inefficiencies and errors in task execution.
We introduce KARMA, an innovative memory system that integrates long-term and short-term memory modules.
This dual-memory structure allows agents to retrieve relevant past scene experiences, thereby improving the accuracy and efficiency of task planning.
arXiv Detail & Related papers (2024-09-23T11:02:46Z) - AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation [81.32722475387364]
Large Language Model-based agents have garnered significant attention and are becoming increasingly popular.
Planning ability is a crucial component of an LLM-based agent, which generally entails achieving a desired goal from an initial state.
Recent studies have demonstrated that utilizing expert-level trajectory for instruction-tuning LLMs effectively enhances their planning capabilities.
arXiv Detail & Related papers (2024-08-01T17:59:46Z) - EPD: Long-term Memory Extraction, Context-awared Planning and Multi-iteration Decision @ EgoPlan Challenge ICML 2024 [50.89751993430737]
We introduce a novel planning framework which comprises three stages: long-term memory Extraction, context-awared Planning, and multi-iteration Decision, named EPD.
EPD achieves a planning accuracy of 53.85% over 1,584 egocentric task planning questions.
arXiv Detail & Related papers (2024-07-28T15:14:07Z) - Exploring Test-Time Adaptation for Object Detection in Continually Changing Environments [13.163784646113214]
Continual Test-Time Adaptation (CTTA) has recently emerged as a promising technique to gradually adapt a source-trained model to continually changing target domains.
We present AMROD, featuring three core components. Firstly, the object-level contrastive learning module extracts object-level features for contrastive learning to refine the feature representation in the target domain.
Secondly, the adaptive monitoring module dynamically skips unnecessary adaptation and updates the category-specific threshold based on predicted confidence scores to enable efficiency and improve the quality of pseudo-labels.
arXiv Detail & Related papers (2024-06-24T08:30:03Z) - Synergising Human-like Responses and Machine Intelligence for Planning in Disaster Response [10.294618771570985]
We propose an attention-based cognitive architecture inspired by Dual Process Theory (DPT).
This framework integrates, in an online fashion, rapid (human-like) responses with the slow but optimized planning capabilities of machine intelligence.
arXiv Detail & Related papers (2024-04-15T15:47:08Z) - Enabling Visual Action Planning for Object Manipulation through Latent Space Roadmap [72.01609575400498]
We present a framework for visual action planning of complex manipulation tasks with high-dimensional state spaces.
We propose a Latent Space Roadmap (LSR) for task planning, a graph-based structure capturing globally the system dynamics in a low-dimensional latent space.
We present a thorough investigation of our framework on two simulated box stacking tasks and a folding task executed on a real robot.
arXiv Detail & Related papers (2021-03-03T17:48:26Z) - Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)