STMA: A Spatio-Temporal Memory Agent for Long-Horizon Embodied Task Planning
- URL: http://arxiv.org/abs/2502.10177v1
- Date: Fri, 14 Feb 2025 14:12:09 GMT
- Title: STMA: A Spatio-Temporal Memory Agent for Long-Horizon Embodied Task Planning
- Authors: Mingcong Lei, Yiming Zhao, Ge Wang, Zhixin Mai, Shuguang Cui, Yatong Han, Jinke Ren,
- Abstract summary: We propose the Spatio-Temporal Memory Agent (STMA), a framework designed to enhance planning and execution by integrating spatio-temporal memory. We evaluate STMA in the TextWorld environment on 32 tasks involving multi-step planning and exploration under varying levels of complexity. Experimental results demonstrate that STMA achieves a 31.25% improvement in success rate and a 24.7% increase in average score compared to the state-of-the-art model.
- Score: 36.70014527951141
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: A key objective of embodied intelligence is enabling agents to perform long-horizon tasks in dynamic environments while maintaining robust decision-making and adaptability. To achieve this goal, we propose the Spatio-Temporal Memory Agent (STMA), a novel framework designed to enhance task planning and execution by integrating spatio-temporal memory. STMA is built upon three critical components: (1) a spatio-temporal memory module that captures historical and environmental changes in real time, (2) a dynamic knowledge graph that facilitates adaptive spatial reasoning, and (3) a planner-critic mechanism that iteratively refines task strategies. We evaluate STMA in the TextWorld environment on 32 tasks, involving multi-step planning and exploration under varying levels of complexity. Experimental results demonstrate that STMA achieves a 31.25% improvement in success rate and a 24.7% increase in average score compared to the state-of-the-art model. The results highlight the effectiveness of spatio-temporal memory in advancing the memory capabilities of embodied agents.
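The abstract names three concrete components: a spatio-temporal memory that records environmental changes, a dynamic knowledge graph for spatial reasoning, and a planner-critic loop that refines plans before acting. The Python sketch below is only a schematic of how such pieces could fit together under our own assumptions; every class, function, and interface name (SpatioTemporalMemory, propose_plan, critique, run_episode, env_step) is hypothetical rather than the authors' implementation, and the planner and critic here stand in for what the paper realizes with LLM-based components.

```python
# Schematic sketch of a planner-critic agent with spatio-temporal memory,
# loosely following the components named in the STMA abstract above.
# All names and interfaces are assumptions, not the paper's code.
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class SpatioTemporalMemory:
    """Timestamped observation history plus a simple spatial knowledge graph."""
    history: List[Tuple[int, str, str]] = field(default_factory=list)  # (step, place, obs)
    graph: dict = field(default_factory=dict)                          # place -> neighbors

    def update(self, step: int, place: str, observation: str, neighbors: List[str]):
        self.history.append((step, place, observation))
        self.graph.setdefault(place, set()).update(neighbors)

    def summary(self, last_k: int = 5) -> str:
        recent = "; ".join(f"t={t} @{p}: {o}" for t, p, o in self.history[-last_k:])
        layout = "; ".join(f"{p} -> {sorted(ns)}" for p, ns in self.graph.items())
        return f"recent: [{recent}] map: [{layout}]"


def propose_plan(task: str, memory: SpatioTemporalMemory) -> List[str]:
    """Stand-in planner; in the paper this role would be filled by an LLM."""
    return [f"act toward '{task}' given {memory.summary()}"]


def critique(plan: List[str]) -> Optional[str]:
    """Stand-in critic; returns feedback when the plan looks unusable."""
    return None if plan else "empty plan, explore more before re-planning"


def run_episode(task: str, env_step: Callable, max_steps: int = 10) -> bool:
    memory = SpatioTemporalMemory()
    for step in range(max_steps):
        plan = propose_plan(task, memory)
        feedback = critique(plan)
        if feedback:  # iterative refinement before acting
            plan = propose_plan(f"{task} ({feedback})", memory)
        observation, place, neighbors, done = env_step(plan[0])
        memory.update(step, place, observation, neighbors)
        if done:
            return True
    return False
```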
Related papers
- Occupancy Learning with Spatiotemporal Memory [39.41175479685905]
We propose a scene-level occupancy representation learning framework that effectively learns 3D occupancy features with temporal consistency. Our method significantly enhances the spatiotemporal representation learned for 3D occupancy prediction tasks by exploiting the temporal dependency between multi-frame inputs.
arXiv Detail & Related papers (2025-08-06T17:59:52Z) - MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents [84.62985963113245]
We introduce MEM1, an end-to-end reinforcement learning framework that enables agents to operate with constant memory across long multi-turn tasks. At each turn, MEM1 updates a compact shared internal state that jointly supports memory consolidation and reasoning. We show that MEM1-7B improves performance by 3.5x while reducing memory usage by 3.7x compared to Qwen2.5-14B-Instruct on a 16-objective multi-hop QA task.
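As a rough, hedged illustration of the "constant memory across turns" idea summarized above, the toy sketch below rewrites a bounded internal state each turn instead of appending an ever-growing transcript. The consolidation step here is naive truncation, purely a placeholder for MEM1's learned policy; the budget constant and function names are our own assumptions.

```python
# Toy illustration of a constant-size internal state updated every turn.
# Naive truncation stands in for MEM1's learned consolidation policy.
MAX_STATE_CHARS = 512  # assumed budget, not a value from the paper


def consolidate(state: str, new_turn: str) -> str:
    """Fold the latest turn into the internal state under a fixed budget."""
    merged = f"{state}\nTURN: {new_turn}".strip()
    return merged[-MAX_STATE_CHARS:]  # keep only the most recent content


def run_dialogue(turns):
    state = ""
    for turn in turns:
        state = consolidate(state, turn)  # memory footprint stays bounded
        # ... reasoning over `state` would happen here ...
    return state
```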
arXiv Detail & Related papers (2025-06-18T19:44:46Z) - FindingDory: A Benchmark to Evaluate Memory in Embodied Agents [49.89792845476579]
We introduce a new benchmark for long-range embodied tasks in the Habitat simulator. This benchmark evaluates memory-based capabilities across 60 tasks requiring sustained engagement and contextual awareness.
arXiv Detail & Related papers (2025-06-18T17:06:28Z) - 3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model [83.70640091897947]
Humans excel at performing complex tasks by leveraging long-term memory across temporal and spatial experiences. Current Large Language Models (LLMs) struggle to effectively plan and act in dynamic, multi-room 3D environments. We propose 3DLLM-Mem, a novel dynamic memory management and fusion model for embodied spatial-temporal reasoning and actions.
arXiv Detail & Related papers (2025-05-28T17:59:13Z) - REMAC: Self-Reflective and Self-Evolving Multi-Agent Collaboration for Long-Horizon Robot Manipulation [57.628771707989166]
We propose an adaptive multi-agent planning framework, termed REMAC, that enables efficient, scene-agnostic multi-robot long-horizon task planning and execution.
REMAC incorporates two key modules: a self-reflection module that performs pre-condition and post-condition checks in the loop to evaluate progress and refine plans, and a self-evolvement module that dynamically adapts plans based on scene-specific reasoning.
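To make the self-reflection loop above concrete, here is a minimal sketch under our own assumptions: each plan step carries a pre-condition and a post-condition predicate, and a hypothetical replan callback is invoked whenever a check fails. The Step type and all function names are illustrative stand-ins, not REMAC's interfaces.

```python
# Minimal reflection loop with pre-/post-condition checks, loosely following
# the REMAC summary above. All names here are hypothetical stand-ins.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    action: str
    precondition: Callable[[dict], bool]   # must hold before acting
    postcondition: Callable[[dict], bool]  # must hold after acting


def execute_with_reflection(plan: List[Step], state: dict,
                            apply_action: Callable[[dict, str], dict],
                            replan: Callable[[dict, Step], List[Step]]) -> dict:
    queue = list(plan)
    while queue:
        step = queue.pop(0)
        if not step.precondition(state):
            # Self-reflection: the step is not applicable yet; ask for new steps first.
            queue = replan(state, step) + [step] + queue
            continue
        state = apply_action(state, step.action)
        if not step.postcondition(state):
            # The action did not achieve its intended effect; refine the plan.
            queue = replan(state, step) + queue
    return state
```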
arXiv Detail & Related papers (2025-03-28T03:51:40Z) - Dynamic Attention Mechanism in Spatiotemporal Memory Networks for Object Tracking [8.040709469401257]
We propose a differentiable dynamic attention mechanism that adaptively adjusts channel attention weights by analyzing spatial attention weights.
A lightweight gating network autonomously allocates computational resources based on target motion states, prioritizing high-discriminability features in challenging scenarios.
arXiv Detail & Related papers (2025-03-21T00:48:31Z) - DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving [62.62464518137153]
DriveTransformer is a simplified end-to-end autonomous driving (E2E-AD) framework designed for ease of scaling up.
It is composed of three unified operations: task self-attention, sensor cross-attention, temporal cross-attention.
It achieves state-of-the-art performance at high FPS on both the simulated closed-loop benchmark Bench2Drive and the real-world open-loop benchmark nuScenes.
arXiv Detail & Related papers (2025-03-07T11:41:18Z) - Structured Preference Optimization for Vision-Language Long-Horizon Task Planning [60.26885165189447]
Existing methods for vision-language task planning excel in short-horizon tasks but often fall short in complex, long-horizon planning within dynamic environments.
These challenges arise from the difficulty of effectively training models to produce high-quality reasoning processes for long-horizon tasks.
We propose Structured Preference Optimization (SPO), which aims to enhance reasoning and action selection in long-horizon task planning.
arXiv Detail & Related papers (2025-02-28T05:47:34Z) - On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability [59.72892401927283]
We evaluate the planning capabilities of OpenAI's o1 models across a variety of benchmark tasks.
Our results reveal that o1-preview outperforms GPT-4 in adhering to task constraints.
arXiv Detail & Related papers (2024-09-30T03:58:43Z) - Spatial Reasoning and Planning for Deep Embodied Agents [2.7195102129095003]
This thesis explores the development of data-driven techniques for spatial reasoning and planning tasks.
It focuses on enhancing learning efficiency, interpretability, and transferability across novel scenarios.
arXiv Detail & Related papers (2024-09-28T23:05:56Z) - KARMA: Augmenting Embodied AI Agents with Long-and-short Term Memory Systems [12.461941212597877]
Embodied AI agents often face difficulties with in-context memory, leading to inefficiencies and errors in task execution.
We introduce KARMA, an innovative memory system that integrates long-term and short-term memory modules.
This dual-memory structure allows agents to retrieve relevant past scene experiences, thereby improving the accuracy and efficiency of task planning.
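The dual-memory structure summarized above can be pictured roughly as follows. This is a schematic sketch under our own assumptions, with a bounded short-term buffer plus a long-term store queried by simple keyword overlap; the retrieval heuristic and all names are stand-ins, not KARMA's actual mechanism.

```python
# Schematic sketch of a long-/short-term memory split for an embodied agent.
# The retrieval heuristic (keyword overlap) is a stand-in for KARMA's actual
# mechanism, which is not described at this level of detail in the abstract.
from collections import deque


class DualMemory:
    def __init__(self, short_term_size: int = 8):
        self.short_term = deque(maxlen=short_term_size)  # recent observations
        self.long_term = []                              # persistent scene experiences

    def observe(self, text: str):
        self.short_term.append(text)

    def consolidate(self):
        """Move the current short-term contents into long-term storage."""
        self.long_term.extend(self.short_term)
        self.short_term.clear()

    def retrieve(self, query: str, k: int = 3):
        """Return the k long-term entries sharing the most words with the query."""
        q = set(query.lower().split())
        scored = sorted(self.long_term,
                        key=lambda m: len(q & set(m.lower().split())),
                        reverse=True)
        return scored[:k]


# Example: recall a past scene experience relevant to the current sub-task.
mem = DualMemory()
mem.observe("the mug is on the kitchen counter")
mem.consolidate()
print(mem.retrieve("where did I last see the mug"))
```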
arXiv Detail & Related papers (2024-09-23T11:02:46Z) - EPD: Long-term Memory Extraction, Context-awared Planning and Multi-iteration Decision @ EgoPlan Challenge ICML 2024 [50.89751993430737]
We introduce EPD, a novel planning framework that comprises three stages: long-term memory Extraction, context-awared Planning, and multi-iteration Decision.
EPD achieves a planning accuracy of 53.85% over 1,584 egocentric task planning questions.
arXiv Detail & Related papers (2024-07-28T15:14:07Z) - Exploring Test-Time Adaptation for Object Detection in Continually Changing Environments [13.163784646113214]
Continual Test-Time Adaptation (CTTA) has recently emerged as a promising technique to gradually adapt a source-trained model to continually changing target domains.
We present AMROD, featuring three core components. Firstly, the object-level contrastive learning module extracts object-level features for contrastive learning to refine the feature representation in the target domain.
Secondly, the adaptive monitoring module dynamically skips unnecessary adaptation and updates the category-specific threshold based on predicted confidence scores to enable efficiency and improve the quality of pseudo-labels.
arXiv Detail & Related papers (2024-06-24T08:30:03Z) - Synergising Human-like Responses and Machine Intelligence for Planning in Disaster Response [10.294618771570985]
We propose an attention-based cognitive architecture inspired by Dual Process Theory (DPT).
This framework integrates, in an online fashion, rapid (human-like) responses with the slow but optimized planning capabilities of machine intelligence.
arXiv Detail & Related papers (2024-04-15T15:47:08Z) - Enabling Visual Action Planning for Object Manipulation through Latent Space Roadmap [72.01609575400498]
We present a framework for visual action planning of complex manipulation tasks with high-dimensional state spaces.
We propose a Latent Space Roadmap (LSR) for task planning, a graph-based structure that globally captures the system dynamics in a low-dimensional latent space (a minimal illustrative sketch follows this entry).
We present a thorough investigation of our framework on two simulated box stacking tasks and a folding task executed on a real robot.
arXiv Detail & Related papers (2021-03-03T17:48:26Z)
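To make the "graph-based structure capturing the system dynamics in latent space" concrete, here is a minimal sketch under our own assumptions: latent states become graph nodes, observed transitions become edges, and planning is a shortest-path search. It is not the LSR construction itself, which clusters learned latent embeddings; it only illustrates the roadmap-plus-search idea, and all names are hypothetical.

```python
# Minimal roadmap-style planner: nodes are (discretized) latent states, edges
# are observed transitions, and planning is breadth-first search over the graph.
# This is an illustrative stand-in for the LSR idea, not the paper's method.
from collections import defaultdict, deque


class LatentRoadmap:
    def __init__(self):
        self.edges = defaultdict(set)  # latent node -> reachable latent nodes

    def add_transition(self, z_from, z_to):
        self.edges[z_from].add(z_to)

    def plan(self, start, goal):
        """Shortest path of latent nodes from start to goal, or None."""
        frontier = deque([[start]])
        visited = {start}
        while frontier:
            path = frontier.popleft()
            if path[-1] == goal:
                return path
            for nxt in self.edges[path[-1]]:
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append(path + [nxt])
        return None


# Example: two observed transitions, then plan from A to C.
roadmap = LatentRoadmap()
roadmap.add_transition("A", "B")
roadmap.add_transition("B", "C")
print(roadmap.plan("A", "C"))  # ['A', 'B', 'C']
```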
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.