Related papers: Explore with Long-term Memory: A Benchmark and Multimodal LLM-based Reinforcement Learning Framework for Embodied Exploration

URL: http://arxiv.org/abs/2601.10744v1
Date: Sun, 11 Jan 2026 16:23:22 GMT
Title: Explore with Long-term Memory: A Benchmark and Multimodal LLM-based Reinforcement Learning Framework for Embodied Exploration
Authors: Sen Wang, Bangwei Liu, Zhenkun Gao, Lizhuang Ma, Xuhong Wang, Yuan Xie, Xin Tan
Abstract summary: Long-term Memory Embodied Exploration aims to unify the agent's exploratory cognition and decision-making behaviors. To enhance the agent's memory recall and proactive exploration capabilities, we propose MemoryExplorer.
Score: 52.35887679314727
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: An ideal embodied agent should possess lifelong learning capabilities to handle long-horizon and complex tasks, enabling continuous operation in general environments. This not only requires the agent to accurately accomplish given tasks but also to leverage long-term episodic memory to optimize decision-making. However, existing mainstream one-shot embodied tasks primarily focus on task completion results, neglecting the crucial process of exploration and memory utilization. To address this, we propose Long-term Memory Embodied Exploration (LMEE), which aims to unify the agent's exploratory cognition and decision-making behaviors to promote lifelong learning. We further construct a corresponding dataset and benchmark, LMEE-Bench, incorporating multi-goal navigation and memory-based question answering to comprehensively evaluate both the process and outcome of embodied exploration. To enhance the agent's memory recall and proactive exploration capabilities, we propose MemoryExplorer, a novel method that fine-tunes a multimodal large language model through reinforcement learning to encourage active memory querying. By incorporating a multi-task reward function that includes action prediction, frontier selection, and question answering, our model achieves proactive exploration. Extensive experiments against state-of-the-art embodied exploration models demonstrate that our approach achieves significant advantages in long-horizon embodied tasks.
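
The abstract names three reward components (action prediction, frontier selection, question answering) but does not say how they are combined. The sketch below is only an illustration of one common way to do it, a normalized weighted sum of binary correctness signals. The names `StepOutcome` and `multi_task_reward`, the binary signals, and the weights are all assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    """Hypothetical per-step correctness signals (assumed, not from the paper)."""
    action_correct: bool    # predicted action matched the reference action
    frontier_correct: bool  # selected frontier matched the reference frontier
    answer_correct: bool    # question-answering output was judged correct


def multi_task_reward(
    outcome: StepOutcome,
    w_action: float = 1.0,
    w_frontier: float = 1.0,
    w_answer: float = 1.0,
) -> float:
    """Combine the three signals into one scalar reward in [0, 1].

    A normalized weighted sum is an assumption; the paper may combine
    its reward terms differently.
    """
    weights = (w_action, w_frontier, w_answer)
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    signals = (
        float(outcome.action_correct),
        float(outcome.frontier_correct),
        float(outcome.answer_correct),
    )
    weighted = sum(w * s for w, s in zip(weights, signals))
    return weighted / sum(weights)
```

For instance, `multi_task_reward(StepOutcome(True, False, False), w_action=2.0)` weights the action term twice as heavily as the other two terms, so a step with only a correct action earns half of the maximum reward.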

Related papers

Fine-Mem: Fine-Grained Feedback Alignment for Long-Horizon Memory Management [63.48041801851891]
Fine-Mem is a unified framework designed for fine-grained feedback alignment. Experiments on Memalpha and MemoryAgentBench demonstrate that Fine-Mem consistently outperforms strong baselines.
arXiv Detail & Related papers (2026-01-13T11:06:17Z)
Vision to Geometry: 3D Spatial Memory for Sequential Embodied MLLM Reasoning and Exploration [12.928422281441968]
Embodied tasks typically require agents to actively explore unknown environments and reason about the scene to achieve a specific goal. When deployed in real life, agents often face sequential tasks, where each new sub-task follows the completion of the previous one. We introduce SEER-Bench, a new Sequential Embodied Exploration and Reasoning Benchmark encompassing two classic embodied tasks. We propose 3DSPMR, a 3D SPatial Memory Reasoning approach that exploits relational, visual, and geometric cues from explored regions.
arXiv Detail & Related papers (2025-12-02T06:35:30Z)
Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory [89.65731902036669]
Evo-Memory is a streaming benchmark and framework for evaluating self-evolving memory in large language model (LLM) agents. We evaluate over ten representative memory modules across 10 diverse multi-turn goal-oriented and single-turn reasoning and QA datasets.
arXiv Detail & Related papers (2025-11-25T21:08:07Z)
FindingDory: A Benchmark to Evaluate Memory in Embodied Agents [49.18498389833308]
We introduce a new benchmark for long-range embodied tasks in the Habitat simulator. This benchmark evaluates memory-based capabilities across 60 tasks requiring sustained engagement and contextual awareness.
arXiv Detail & Related papers (2025-06-18T17:06:28Z)
MemInsight: Autonomous Memory Augmentation for LLM Agents [12.620141762922168]
We propose an autonomous memory augmentation approach, MemInsight, to enhance semantic data representation and retrieval mechanisms. We empirically validate the efficacy of our proposed approach in three task scenarios: conversational recommendation, question answering, and event summarization.
arXiv Detail & Related papers (2025-03-27T17:57:28Z)
HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model [39.169389255970806]
HiAgent is a framework that leverages subgoals as memory chunks to manage the working memory of Large Language Model (LLM)-based agents hierarchically. Results show that HiAgent achieves a twofold increase in success rate and reduces the average number of steps required by 3.8.
arXiv Detail & Related papers (2024-08-18T17:59:49Z)
Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage. The model is trained to maximize the expected agent's performance by selecting promising trajectories solving prior tasks from the storage.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
Planning to Explore via Self-Supervised World Models [120.31359262226758]
Plan2Explore is a self-supervised reinforcement learning agent. We present a new approach to self-supervised exploration and fast adaptation to new tasks. Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
