Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks
- URL: http://arxiv.org/abs/2510.08002v1
- Date: Thu, 09 Oct 2025 09:40:34 GMT
- Title: Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks
- Authors: Cheng Yang, Xuemeng Yang, Licheng Wen, Daocheng Fu, Jianbiao Mei, Rong Wu, Pinlong Cai, Yufan Shen, Nianchen Deng, Botian Shi, Yu Qiao, Haifeng Li
- Abstract summary: Large Language Models have demonstrated remarkable capabilities across diverse domains, yet significant challenges persist when deploying them as AI agents for real-world long-horizon tasks. Existing LLM agents suffer from a critical limitation: they are test-time static and cannot learn from experience, lacking the ability to accumulate knowledge and continuously improve on the job. We propose MUSE, a novel agent framework that introduces an experience-driven, self-evolving system centered around a hierarchical Memory Module.
- Score: 42.78572295558531
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Language Models have demonstrated remarkable capabilities across diverse domains, yet significant challenges persist when deploying them as AI agents for real-world long-horizon tasks. Existing LLM agents suffer from a critical limitation: they are test-time static and cannot learn from experience, lacking the ability to accumulate knowledge and continuously improve on the job. To address this challenge, we propose MUSE, a novel agent framework that introduces an experience-driven, self-evolving system centered around a hierarchical Memory Module. MUSE organizes diverse levels of experience and leverages them to plan and execute long-horizon tasks across multiple applications. After each sub-task execution, the agent autonomously reflects on its trajectory, converting the raw trajectory into structured experience and integrating it back into the Memory Module. This mechanism enables the agent to evolve beyond its static pretrained parameters, fostering continuous learning and self-evolution. We evaluate MUSE on the long-horizon productivity benchmark TAC. It achieves new SOTA performance by a significant margin using only a lightweight Gemini-2.5 Flash model. Extensive experiments demonstrate that as the agent autonomously accumulates experience, it exhibits increasingly superior task completion capabilities, as well as robust continuous learning and self-evolution capabilities. Moreover, the accumulated experience from MUSE exhibits strong generalization properties, enabling zero-shot improvement on new tasks. MUSE establishes a new paradigm for AI agents capable of real-world productivity task automation.
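The loop the abstract describes (retrieve relevant experience, plan and execute a sub-task, reflect on the raw trajectory, integrate the distilled lesson back into a hierarchical memory) can be sketched roughly as follows. This is a minimal illustration under assumed names only: the classes HierarchicalMemory and Experience, the memory levels, and the reflection prompts are not the paper's actual API.

```python
# Minimal sketch of an experience-driven, self-evolving agent loop in the
# spirit of MUSE. All names (HierarchicalMemory, Experience, run_subtask)
# are illustrative assumptions, not the paper's actual implementation.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Experience:
    """Structured experience distilled from a raw trajectory."""
    task: str
    kind: str     # assumed levels: "strategy" (what to do) or "procedure" (how)
    lesson: str   # e.g. "save the document before switching applications"


@dataclass
class HierarchicalMemory:
    """Toy hierarchical memory: strategic lessons, procedural skills, raw logs."""
    strategic: list = field(default_factory=list)
    procedural: list = field(default_factory=list)
    trajectories: list = field(default_factory=list)

    def retrieve(self, task: str) -> list:
        # Naive keyword overlap stands in for real retrieval (embeddings, etc.).
        words = set(task.split())
        return [e for e in self.strategic + self.procedural
                if words & set(e.task.split())]

    def integrate(self, exp: Experience, trajectory: list) -> None:
        # The raw trajectory is archived; the distilled lesson joins a higher level.
        self.trajectories.append(trajectory)
        (self.strategic if exp.kind == "strategy" else self.procedural).append(exp)


def run_subtask(llm: Callable[[str], str], memory: HierarchicalMemory,
                subtask: str) -> None:
    """One iteration of the plan -> execute -> reflect -> integrate cycle."""
    lessons = memory.retrieve(subtask)
    plan = llm(f"Plan sub-task '{subtask}' using past lessons: {lessons}")
    trajectory = [llm(f"Execute step: {step}") for step in plan.splitlines()]
    # Reflection converts the raw trajectory into a structured, reusable lesson.
    reflection = llm(f"Extract one reusable lesson from: {trajectory}")
    memory.integrate(Experience(subtask, "strategy", reflection), trajectory)
```

Note that improvement here comes from growing the memory rather than fine-tuning: the same frozen model (Gemini-2.5 Flash in the paper's experiments) performs better as retrieval surfaces more relevant lessons, which is also what makes the accumulated experience transferable to new tasks zero-shot.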
Related papers
- Building Self-Evolving Agents via Experience-Driven Lifelong Learning: A Framework and Benchmark [57.59000694149105]
We introduce Experience-driven Lifelong Learning (ELL), a framework for building self-evolving agents. ELL is built on four core principles: Experience Exploration, Long-term Memory, Skill Learning and Knowledge Internalization. We also introduce StuLife, a benchmark dataset for ELL that simulates a student's holistic college journey.
arXiv Detail & Related papers (2025-08-26T13:04:28Z)
- LifelongAgentBench: Evaluating LLM Agents as Lifelong Learners [51.518410910148816]
Current large language model (LLM)-based agents, however, remain stateless and unable to accumulate or transfer knowledge over time. We present LifelongAgentBench, the first unified benchmark designed to systematically assess the lifelong learning ability of LLM agents.
arXiv Detail & Related papers (2025-05-17T10:09:11Z)
- Experiential Co-Learning of Software-Developing Agents [83.34027623428096]
Large language models (LLMs) have brought significant changes to various domains, especially in software development.
We introduce Experiential Co-Learning, a novel LLM-agent learning framework.
Experiments demonstrate that the framework enables agents to tackle unseen software-developing tasks more effectively.
arXiv Detail & Related papers (2023-12-28T13:50:42Z)
- Large Language Models Are Semi-Parametric Reinforcement Learning Agents [15.908831573619842]
REMEMBERER can exploit experiences from past episodes even for different task goals.
Reinforcement Learning with Experience Memory (RLEM) is introduced to update the memory; a sketch of this idea appears after this list.
Experiments are conducted on two RL task sets to evaluate the proposed framework.
arXiv Detail & Related papers (2023-06-09T08:08:18Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the agent's expected performance by selecting, from the storage, promising trajectories that solved prior tasks.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
- Hierarchical Few-Shot Imitation with Skill Transition Models [66.81252581083199]
Few-shot Imitation with Skill Transition Models (FIST) is an algorithm that extracts skills from offline data and utilizes them to generalize to unseen tasks.
We show that FIST is capable of generalizing to new tasks and substantially outperforms prior baselines in navigation experiments.
arXiv Detail & Related papers (2021-07-19T15:56:01Z)
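To make the semi-parametric idea behind the REMEMBERER/RLEM entry above concrete: the LLM's weights stay frozen, while an external experience memory keeps value estimates for (observation, action) pairs, updates them with an RL-style rule after each episode, and retrieves the best-valued actions as in-context hints for new goals. The class name, the Q-learning-style update, and the retrieval scheme below are assumptions for illustration, not the paper's exact method.

```python
# Rough sketch of an RLEM-style external experience memory. The update rule
# and all names are illustrative assumptions, not the paper's exact method.
from collections import defaultdict


class ExperienceMemory:
    """Maps (observation, action) pairs to value estimates across episodes."""

    def __init__(self, alpha: float = 0.1, gamma: float = 0.9):
        self.q = defaultdict(float)   # value table keyed by (obs, action)
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor

    def update_episode(self, trajectory: list) -> None:
        """Backward Q-learning-style pass over one episode's
        (obs, action, reward) triples; the memory, not the LLM, learns."""
        next_value = 0.0
        for obs, action, reward in reversed(trajectory):
            key = (obs, action)
            target = reward + self.gamma * next_value
            self.q[key] += self.alpha * (target - self.q[key])
            # Value of this state = best remembered action here so far.
            next_value = max(v for (o, _), v in self.q.items() if o == obs)

    def hints(self, obs: str, k: int = 3) -> list:
        """Top-k remembered actions for an observation, to place in the prompt."""
        scored = [(a, v) for (o, a), v in self.q.items() if o == obs]
        return sorted(scored, key=lambda av: av[1], reverse=True)[:k]
```

In use, hints(obs) would be rendered into the agent's prompt (e.g. "action X previously earned value 0.8 in this state"), so the agent improves across episodes, and even across task goals sharing states, through the growing table rather than through gradient updates.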