LifelongAgentBench: Evaluating LLM Agents as Lifelong Learners
- URL: http://arxiv.org/abs/2505.11942v3
- Date: Fri, 30 May 2025 02:28:21 GMT
- Title: LifelongAgentBench: Evaluating LLM Agents as Lifelong Learners
- Authors: Junhao Zheng, Xidi Cai, Qiuke Li, Duzhen Zhang, ZhongZhi Li, Yingying Zhang, Le Song, Qianli Ma
- Abstract summary: Current large language model (LLM)-based agents remain stateless and unable to accumulate or transfer knowledge over time. We present LifelongAgentBench, the first unified benchmark designed to systematically assess the lifelong learning ability of LLM agents.
- Score: 51.518410910148816
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Lifelong learning is essential for intelligent agents operating in dynamic environments. Current large language model (LLM)-based agents, however, remain stateless and unable to accumulate or transfer knowledge over time. Existing benchmarks treat agents as static systems and fail to evaluate lifelong learning capabilities. We present LifelongAgentBench, the first unified benchmark designed to systematically assess the lifelong learning ability of LLM agents. It provides skill-grounded, interdependent tasks across three interactive environments, Database, Operating System, and Knowledge Graph, with automatic label verification, reproducibility, and modular extensibility. Extensive experiments reveal that conventional experience replay has limited effectiveness for LLM agents due to irrelevant information and context length constraints. We further introduce a group self-consistency mechanism that significantly improves lifelong learning performance. We hope LifelongAgentBench will advance the development of adaptive, memory-capable LLM agents.
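The abstract names a group self-consistency mechanism but does not spell it out. The following is a minimal sketch, assuming the mechanism partitions retrieved past experiences into small groups, prompts the model once per group, and majority-votes over the proposed actions; `retrieve`d experiences and the `llm_call` helper are hypothetical placeholders, not the benchmark's API.

```python
# Minimal sketch of a group self-consistency step for an LLM agent.
# Assumption: "group self-consistency" here means splitting retrieved
# experiences into small groups (to respect context-length limits),
# prompting once per group, and majority-voting over proposed actions.
# `llm_call` is a hypothetical placeholder for any text-in/text-out model.
from collections import Counter
from typing import Callable, List


def group_self_consistency(
    task: str,
    experiences: List[str],
    llm_call: Callable[[str], str],
    group_size: int = 4,
) -> str:
    """Return the action most frequently proposed across experience groups."""
    groups = [
        experiences[i : i + group_size]
        for i in range(0, len(experiences), group_size)
    ] or [[]]  # fall back to a single empty group when there is no history
    votes = []
    for group in groups:
        prompt = (
            "Relevant past experiences:\n"
            + "\n".join(group)
            + f"\n\nCurrent task: {task}\nNext action:"
        )
        votes.append(llm_call(prompt).strip())
    # Majority vote over the per-group proposals.
    return Counter(votes).most_common(1)[0][0]
```

Keeping each prompt to a small group of experiences is also how this sketch sidesteps the context-length and irrelevant-information issues the abstract cites as the weakness of plain experience replay.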
Related papers
- Lifelong Learning of Large Language Model based Agents: A Roadmap [39.01532420650279]
Lifelong learning, also known as continual or incremental learning, is a crucial component for advancing Artificial General Intelligence (AGI). This survey is the first to systematically summarize the potential techniques for incorporating lifelong learning into large language models (LLMs). We highlight how these pillars collectively enable continuous adaptation, mitigate catastrophic forgetting, and improve long-term performance.
arXiv Detail & Related papers (2025-01-13T12:42:04Z)
- RAG-Modulo: Solving Sequential Tasks using Experience, Critics, and Language Models [5.0741409008225755]
Large language models (LLMs) have emerged as promising tools for solving challenging robotic tasks.
Most existing LLM-based agents lack the ability to retain and learn from past interactions.
We propose RAG-Modulo, a framework that enhances LLM-based agents with a memory of past interactions and incorporates critics to evaluate the agents' decisions.
arXiv Detail & Related papers (2024-09-18T20:03:32Z)
- Empowering Large Language Model Agents through Action Learning [85.39581419680755]
Large Language Model (LLM) Agents have recently garnered increasing interest, yet they are limited in their ability to learn from trial and error.
We argue that the capacity to learn new actions from experience is fundamental to the advancement of learning in LLM agents.
We introduce LearnAct, a framework with an iterative learning strategy that creates and improves actions in the form of Python functions.
arXiv Detail & Related papers (2024-02-24T13:13:04Z)
- Experiential Co-Learning of Software-Developing Agents [83.34027623428096]
Large language models (LLMs) have brought significant changes to various domains, especially in software development.
We introduce Experiential Co-Learning, a novel LLM-agent learning framework.
Experiments demonstrate that the framework enables agents to tackle unseen software-developing tasks more effectively.
arXiv Detail & Related papers (2023-12-28T13:50:42Z)
- ExpeL: LLM Agents Are Experiential Learners [57.13685954854463]
We introduce the Experiential Learning (ExpeL) agent to allow learning from agent experiences without requiring parametric updates. Our agent autonomously gathers experiences and extracts knowledge using natural language from a collection of training tasks. At inference, the agent recalls its extracted insights and past experiences to make informed decisions.
arXiv Detail & Related papers (2023-08-20T03:03:34Z)
- AgentBench: Evaluating LLMs as Agents [88.45506148281379]
Large Language Models (LLMs) are becoming increasingly smart and autonomous, targeting real-world pragmatic missions beyond traditional NLP tasks.
We present AgentBench, a benchmark that currently consists of 8 distinct environments to assess LLM-as-Agent's reasoning and decision-making abilities.
arXiv Detail & Related papers (2023-08-07T16:08:11Z)
- LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning [64.55001982176226]
LIBERO is a novel benchmark of lifelong learning for robot manipulation.
We focus on how to efficiently transfer declarative knowledge, procedural knowledge, or the mixture of both.
We develop an extendible procedural generation pipeline that can in principle generate infinitely many tasks.
arXiv Detail & Related papers (2023-06-05T23:32:26Z)
- Lifelong Reinforcement Learning with Temporal Logic Formulas and Reward Machines [30.161550541362487]
We propose Lifelong reinforcement learning with Sequential linear temporal logic formulas and Reward Machines (LSRM).
We first introduce Sequential Linear Temporal Logic (SLTL), which supplements the existing Linear Temporal Logic (LTL) formal language.
We then utilize Reward Machines (RM) to exploit structural reward functions for tasks encoded with high-level events (a minimal reward-machine sketch follows this entry).
arXiv Detail & Related papers (2021-11-18T02:02:08Z)
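The LSRM entry above refers to Reward Machines without defining them. Below is a minimal sketch, assuming the standard formulation of a reward machine as a finite-state machine whose transitions are triggered by high-level events and emit scalar rewards; the event names and the example task are illustrative, not taken from the paper.

```python
# Minimal reward-machine sketch for the LSRM entry above.
# Assumption: a reward machine is a finite-state machine whose transitions
# are triggered by high-level events and emit scalar rewards; the events
# and the example task are illustrative, not taken from the paper.
from typing import Dict, Tuple


class RewardMachine:
    def __init__(
        self,
        initial_state: str,
        transitions: Dict[Tuple[str, str], Tuple[str, float]],
    ) -> None:
        # transitions: (current_state, event) -> (next_state, reward)
        self.state = initial_state
        self.transitions = transitions

    def step(self, event: str) -> float:
        """Advance on a high-level event and return the emitted reward."""
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward


# Example: "fetch the key, then open the door" as a two-step task.
rm = RewardMachine(
    initial_state="start",
    transitions={
        ("start", "got_key"): ("has_key", 0.1),
        ("has_key", "opened_door"): ("done", 1.0),
    },
)
print(rm.step("got_key"), rm.step("opened_door"))  # 0.1 1.0
```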