Reinforcement Learning for Self-Improving Agent with Skill Library
- URL: http://arxiv.org/abs/2512.17102v1
- Date: Thu, 18 Dec 2025 21:58:19 GMT
- Title: Reinforcement Learning for Self-Improving Agent with Skill Library
- Authors: Jiongxiao Wang, Qiaojing Yan, Yawei Wang, Yijun Tian, Soumya Smruti Mishra, Zhichao Xu, Megha Gandhi, Panpan Xu, Lin Lee Cheong
- Abstract summary: Large Language Model (LLM)-based agents have demonstrated remarkable capabilities in complex reasoning and multi-turn interactions. One promising approach is implementing skill libraries that allow agents to learn, validate, and apply new skills. We propose a Reinforcement Learning (RL)-based approach to enhance agents' self-improvement capabilities with a skill library.
- Score: 14.717149089634718
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Model (LLM)-based agents have demonstrated remarkable capabilities in complex reasoning and multi-turn interactions but struggle to continuously improve and adapt when deployed in new environments. One promising approach is implementing skill libraries that allow agents to learn, validate, and apply new skills. However, current skill library approaches rely primarily on LLM prompting, making consistent skill library implementation challenging. To overcome these challenges, we propose a Reinforcement Learning (RL)-based approach to enhance agents' self-improvement capabilities with a skill library. Specifically, we introduce Skill Augmented GRPO for self-Evolution (SAGE), a novel RL framework that systematically incorporates skills into learning. The framework's key component, Sequential Rollout, iteratively deploys agents across a chain of similar tasks for each rollout. As agents navigate through the task chain, skills generated from previous tasks accumulate in the library and become available for subsequent tasks. Additionally, the framework enhances skill generation and utilization through a Skill-integrated Reward that complements the original outcome-based rewards. Experimental results on AppWorld demonstrate that SAGE, when applied to a supervised-finetuned model with expert experience, achieves 8.9% higher Scenario Goal Completion while requiring 26% fewer interaction steps and generating 59% fewer tokens, substantially outperforming existing approaches in both accuracy and efficiency.
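The Sequential Rollout and Skill-integrated Reward described in the abstract can be illustrated with a minimal sketch. All names below (`SkillLibrary`, `run_task`, the `skill_bonus` weighting) are illustrative assumptions, not the authors' implementation: the point is only that skills produced on earlier tasks in a chain become available to later ones, and that the reward adds a skill term to the outcome reward.

```python
# Minimal sketch of a SAGE-style Sequential Rollout with a skill library.
# All names and the reward weighting are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class SkillLibrary:
    """Accumulates skills generated across a chain of similar tasks."""
    skills: list = field(default_factory=list)

    def add(self, skill: str) -> None:
        if skill not in self.skills:
            self.skills.append(skill)


def run_task(agent, task, library):
    """Run the agent on one task with the current skill library available.

    `agent` is a placeholder callable returning a dict with the outcome
    reward, any newly generated skills, and whether an existing skill
    was reused."""
    trajectory = agent(task, library.skills)
    return trajectory["outcome"], trajectory["new_skills"], trajectory["used_skill"]


def sequential_rollout(agent, task_chain, skill_bonus=0.1):
    """Deploy the agent across a chain of similar tasks for one rollout.

    Skills generated on earlier tasks accumulate in the library and are
    offered to later tasks; the reward combines the outcome reward with
    a skill-integrated term (the additive form here is an assumption)."""
    library = SkillLibrary()
    rewards = []
    for task in task_chain:
        outcome, new_skills, used_skill = run_task(agent, task, library)
        # Skill-integrated reward: outcome plus a bonus for generating
        # new skills or reusing an existing one.
        reward = outcome + skill_bonus * (len(new_skills) + int(used_skill))
        for skill in new_skills:
            library.add(skill)
        rewards.append(reward)
    return rewards, library
```

Under this sketch, the second task in a chain already sees the first task's skills, so successful reuse raises its reward relative to the first; rollout-level rewards like these would then feed a GRPO-style policy update.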
Related papers
- Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward [5.124116559484265]
The transition from monolithic language models to modular, skill-equipped agents marks a defining shift in how large language models (LLMs) are deployed in practice. Rather than encoding all procedural knowledge within model weights, agent skills enable dynamic capability extension without retraining. This survey provides a comprehensive treatment of the agent skills landscape, as it has rapidly evolved during the last few months.
arXiv Detail & Related papers (2026-02-12T21:33:25Z)
- SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning [83.98129545309277]
We propose SkillRL, a framework that bridges the gap between raw experience and policy improvement. Our approach introduces an experience-based distillation mechanism to build a hierarchical skill library, SkillBank. Experimental results on ALF, WebShop and seven search-augmented tasks demonstrate that SkillRL achieves state-of-the-art performance.
arXiv Detail & Related papers (2026-02-09T03:17:17Z)
- Omni-Thinker: Scaling Multi-Task RL in LLMs with Hybrid Reward and Task Scheduling [66.0871543682453]
We present Omni-Thinker, a unified reinforcement learning framework that scales large language models across diverse tasks. Our scheduler orders tasks according to accuracy backward transfer (BWT), reducing forgetting and improving multi-task performance.
arXiv Detail & Related papers (2025-07-20T01:50:16Z)
- MaskSearch: A Universal Pre-Training Framework to Enhance Agentic Search Capability [106.35604230971396]
Recent advancements in Agent techniques enable Large Language Models (LLMs) to autonomously utilize tools for retrieval, planning, and reasoning. To further enhance the universal search capability of agents, we propose a novel pre-training framework, MaskSearch. In the pre-training stage, we introduce the Retrieval Augmented Mask Prediction (RAMP) task, where the model learns to leverage search tools to fill masked spans. After that, the model is trained on downstream tasks to achieve further improvement.
arXiv Detail & Related papers (2025-05-26T17:58:50Z)
- SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks [110.20297293596005]
Large language model (LLM) agents need to perform multi-turn interactions in real-world tasks. Existing multi-turn RL algorithms for optimizing LLM agents fail to perform effective credit assignment over multiple turns while leveraging the generalization capabilities of LLMs. We propose a novel RL algorithm, SWEET-RL, that uses a carefully designed optimization objective to train a critic model with access to additional training-time information. Our experiments demonstrate that SWEET-RL achieves a 6% absolute improvement in success and win rates on ColBench compared to other state-of-the-art multi-turn RL algorithms.
arXiv Detail & Related papers (2025-03-19T17:55:08Z)
- Improving Retrospective Language Agents via Joint Policy Gradient Optimization [57.35348425288859]
RetroAct is a framework that jointly optimizes task-planning and self-reflective evolution capabilities in language agents. We develop a two-stage joint optimization process that integrates imitation learning and reinforcement learning. We conduct extensive experiments across various testing environments, demonstrating that RetroAct achieves substantial improvements in task performance and decision-making processes.
arXiv Detail & Related papers (2025-03-03T12:54:54Z)
- Skill Expansion and Composition in Parameter Space [17.016614374151747]
Parametric Skill Expansion and Composition (PSEC) is a new framework designed to iteratively evolve the agents' capabilities. PSEC exhibits superior capacity to leverage prior knowledge to efficiently tackle new challenges.
arXiv Detail & Related papers (2025-02-09T15:22:38Z)
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process. We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Agentic Skill Discovery [19.5703917813767]
Language-conditioned robotic skills make it possible to apply the high-level reasoning of Large Language Models (LLMs) to low-level robotic control.
A remaining challenge is to acquire a diverse set of fundamental skills.
We introduce a novel framework for skill discovery that is entirely driven by LLMs.
arXiv Detail & Related papers (2024-05-23T19:44:03Z)
- Variational Curriculum Reinforcement Learning for Unsupervised Discovery of Skills [25.326624139426514]
We propose a novel approach to unsupervised skill discovery based on information theory, called Value Uncertainty Variational Curriculum (VUVC).
We prove that, under regularity conditions, VUVC accelerates the increase of entropy in the visited states compared to the uniform curriculum.
We also demonstrate that the skills discovered by our method successfully complete a real-world robot navigation task in a zero-shot setup.
arXiv Detail & Related papers (2023-10-30T10:34:25Z)
- Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance [66.615355754712]
BOSS learns to accomplish new tasks by performing "skill bootstrapping".
We demonstrate through experiments in realistic household environments that agents trained with our LLM-guided bootstrapping procedure outperform those trained with naive bootstrapping.
arXiv Detail & Related papers (2023-10-16T02:43:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.