Related papers: Organizing, Orchestrating, and Benchmarking Agent Skills at Ecosystem Scale

URL: http://arxiv.org/abs/2603.02176v1
Date: Mon, 02 Mar 2026 18:46:47 GMT
Title: Organizing, Orchestrating, and Benchmarking Agent Skills at Ecosystem Scale
Authors: Hao Li, Chunjiang Mu, Jianhao Chen, Siyue Ren, Zhiyao Cui, Yiqun Zhang, Lei Bai, Shuyue Hu,
Abstract summary: AgentSkillOS is a principled framework for skill selection, orchestration, and ecosystem-level management. AgentSkillOS comprises two stages: (i) Manage Skills, which organizes skills into a capability tree, and (ii) Solve Tasks, which retrieves, orchestrates, and executes multiple skills through DAG-based pipelines.
Score: 28.43462779191672
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The rapid proliferation of Claude agent skills has raised the central question of how to effectively leverage, manage, and scale the agent skill ecosystem. In this paper, we propose AgentSkillOS, the first principled framework for skill selection, orchestration, and ecosystem-level management. AgentSkillOS comprises two stages: (i) Manage Skills, which organizes skills into a capability tree via node-level recursive categorization for efficient discovery; and (ii) Solve Tasks, which retrieves, orchestrates, and executes multiple skills through DAG-based pipelines. To evaluate the agent's ability to invoke skills, we construct a benchmark of 30 artifact-rich tasks across five categories: data computation, document creation, motion video, visual design, and web interaction. We assess the quality of task outputs using LLM-based pairwise evaluation, and the results are aggregated via a Bradley-Terry model to produce unified quality scores. Experiments across three skill ecosystem scales (200 to 200K skills) show that tree-based retrieval effectively approximates oracle skill selection, and that DAG-based orchestration substantially outperforms native flat invocation even when given the identical skill set. Our findings confirm that structured composition is the key to unlocking skill potential. Our GitHub repository is available at: https://github.com/ynulihao/AgentSkillOS.
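
The Bradley-Terry aggregation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the model is fitted, so the iterative minorization-maximization update used here is an assumption, and the function name, system names, and win counts are all hypothetical illustration data, not results from the paper.

```python
from collections import defaultdict


def bradley_terry(wins, n_iter=500, tol=1e-10):
    """Fit Bradley-Terry strengths from pairwise win counts.

    wins maps (winner, loser) -> number of wins (hypothetical input format).
    Returns a dict of strengths normalized to sum to 1.
    Assumes every item wins at least once; otherwise its strength
    collapses to zero and the estimate is degenerate.
    """
    items = sorted({name for pair in wins for name in pair})
    total_wins = defaultdict(float)   # wins per item
    games = defaultdict(float)        # comparisons per unordered pair
    for (winner, loser), count in wins.items():
        total_wins[winner] += count
        games[frozenset((winner, loser))] += count

    strengths = {i: 1.0 / len(items) for i in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            # MM update: p_i = W_i / sum_j n_ij / (p_i + p_j)
            denom = 0.0
            for j in items:
                n_ij = games.get(frozenset((i, j)), 0.0)
                if j != i and n_ij:
                    denom += n_ij / (strengths[i] + strengths[j])
            updated[i] = total_wins[i] / denom if denom else strengths[i]
        norm = sum(updated.values())
        updated = {i: v / norm for i, v in updated.items()}
        delta = max(abs(updated[i] - strengths[i]) for i in items)
        strengths = updated
        if delta < tol:
            break
    return strengths


# Hypothetical pairwise judgments between three systems (20 per pair).
example_wins = {
    ("dag", "flat"): 15, ("flat", "dag"): 5,
    ("dag", "baseline"): 17, ("baseline", "dag"): 3,
    ("flat", "baseline"): 12, ("baseline", "flat"): 8,
}
scores = bradley_terry(example_wins)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Each strength can be read as a relative quality score: under the model, the probability that system i is preferred over system j is p_i / (p_i + p_j), which is what lets many pairwise verdicts collapse into one unified scale.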

Related papers

K^2-Agent: Co-Evolving Know-What and Know-How for Hierarchical Mobile Device Control [73.50217471850658]
K2-Agent is a hierarchical framework that models human-like cognition by knowing and co-evolving declarative (what) and procedural (how) knowledge for planning and execution. On the challenging AndroidWorld benchmark, K2-Agent achieves a 76.1% success rate using only raw and open-source backbones.
arXiv Detail & Related papers (2026-02-28T14:33:14Z)
SkillNet: Create, Evaluate, and Connect AI Skills [159.47504178122156]
SkillNet is an open infrastructure designed to create, evaluate, and organize AI skills at scale. Our infrastructure integrates a repository of over 200,000 skills, an interactive platform, and a versatile Python toolkit.
arXiv Detail & Related papers (2026-02-26T14:24:02Z)
SoK: Agentic Skills -- Beyond Tool Use in LLM Agents [6.356997609995175]
Agentic systems increasingly rely on reusable procedural capabilities, a.k.a. agentic skills, to execute long-horizon reliably. This paper maps the skill layer across the full lifecycle (discovery, practice, distillation, storage, composition, evaluation, and update). We analyze the security and governance implications of skill-based agents, covering supply-chain risks, prompt injection via skill payloads, and trust-tiered execution.
arXiv Detail & Related papers (2026-02-24T13:11:38Z)
SkillOrchestra: Learning to Route Agents via Skill Transfer [65.50924963973286]
We introduce SkillOrchestra, a framework for skill-aware orchestration. SkillOrchestra learns fine-grained skills from execution experience and models agent-specific competence and cost under those skills. At deployment, the orchestrator infers the skill demands of the current interaction and selects agents that best satisfy them under an explicit performance-cost trade-off.
arXiv Detail & Related papers (2026-02-23T10:17:25Z)
SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning [83.98129545309277]
We propose SkillRL, a framework that bridges the gap between raw experience and policy improvement. Our approach introduces an experience-based distillation mechanism to build a hierarchical skill library, SkillBank. Experimental results on ALF, WebShop and seven search-augmented tasks demonstrate that SkillRL achieves state-of-the-art performance.
arXiv Detail & Related papers (2026-02-09T03:17:17Z)
Reinforcement Learning for Self-Improving Agent with Skill Library [14.717149089634718]
Large Language Model (LLM)-based agents have demonstrated remarkable capabilities in complex reasoning and multi-turn interactions. One promising approach is implementing skill libraries that allow agents to learn, validate, and apply new skills. We propose a Reinforcement Learning (RL)-based approach to enhance agents' self-improvement capabilities with a skill library.
arXiv Detail & Related papers (2025-12-18T21:58:19Z)
eSapiens: A Platform for Secure and Auditable Retrieval-Augmented Generation [10.667949307405983]
eSapiens is an AI-as-a-Service (AIaaS) platform engineered around a business-oriented trifecta: proprietary data, operational, and any major Large Language Model (LLM). eSapiens gives businesses full control over their AI assets, keeping everything in-house for AI knowledge retention and data security.
arXiv Detail & Related papers (2025-07-13T11:41:44Z)
What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities [56.646832992178105]
We introduce OmniBench, a cross-platform, graph-based benchmark with an automated pipeline for synthesizing tasks of controllable complexity. We present OmniEval, a multidimensional evaluation framework that includes subtask-level evaluation, graph-based metrics, and comprehensive tests across 10 capabilities. Our dataset contains 36k graph-structured tasks across 20 scenarios, achieving a 91% human acceptance rate.
arXiv Detail & Related papers (2025-06-10T15:59:38Z)
AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search [58.98450205734779]
Large language model (LLM) agents have demonstrated strong capabilities across diverse domains. Existing agent search methods suffer from three major limitations. We introduce a comprehensive framework to address these challenges.
arXiv Detail & Related papers (2025-06-06T12:07:23Z)
Kolb-Based Experiential Learning for Generalist Agents with Human-Level Kaggle Data Science Performance [81.05882480184587]
We propose a computational framework of Kolb's learning cycle with Vygotsky's ZPD for autonomous agents. With performance at the level of 9 gold, 8 silver, and 12 bronze medals, including 4 gold and 4 silver on prize-awarding competitions, Agent K is the 1st AI system to successfully integrate Kolb- and Vygotsky-inspired human cognitive learning.
arXiv Detail & Related papers (2024-11-05T23:55:23Z)
Agents meet OKR: An Object and Key Results Driven Agent System with Hierarchical Self-Collaboration and Self-Evaluation [25.308341461293857]
OKR-Agent is designed to enhance the capabilities of Large Language Models (LLMs) in task-solving. Our framework includes two novel modules: hierarchical Objects and Key Results generation and multi-level evaluation.
arXiv Detail & Related papers (2023-11-28T06:16:30Z)
Tree Structure-Aware Few-Shot Image Classification via Hierarchical Aggregation [27.868736254566397]
We focus on how to learn additional feature representations for few-shot image classification through pretext tasks. This additional knowledge can further improve the performance of few-shot learning. We present a plug-in Hierarchical Tree Structure-aware (HTS) method, which learns the relationship of FSL and pretext tasks.
arXiv Detail & Related papers (2022-07-14T15:17:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.