Related papers: SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

URL: http://arxiv.org/abs/2602.20867v1
Date: Tue, 24 Feb 2026 13:11:38 GMT
Title: SoK: Agentic Skills -- Beyond Tool Use in LLM Agents
Authors: Yanna Jiang, Delong Li, Haiyu Deng, Baihe Ma, Xu Wang, Qin Wang, Guangsheng Yu,
Abstract summary: Agentic systems increasingly rely on reusable procedural capabilities, a.k.a. agentic skills, to execute long-horizon workflows reliably. This paper maps the skill layer across the full lifecycle (discovery, practice, distillation, storage, composition, evaluation, and update). We analyze the security and governance implications of skill-based agents, covering supply-chain risks, prompt injection via skill payloads, and trust-tiered execution.
Score: 6.356997609995175
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Agentic systems increasingly rely on reusable procedural capabilities, a.k.a. agentic skills, to execute long-horizon workflows reliably. These capabilities are callable modules that package procedural knowledge with explicit applicability conditions, execution policies, termination criteria, and reusable interfaces. Unlike one-off plans or atomic tool calls, skills operate (and often do well) across tasks. This paper maps the skill layer across the full lifecycle (discovery, practice, distillation, storage, composition, evaluation, and update) and introduces two complementary taxonomies. The first is a system-level set of seven design patterns capturing how skills are packaged and executed in practice, from metadata-driven progressive disclosure and executable code skills to self-evolving libraries and marketplace distribution. The second is an orthogonal representation × scope taxonomy describing what skills are (natural language, code, policy, hybrid) and what environments they operate over (web, OS, software engineering, robotics). We analyze the security and governance implications of skill-based agents, covering supply-chain risks, prompt injection via skill payloads, and trust-tiered execution, grounded by a case study of the ClawHavoc campaign in which nearly 1,200 malicious skills infiltrated a major agent marketplace, exfiltrating API keys, cryptocurrency wallets, and browser credentials at scale. We further survey deterministic evaluation approaches, anchored by recent benchmark evidence that curated skills can substantially improve agent success rates while self-generated skills may degrade them. We conclude with open challenges toward robust, verifiable, and certifiable skills for real-world autonomous agents.

Related papers

EvoSkill: Automated Skill Discovery for Multi-Agent Systems [6.319876096746374]
We introduce EvoSkill, a self-evolving framework that automatically discovers and refines agent skills. EvoSkill analyzes execution failures, proposes new skills or edits to existing ones, and materializes them into structured, reusable skill folders. We evaluate EvoSkill on two benchmarks: OfficeQA, a grounded reasoning benchmark over U.S. Treasury data, and SealQA, a noisy retrieval benchmark.
arXiv Detail & Related papers (2026-03-03T09:07:22Z)
SkillCraft: Can LLM Agents Learn to Use Tools Skillfully? [67.69996753743129]
We introduce SkillCraft, a benchmark that explicitly stress-tests agents' ability to form and reuse higher-level tool compositions. SkillCraft features realistic, highly compositional tool-use scenarios with difficulty scaled along both quantitative and structural dimensions. We propose a lightweight evaluation protocol that enables agents to auto-compose atomic tools into executable Skills, then cache and reuse them within and across tasks.
arXiv Detail & Related papers (2026-02-28T15:44:31Z)
SkillNet: Create, Evaluate, and Connect AI Skills [159.47504178122156]
SkillNet is an open infrastructure designed to create, evaluate, and organize AI skills at scale. Our infrastructure integrates a repository of over 200,000 skills, an interactive platform, and a versatile Python toolkit.
arXiv Detail & Related papers (2026-02-26T14:24:02Z)
Agent Skill Framework: Perspectives on the Potential of Small Language Models in Industrial Environments [14.079091139464175]
This work introduces a formal mathematical definition of the Agent Skill process, followed by a systematic evaluation of language models of varying sizes. Results show that tiny models struggle with reliable skill selection, while moderately sized SLMs (approximately 12B to 30B) benefit substantially from the Agent Skill approach.
arXiv Detail & Related papers (2026-02-18T17:52:17Z)
SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement [120.52289344734415]
We propose an automated framework for stealthy prompt injection tailored to agent skills. The framework forms a closed loop with three agents: an Attack Agent that synthesizes injection skills under explicit stealth constraints, a Code Agent that executes tasks using the injected skills, and an Evaluate Agent that logs action traces. Our method consistently achieves high attack success rates under realistic settings.
arXiv Detail & Related papers (2026-02-15T16:09:48Z)
Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward [5.124116559484265]
The transition from monolithic language models to modular, skill-equipped agents marks a defining shift in how large language models (LLMs) are deployed in practice. Rather than encoding all procedural knowledge within model weights, agent skills enable dynamic capability extension without retraining. This survey provides a comprehensive treatment of the agent skills landscape as it has rapidly evolved over the last few months.
arXiv Detail & Related papers (2026-02-12T21:33:25Z)
SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning [83.98129545309277]
We propose SkillRL, a framework that bridges the gap between raw experience and policy improvement. Our approach introduces an experience-based distillation mechanism to build a hierarchical skill library, SkillBank. Experimental results on ALF, WebShop and seven search-augmented tasks demonstrate that SkillRL achieves state-of-the-art performance.
arXiv Detail & Related papers (2026-02-09T03:17:17Z)
Agent Skills: A Data-Driven Analysis of Claude Skills for Extending Large Language Model Functionality [9.192260493061754]
Agent skills extend large language model (LLM) agents with reusable, program-like modules. We conduct a large-scale, data-driven analysis of 40,285 publicly listed skills from a major marketplace. Our results show that skill publication tends to occur in short bursts that track shifts in community attention.
arXiv Detail & Related papers (2026-02-08T15:14:12Z)
CUA-Skill: Develop Skills for Computer Using Agent [48.87870942314034]
We introduce CUA-Skill, a computer-using agentic skill base that encodes human computer-use knowledge as skills. We construct CUA-Skill Agent, an end-to-end computer-using agent that supports dynamic skill retrieval, argument instantiation, and memory-aware failure recovery. Our results demonstrate that CUA-Skill substantially improves execution success rates and robustness on challenging end-to-end agent benchmarks.
arXiv Detail & Related papers (2026-01-28T23:38:25Z)
PolySkill: Learning Generalizable Skills Through Polymorphic Abstraction [20.687269802717893]
We introduce PolySkill, a new framework that enables agents to learn generalizable and compositional skills. Experiments show that our method improves skill reuse by 1.7x on seen websites. By enabling the agent to identify and refine its own goals, PolySkill enhances the agent's ability to learn a better curriculum.
arXiv Detail & Related papers (2025-10-17T17:56:00Z)
UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios [63.67884284105684]
We introduce UltraHorizon, a novel benchmark that measures the foundational capabilities essential for complex real-world challenges. Agents are designed in long-horizon discovery tasks where they must iteratively uncover hidden rules. Our experiments reveal that LLM-agents consistently underperform in these settings, whereas human participants achieve higher scores.
arXiv Detail & Related papers (2025-09-26T02:04:00Z)
Agentic Knowledgeable Self-awareness [79.25908923383776]
KnowSelf is a data-centric approach that applies agents with knowledgeable self-awareness like humans. Our experiments demonstrate that KnowSelf can outperform various strong baselines on different tasks and models with minimal use of external knowledge.
arXiv Detail & Related papers (2025-04-04T16:03:38Z)
Choreographer: Learning and Adapting Skills in Imagination [60.09911483010824]
We present Choreographer, a model-based agent that exploits its world model to learn and adapt skills in imagination. Our method decouples the exploration and skill learning processes and can discover skills in the latent state space of the model. Choreographer is able to learn skills both from offline data and by collecting data simultaneously with an exploration policy.
arXiv Detail & Related papers (2022-11-23T23:31:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.