Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study
- URL: http://arxiv.org/abs/2602.06547v1
- Date: Fri, 06 Feb 2026 09:52:27 GMT
- Title: Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study
- Authors: Yi Liu, Zhihao Chen, Yanjun Zhang, Gelei Deng, Yuekang Li, Jianting Ning, Leo Yu Zhang,
- Abstract summary: Third-party agent skills extend LLM-based agents with instruction files and executable code that run on users' machines. No ground-truth dataset exists to characterize the resulting threats. We construct the first labeled dataset of malicious agent skills by behaviorally verifying 98,380 skills.
- Score: 47.60135753021306
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Third-party agent skills extend LLM-based agents with instruction files and executable code that run on users' machines. Skills execute with user privileges and are distributed through community registries with minimal vetting, but no ground-truth dataset exists to characterize the resulting threats. We construct the first labeled dataset of malicious agent skills by behaviorally verifying 98,380 skills from two community registries, confirming 157 malicious skills with 632 vulnerabilities. These attacks are not incidental. Malicious skills average 4.03 vulnerabilities across a median of three kill chain phases, and the ecosystem has split into two archetypes: Data Thieves that exfiltrate credentials through supply chain techniques, and Agent Hijackers that subvert agent decision-making through instruction manipulation. A single actor accounts for 54.1% of confirmed cases through templated brand impersonation. Shadow features, capabilities absent from public documentation, appear in 0% of basic attacks but 100% of advanced ones; several skills go further by exploiting the AI platform's own hook system and permission flags. Responsible disclosure led to 93.6% removal within 30 days. We release the dataset and analysis pipeline to support future work on agent skill security.
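The abstract's "shadow feature" finding (capabilities present in a skill's code but absent from its public documentation) suggests a simple static screen: diff the capabilities a skill's code exhibits against those its documentation mentions. The sketch below is illustrative only; the `SKILL.md` file layout, the regex indicators, and the keyword-matching heuristic are assumptions for this example, not the paper's actual verification pipeline.

```python
# Illustrative "shadow feature" screen: flag capabilities that a skill's
# code exhibits but its public documentation never mentions. The file
# layout (SKILL.md + *.py) and the regex indicators are assumptions,
# not the paper's actual pipeline.
import re
from pathlib import Path

CAPABILITY_PATTERNS = {
    "network": re.compile(r"\b(requests\.|urllib|socket\.|http\.client)"),
    "subprocess": re.compile(r"\b(subprocess\.|os\.system|os\.popen)"),
    "env read": re.compile(r"\bos\.environ\b"),
    "file write": re.compile(r"open\([^)]*['\"]w"),
}

def shadow_features(skill_dir: Path) -> set[str]:
    """Capabilities used in code but not mentioned in SKILL.md."""
    doc = (skill_dir / "SKILL.md").read_text(errors="ignore").lower()
    used = set()
    for py_file in skill_dir.rglob("*.py"):
        source = py_file.read_text(errors="ignore")
        used |= {cap for cap, pat in CAPABILITY_PATTERNS.items()
                 if pat.search(source)}
    declared = {cap for cap in CAPABILITY_PATTERNS if cap in doc}
    return used - declared
```

A real screen would need semantic analysis rather than keyword matching, but even this crude diff captures the paper's observation that advanced attacks always hide capabilities from the documentation.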
Related papers
- SoK: Agentic Skills -- Beyond Tool Use in LLM Agents [6.356997609995175]
Agentic systems increasingly rely on reusable procedural capabilities, a.k.a. agentic skills, to execute long-horizon tasks reliably. This paper maps the skill layer across the full lifecycle (discovery, practice, distillation, storage, composition, evaluation, and update). We analyze the security and governance implications of skill-based agents, covering supply-chain risks, prompt injection via skill payloads, and trust-tiered execution.
arXiv Detail & Related papers (2026-02-24T13:11:38Z) - Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks [27.120130204872325]
We introduce SkillInject, a benchmark evaluating the susceptibility of widely-used LLM agents to injections through skill files. SkillInject contains 202 injection-task pairs, with attacks ranging from obviously malicious injections to subtle, context-dependent attacks hidden in otherwise legitimate instructions. Our results show that today's agents are highly vulnerable, with attack success rates of up to 80% on frontier models.
arXiv Detail & Related papers (2026-02-23T18:59:27Z) - SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement [120.52289344734415]
We propose an automated framework for stealthy prompt injection tailored to agent skills. The framework forms a closed loop with three agents: an Attack Agent that synthesizes injection skills under explicit stealth constraints, a Code Agent that executes tasks using the injected skills, and an Evaluate Agent that logs action traces. Our method consistently achieves high attack success rates under realistic settings.
arXiv Detail & Related papers (2026-02-15T16:09:48Z) - Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale [26.757365536859453]
The rise of AI agent frameworks has introduced agent skills, modular packages containing instructions and executable code that dynamically extend agent capabilities. While this architecture enables powerful customization, skills execute with implicit trust and minimal vetting, creating a significant yet uncharacterized attack surface. We conduct the first large-scale empirical security analysis of this emerging ecosystem, collecting 42,447 skills from two major marketplaces.
arXiv Detail & Related papers (2026-01-15T12:31:52Z) - Chasing One-day Vulnerabilities Across Open Source Forks [3.777973175977788]
This paper presents a novel approach to help developers identify one-day vulnerabilities in forked repositories. The approach propagates vulnerability information at the commit level and performs automated impact analysis. It enables automatic detection of forked projects that have not incorporated fixes, leaving them potentially vulnerable.
arXiv Detail & Related papers (2025-11-07T09:25:47Z) - Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain [82.98626829232899]
Fine-tuning AI agents on data from their own interactions introduces a critical security vulnerability within the AI supply chain. We show that adversaries can easily poison the data collection pipeline to embed hard-to-detect backdoors.
arXiv Detail & Related papers (2025-10-03T12:47:21Z) - Breaking the Code: Security Assessment of AI Code Agents Through Systematic Jailbreaking Attacks [11.371490212283383]
Code-capable large language model (LLM) agents are embedded in software engineering workflows, where they can read, write, and execute code. We present JAWS-BENCH, a benchmark spanning three escalating workspaces that mirror attacker capability. We find that under prompt-only conditions in JAWS-0, code agents accept 61% of attacks on average; 58% are harmful, 52% parse, and 27% run end-to-end.
arXiv Detail & Related papers (2025-10-01T18:38:20Z) - BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems [62.17474934536671]
We introduce the first framework to capture offensive and defensive cyber-capabilities in evolving real-world systems. To capture the vulnerability lifecycle, we define three task types: Detect (detecting a new vulnerability), Exploit (exploiting a specific vulnerability), and Patch (patching a specific vulnerability). We evaluate 8 agents: Claude Code, OpenAI Codex CLI with o3-high and o4-mini, and custom agents with o3-high, GPT-4.1, Gemini 2.5 Pro Preview, Claude 3.7 Sonnet Thinking, and DeepSeek-R1.
arXiv Detail & Related papers (2025-05-21T07:44:52Z) - Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems [50.29939179830491]
Failure attribution in LLM multi-agent systems remains underexplored and labor-intensive. We develop and evaluate three automated failure attribution methods, summarizing their corresponding pros and cons. The best method achieves 53.5% accuracy in identifying failure-responsible agents but only 14.2% in pinpointing failure steps.
arXiv Detail & Related papers (2025-04-30T23:09:44Z) - Defeating Prompt Injections by Design [79.00910871948787]
CaMeL is a robust defense that creates a protective system layer around the large language model. To operate, CaMeL explicitly extracts the control and data flows from the (trusted) query. To further improve security, CaMeL uses a notion of a capability to prevent the exfiltration of private data over unauthorized data flows.
arXiv Detail & Related papers (2025-03-24T15:54:10Z) - Dissecting Adversarial Robustness of Multimodal LM Agents [70.2077308846307]
We manually create 200 targeted adversarial tasks and evaluation scripts in a realistic threat model on top of VisualWebArena. We find that we can successfully break the latest agents that use black-box frontier LMs, including those that perform reflection and tree search. We also use ARE to rigorously evaluate how robustness changes as new components are added.
arXiv Detail & Related papers (2024-06-18T17:32:48Z)
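The capability notion in the CaMeL entry above can be made concrete with a small sketch: values carry the set of sinks they may flow to, and a policy check runs before any side effect. The `Tagged` wrapper and `send_to_sink` names below are illustrative assumptions for this example, not CaMeL's actual interface.

```python
# Toy sketch of capability-tagged data flow in the spirit of CaMeL-style
# defenses: each value carries the set of sinks it may flow to, and a
# policy check runs before any side effect. All names here are
# illustrative assumptions, not CaMeL's real API.
from dataclasses import dataclass

@dataclass(frozen=True)
class Tagged:
    value: str
    allowed_sinks: frozenset

def send_to_sink(data: Tagged, sink: str) -> str:
    # Refuse the flow unless the value's capability permits this sink.
    if sink not in data.allowed_sinks:
        raise PermissionError(f"flow to {sink!r} not permitted")
    return f"sent to {sink}"

# A public value may reach the network; a credential may not.
public = Tagged("weather: sunny", frozenset({"chat", "network"}))
secret = Tagged("API_KEY=abc123", frozenset({"local_log"}))
```

Under this scheme, `send_to_sink(public, "network")` succeeds while `send_to_sink(secret, "network")` raises, which is the exfiltration-blocking behavior the CaMeL abstract describes.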
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.