Prompt Injection Attacks on Agentic Coding Assistants: A Systematic Analysis of Vulnerabilities in Skills, Tools, and Protocol Ecosystems
- URL: http://arxiv.org/abs/2601.17548v1
- Date: Sat, 24 Jan 2026 18:39:21 GMT
- Title: Prompt Injection Attacks on Agentic Coding Assistants: A Systematic Analysis of Vulnerabilities in Skills, Tools, and Protocol Ecosystems
- Authors: Narek Maloyan, Dmitry Namiot,
- Abstract summary: We present a comprehensive analysis of prompt injection attacks targeting agentic coding assistants.<n>Our meta-analysis synthesizes findings from 78 recent studies.<n>Our findings indicate that the security community must treat prompt injection as a first-class vulnerability class.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The proliferation of agentic AI coding assistants, including Claude Code, GitHub Copilot, Cursor, and emerging skill-based architectures, has fundamentally transformed software development workflows. These systems leverage Large Language Models (LLMs) integrated with external tools, file systems, and shell access through protocols like the Model Context Protocol (MCP). However, this expanded capability surface introduces critical security vulnerabilities. In this \textbf{Systematization of Knowledge (SoK)} paper, we present a comprehensive analysis of prompt injection attacks targeting agentic coding assistants. We propose a novel three-dimensional taxonomy categorizing attacks across \textit{delivery vectors}, \textit{attack modalities}, and \textit{propagation behaviors}. Our meta-analysis synthesizes findings from 78 recent studies (2021--2026), consolidating evidence that attack success rates against state-of-the-art defenses exceed 85\% when adaptive attack strategies are employed. We systematically catalog 42 distinct attack techniques spanning input manipulation, tool poisoning, protocol exploitation, multimodal injection, and cross-origin context poisoning. Through critical analysis of 18 defense mechanisms reported in prior work, we identify that most achieve less than 50\% mitigation against sophisticated adaptive attacks. We contribute: (1) a unified taxonomy bridging disparate attack classifications, (2) the first systematic analysis of skill-based architecture vulnerabilities with concrete exploit chains, and (3) a defense-in-depth framework grounded in the limitations we identify. Our findings indicate that the security community must treat prompt injection as a first-class vulnerability class requiring architectural-level mitigations rather than ad-hoc filtering approaches.
Related papers
- AdapTools: Adaptive Tool-based Indirect Prompt Injection Attacks on Agentic LLMs [24.71883582216731]
AdapTools is a novel adaptive IPI attack framework that selects stealthier attack tools and generates adaptive attack prompts.<n>AdapTools achieves a 2.13 times improvement in attack success rate while degrading system utility by a factor of 1.78.
arXiv Detail & Related papers (2026-02-24T09:32:19Z) - Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs [65.6660735371212]
We present textbftextscJustAsk, a framework that autonomously discovers effective extraction strategies through interaction alone.<n>It formulates extraction as an online exploration problem, using Upper Confidence Bound--based strategy selection and a hierarchical skill space spanning atomic probes and high-level orchestration.<n>Our results expose system prompts as a critical yet largely unprotected attack surface in modern agent systems.
arXiv Detail & Related papers (2026-01-29T03:53:25Z) - Constructing Multi-label Hierarchical Classification Models for MITRE ATT&CK Text Tagging [0.0]
We provide a "task space" characterization of the MITRE ATT&CK text tagging task.<n>We construct our own multi-label hierarchical classification models for the text tagging task.<n>Our models meet or surpass state-of-the-art performance while relying only on classical machine learning methods.
arXiv Detail & Related papers (2026-01-21T00:41:34Z) - Analyzing Code Injection Attacks on LLM-based Multi-Agent Systems in Software Development [11.76638109321532]
We propose an architecture of a multi-agent system for the implementation phase of the software engineering process.<n>We demonstrate that while such systems can generate code quite accurately, they are vulnerable to attacks, including code injection.
arXiv Detail & Related papers (2025-12-26T01:08:43Z) - The Trojan Knowledge: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search [58.8834056209347]
Large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safety guardrails to elicit harmful outputs.<n>We introduce the Correlated Knowledge Attack Agent (CKA-Agent), a dynamic framework that reframes jailbreaking as an adaptive, tree-structured exploration of the target model's knowledge base.
arXiv Detail & Related papers (2025-12-01T07:05:23Z) - Cuckoo Attack: Stealthy and Persistent Attacks Against AI-IDE [64.47951172662745]
Cuckoo Attack is a novel attack that achieves stealthy and persistent command execution by embedding malicious payloads into configuration files.<n>We formalize our attack paradigm into two stages, including initial infection and persistence.<n>We contribute seven actionable checkpoints for vendors to evaluate their product security.
arXiv Detail & Related papers (2025-09-19T04:10:52Z) - Servant, Stalker, Predator: How An Honest, Helpful, And Harmless (3H) Agent Unlocks Adversarial Skills [3.0620527758972496]
This paper identifies and analyzes a novel vulnerability class in Model Context Protocol based agent systems.<n>The attack chain describes and demonstrates how benign, individually authorized tasks can be orchestrated to produce harmful emergent behaviors.
arXiv Detail & Related papers (2025-08-27T01:11:59Z) - Systematic Analysis of MCP Security [13.801464032236481]
The Model Context Protocol (MCP) has emerged as a universal standard that enables AI agents to seamlessly connect with external tools.<n> MCP introduces significant vulnerabilities, such as Tool Poisoning Attacks (TPA), where hidden malicious instructions exploit the sycophancy of large language models.<n>We present the MCP Attack Library (MCPLIB), which categorizes and implements 31 distinct attack methods under four key classifications.
arXiv Detail & Related papers (2025-08-18T00:23:41Z) - Bridging AI and Software Security: A Comparative Vulnerability Assessment of LLM Agent Deployment Paradigms [1.03121181235382]
Large Language Model (LLM) agents face security vulnerabilities spanning AI-specific and traditional software domains.<n>This study bridges this gap through comparative evaluation of Function Calling architecture and Model Context Protocol (MCP) deployment paradigms.<n>We tested 3,250 attack scenarios across seven language models, evaluating simple, composed, and chained attacks targeting both AI-specific threats and software vulnerabilities.
arXiv Detail & Related papers (2025-07-08T18:24:28Z) - A Survey on Model Extraction Attacks and Defenses for Large Language Models [55.60375624503877]
Model extraction attacks pose significant security threats to deployed language models.<n>This survey provides a comprehensive taxonomy of extraction attacks and defenses, categorizing attacks into functionality extraction, training data extraction, and prompt-targeted attacks.<n>We examine defense mechanisms organized into model protection, data privacy protection, and prompt-targeted strategies, evaluating their effectiveness across different deployment scenarios.
arXiv Detail & Related papers (2025-06-26T22:02:01Z) - AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents [54.29555239363013]
We propose a generic black-box fuzzing framework, AgentVigil, to automatically discover and exploit indirect prompt injection vulnerabilities.<n>We evaluate AgentVigil on two public benchmarks, AgentDojo and VWA-adv, where it achieves 71% and 70% success rates against agents based on o3-mini and GPT-4o.<n>We apply our attacks in real-world environments, successfully misleading agents to navigate to arbitrary URLs, including malicious sites.
arXiv Detail & Related papers (2025-05-09T07:40:17Z) - AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models [0.0]
Large Language Models (LLMs) continue to exhibit vulnerabilities to jailbreaking attacks.<n>We present AutoAdv, a novel framework that automates adversarial prompt generation.<n>We show that our attacks achieve jailbreak success rates of up to 86% for harmful content generation.
arXiv Detail & Related papers (2025-04-18T08:38:56Z) - Adversarial Training for Defense Against Label Poisoning Attacks [53.893792844055106]
Label poisoning attacks pose significant risks to machine learning models.<n>We propose a novel adversarial training defense strategy based on support vector machines (SVMs) to counter these threats.<n>Our approach accommodates various model architectures and employs a projected gradient descent algorithm with kernel SVMs for adversarial training.
arXiv Detail & Related papers (2025-02-24T13:03:19Z) - AutoJailbreak: Exploring Jailbreak Attacks and Defenses through a Dependency Lens [83.08119913279488]
We present a systematic analysis of the dependency relationships in jailbreak attack and defense techniques.
We propose three comprehensive, automated, and logical frameworks.
We show that the proposed ensemble jailbreak attack and defense framework significantly outperforms existing research.
arXiv Detail & Related papers (2024-06-06T07:24:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.