Related papers: To Protect the LLM Agent Against the Prompt Injection Attack with Polymorphic Prompt

To Protect the LLM Agent Against the Prompt Injection Attack with Polymorphic Prompt

URL: http://arxiv.org/abs/2506.05739v1
Date: Fri, 06 Jun 2025 04:50:57 GMT
Title: To Protect the LLM Agent Against the Prompt Injection Attack with Polymorphic Prompt
Authors: Zhilong Wang, Neha Nagaraja, Lan Zhang, Hayretdin Bahsi, Pawan Patil, Peng Liu,
Abstract summary: We propose a novel, lightweight defense mechanism called Polymorphic Prompt Assembling.<n>The approach is based on the insight that prompt injection requires guessing and breaking the structure of the system prompt.<n> PPA prevents attackers from predicting the prompt structure, thereby enhancing security without compromising performance.
Score: 5.8935359767204805
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: LLM agents are widely used as agents for customer support, content generation, and code assistance. However, they are vulnerable to prompt injection attacks, where adversarial inputs manipulate the model's behavior. Traditional defenses like input sanitization, guard models, and guardrails are either cumbersome or ineffective. In this paper, we propose a novel, lightweight defense mechanism called Polymorphic Prompt Assembling (PPA), which protects against prompt injection with near-zero overhead. The approach is based on the insight that prompt injection requires guessing and breaking the structure of the system prompt. By dynamically varying the structure of system prompts, PPA prevents attackers from predicting the prompt structure, thereby enhancing security without compromising performance. We conducted experiments to evaluate the effectiveness of PPA against existing attacks and compared it with other defense methods.

Related papers

Defending Against Prompt Injection With a Few DefensiveTokens [53.7493897456957]
Large language model (LLM) systems interact with external data to perform complex tasks.<n>By injecting instructions into the data accessed by the system, an attacker can override the initial user task with an arbitrary task directed by the attacker.<n>Test-time defenses, e.g., defensive prompting, have been proposed for system developers to attain security only when needed in a flexible manner.<n>We propose DefensiveToken, a test-time defense with prompt injection comparable to training-time alternatives.
arXiv Detail & Related papers (2025-07-10T17:51:05Z)
CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks [47.62236306990252]
Large Language Models (LLMs) are susceptible to indirect prompt injection attacks.<n>This vulnerability stems from LLMs' inability to distinguish between data and instructions within a prompt.<n>We propose CachePrune that defends against this attack by identifying and pruning task-triggering neurons.
arXiv Detail & Related papers (2025-04-29T23:42:21Z)
Breaking the Prompt Wall (I): A Real-World Case Study of Attacking ChatGPT via Lightweight Prompt Injection [12.565784666173277]
This report presents a real-world case study demonstrating how prompt injection can attack large language model platforms such as ChatGPT.<n>We show how adversarial prompts can be injected via user inputs, web-based retrieval, and system-level agent instructions.
arXiv Detail & Related papers (2025-04-20T05:59:00Z)
UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models [30.139590566956077]
Large Language Models (LLMs) are vulnerable to attacks like prompt injection, backdoor attacks, and adversarial attacks.<n>We propose UniGuardian, the first unified defense mechanism designed to detect prompt injection, backdoor attacks, and adversarial attacks in LLMs.
arXiv Detail & Related papers (2025-02-18T18:59:00Z)
MELON: Provable Indirect Prompt Injection Defense via Masked Re-execution and Tool Comparison [60.30753230776882]
LLM agents are vulnerable to indirect prompt injection (IPI) attacks, where malicious tasks embedded in tool-retrieved information can redirect the agent to take unauthorized actions.<n>We present MELON, a novel IPI defense that detects attacks by re-executing the agent's trajectory with a masked user prompt modified through a masking function.
arXiv Detail & Related papers (2025-02-07T18:57:49Z)
FATH: Authentication-based Test-time Defense against Indirect Prompt Injection Attacks [45.65210717380502]
Large language models (LLMs) have been widely deployed as the backbone with additional tools and text information for real-world applications. prompt injection attacks are particularly threatening, where malicious instructions injected in the external text information can exploit LLMs to generate answers as the attackers desire. This paper introduces a novel test-time defense strategy, named AuThentication with Hash-based tags (FATH)
arXiv Detail & Related papers (2024-10-28T20:02:47Z)
AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting [54.931241667414184]
We propose textbfAdaptive textbfShield Prompting, which prepends inputs with defense prompts to defend MLLMs against structure-based jailbreak attacks. Our methods can consistently improve MLLMs' robustness against structure-based jailbreak attacks.
arXiv Detail & Related papers (2024-03-14T15:57:13Z)
Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks Against LLM-Integrated Applications [0.0]
This paper introduces the 'Signed-Prompt' method as a novel solution for prompt injection attacks. The study involves signing sensitive instructions within command segments by authorized users, enabling the LLM to discern trusted instruction sources. Experiments demonstrate the effectiveness of the Signed-Prompt method, showing substantial resistance to various types of prompt injection attacks.
arXiv Detail & Related papers (2024-01-15T11:44:18Z)
BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses. We introduce the emphtoolns attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
Formalizing and Benchmarking Prompt Injection Attacks and Defenses [59.57908526441172]
We propose a framework to formalize prompt injection attacks. Based on our framework, we design a new attack by combining existing ones. Our work provides a common benchmark for quantitatively evaluating future prompt injection attacks and defenses.
arXiv Detail & Related papers (2023-10-19T15:12:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.