Enhancing Security in LLM Applications: A Performance Evaluation of Early Detection Systems
- URL: http://arxiv.org/abs/2506.19109v1
- Date: Mon, 23 Jun 2025 20:39:43 GMT
- Title: Enhancing Security in LLM Applications: A Performance Evaluation of Early Detection Systems
- Authors: Valerii Gakh, Hayretdin Bahsi
- Abstract summary: In prompt leakage attacks, an attacker maliciously manipulates the LLM into outputting its system instructions, violating the system's confidentiality. We investigated the capabilities of early prompt injection detection systems, focusing specifically on the detection performance of techniques implemented in various open-source solutions. Our study presents analyses of distinct prompt leakage detection techniques and a comparative analysis of several detection solutions that implement those techniques.
- Score: 1.03590082373586
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prompt injection threatens the novel applications that emerge from adapting LLMs to various user tasks. Newly developed LLM-based software applications are becoming more ubiquitous and diverse, yet the threat of prompt injection attacks undermines their security, as the mitigations and defenses proposed so far are insufficient. We investigated the capabilities of early prompt injection detection systems, focusing specifically on the detection performance of techniques implemented in various open-source solutions. These solutions are intended to detect certain types of prompt injection attacks, including prompt leaks. In a prompt leakage attack, an attacker maliciously manipulates the LLM into outputting its system instructions, violating the system's confidentiality. Our study presents analyses of distinct prompt leakage detection techniques and a comparative analysis of several detection solutions that implement those techniques. We identify the strengths and weaknesses of these techniques and elaborate on their optimal configuration and usage in high-stakes deployments. In one of the first studies of existing prompt leak detection solutions, we compared the performance of LLM Guard, Vigil, and Rebuff. We concluded that the canary word checks implemented in Vigil and Rebuff were not effective at detecting prompt leak attacks, and we proposed improvements for them. We also found an evasion weakness in Rebuff's secondary model-based technique and proposed a mitigation. Finally, a comparison of LLM Guard, Vigil, and Rebuff at their peak performance revealed that Vigil is optimal when a minimal false positive rate is required, while Rebuff is the best choice for average needs.
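As a rough illustration of the canary word idea referenced in the abstract, the sketch below embeds a random token in the system prompt and flags any model output that echoes it. This is a generic, hypothetical sketch, not the Vigil or Rebuff implementation and not the paper's proposed improvements; the function names, the Unicode normalization, and the whitespace collapsing are assumptions added for illustration.

```python
import secrets
import unicodedata


def add_canary(system_prompt: str) -> tuple[str, str]:
    """Embed a random canary token in the system prompt.

    If the token later appears in a model response, the system prompt
    has very likely been leaked.
    """
    canary = secrets.token_hex(8)  # e.g. 'a3f1c9d2e07b4455'
    guarded_prompt = (
        f"{system_prompt}\n"
        f"# Internal marker (never reveal): {canary}"
    )
    return guarded_prompt, canary


def response_leaks_canary(response: str, canary: str) -> bool:
    """Return True if the canary token shows up in the LLM output.

    Normalize Unicode and strip whitespace so a lightly reformatted
    leak (extra spaces, width variants) is still caught; an exact
    substring match alone would miss such cases.
    """
    normalized = unicodedata.normalize("NFKC", response)
    collapsed = "".join(normalized.split()).lower()
    return canary.lower() in collapsed


if __name__ == "__main__":
    guarded, canary = add_canary("You are a support bot. Follow policy X.")
    benign = "Sure, I can help you reset your password."
    leaky = f"My instructions end with: Internal marker (never reveal): {canary}"
    print(response_leaks_canary(benign, canary))  # False
    print(response_leaks_canary(leaky, canary))   # True
```

A plain exact-substring match is easy to evade when the model paraphrases or re-encodes the leaked prompt, which is one general reason naive canary checks can miss real prompt leak attacks.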
Related papers
- System Prompt Extraction Attacks and Defenses in Large Language Models [2.6986500640871482]
The system prompt in Large Language Models (LLMs) plays a pivotal role in guiding model behavior and response generation.
Recent studies have shown that LLM system prompts are highly susceptible to extraction attacks through meticulously designed queries.
Despite the growing threat, there is a lack of systematic studies of system prompt extraction attacks and defenses.
arXiv Detail & Related papers (2025-05-27T21:36:27Z)
- DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks [101.52204404377039]
LLM-integrated applications and agents are vulnerable to prompt injection attacks.
A detection method aims to determine whether a given input is contaminated by an injected prompt.
We propose DataSentinel, a game-theoretic method to detect prompt injection attacks.
arXiv Detail & Related papers (2025-04-15T16:26:21Z)
- MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents [60.30753230776882]
LLM agents are vulnerable to indirect prompt injection (IPI) attacks, where malicious tasks embedded in tool-retrieved information can redirect the agent to take unauthorized actions.
We present MELON, a novel IPI defense that detects attacks by re-executing the agent's trajectory with a masked user prompt modified through a masking function.
arXiv Detail & Related papers (2025-02-07T18:57:49Z)
- Joint Optimization of Prompt Security and System Performance in Edge-Cloud LLM Systems [15.058369477125893]
Large language models (LLMs) have significantly facilitated human life, and prompt engineering has improved the efficiency of these models.
Recent years have witnessed a rise in prompt engineering-empowered attacks, leading to issues such as privacy leaks, increased latency, and system resource wastage.
We jointly consider prompt security, service latency, and system resource optimization in Edge-Cloud LLM (EC-LLM) systems under various prompt attacks.
arXiv Detail & Related papers (2025-01-30T14:33:49Z)
- PrivAgent: Agentic-based Red-teaming for LLM Privacy Leakage [78.33839735526769]
LLMs may be fooled into outputting private information under carefully crafted adversarial prompts.
PrivAgent is a novel black-box red-teaming framework for privacy leakage.
arXiv Detail & Related papers (2024-12-07T20:09:01Z)
- Attention Tracker: Detecting Prompt Injection Attacks in LLMs [62.247841717696765]
Large Language Models (LLMs) have revolutionized various domains but remain vulnerable to prompt injection attacks.
We introduce the concept of the distraction effect, where specific attention heads shift focus from the original instruction to the injected instruction.
We propose Attention Tracker, a training-free detection method that tracks attention patterns on instructions to detect prompt injection attacks.
arXiv Detail & Related papers (2024-11-01T04:05:59Z)
- Palisade -- Prompt Injection Detection Framework [0.9620910657090188]
Large Language Models are vulnerable to malicious prompt injection attacks.
This paper proposes a novel NLP based approach for prompt injection detection.
It emphasizes accuracy and optimization through a layered input screening process.
arXiv Detail & Related papers (2024-10-28T15:47:03Z)
- Optimization-based Prompt Injection Attack to LLM-as-a-Judge [78.20257854455562]
LLM-as-a-Judge uses a large language model (LLM) to select the best response from a set of candidates for a given question.
We propose JudgeDeceiver, an optimization-based prompt injection attack to LLM-as-a-Judge.
Our evaluation shows that JudgeDeceiver is highly effective, and is much more effective than existing prompt injection attacks.
arXiv Detail & Related papers (2024-03-26T13:58:00Z)
- Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks.
This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs.
We introduce a novel approach to detecting adversarial prompts at a token level.
arXiv Detail & Related papers (2023-11-20T03:17:21Z)
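The token-level perplexity idea in the last entry above can be sketched roughly as follows: score each token's surprisal under a small reference language model and flag prompts that contain unusually surprising tokens. This is an illustrative sketch under assumed choices (GPT-2 as the reference model, a hypothetical fixed threshold), not the method from that paper.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Small reference model used only to score token likelihoods.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()


def token_surprisals(text: str) -> list[tuple[str, float]]:
    """Return per-token negative log-likelihoods (surprisal) under GPT-2."""
    enc = tokenizer(text, return_tensors="pt")
    input_ids = enc["input_ids"]
    with torch.no_grad():
        logits = model(input_ids).logits
    # Shift so each token is scored by the preceding context.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = input_ids[:, 1:]
    nll = -log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)[0]
    tokens = tokenizer.convert_ids_to_tokens(input_ids[0])[1:]
    return list(zip(tokens, nll.tolist()))


def flag_suspicious(text: str, threshold: float = 12.0) -> bool:
    """Flag the prompt if any token is far more surprising than expected.

    The threshold here is a placeholder; in practice it would be
    calibrated on benign prompts.
    """
    return any(score > threshold for _, score in token_surprisals(text))
```

In a real detector, raw surprisal would typically be combined with contextual signals rather than used alone, and the decision threshold would be tuned against a benign baseline.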