Enhancing Security in LLM Applications: A Performance Evaluation of Early Detection Systems
- URL: http://arxiv.org/abs/2506.19109v1
- Date: Mon, 23 Jun 2025 20:39:43 GMT
- Title: Enhancing Security in LLM Applications: A Performance Evaluation of Early Detection Systems
- Authors: Valerii Gakh, Hayretdin Bahsi
- Abstract summary: In prompt leakage attacks, an attacker maliciously manipulates the LLM into outputting its system instructions, violating the system's confidentiality. We investigated the capabilities of early prompt injection detection systems, focusing specifically on the detection performance of techniques implemented in various open-source solutions. Our study presents analyses of distinct prompt leakage detection techniques and a comparative analysis of several detection solutions that implement those techniques.
- Score: 1.03590082373586
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prompt injection threatens the novel applications that emerge from adapting LLMs to various user tasks. Newly developed LLM-based software applications are becoming more ubiquitous and diverse, yet the threat of prompt injection attacks undermines their security, as the mitigations and defenses proposed so far are insufficient. We investigated the capabilities of early prompt injection detection systems, focusing specifically on the detection performance of techniques implemented in various open-source solutions. These solutions are intended to detect certain types of prompt injection attacks, including prompt leaks. In a prompt leakage attack, an attacker maliciously manipulates the LLM into outputting its system instructions, violating the system's confidentiality. Our study presents analyses of distinct prompt leakage detection techniques and a comparative analysis of several detection solutions that implement those techniques. We identify the strengths and weaknesses of these techniques and elaborate on their optimal configuration and usage in high-stakes deployments. In one of the first studies of existing prompt leak detection solutions, we compared the performance of LLM Guard, Vigil, and Rebuff. We concluded that the canary word checks implemented in Vigil and Rebuff were not effective at detecting prompt leak attacks, and we proposed improvements for them. We also found an evasion weakness in Rebuff's secondary model-based technique and proposed a mitigation. Finally, a comparison of LLM Guard, Vigil, and Rebuff at their peak performance revealed that Vigil is optimal when a minimal false positive rate is required, while Rebuff is the best choice for average needs.
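As a rough illustration of the canary word idea referenced in the abstract, the sketch below embeds a random token in the system prompt and flags any model output that echoes it. This is a generic, hypothetical sketch, not the Vigil or Rebuff implementation and not the paper's proposed improvements; the function names, the Unicode normalization, and the whitespace collapsing are assumptions added for illustration.

```python
import secrets
import unicodedata


def add_canary(system_prompt: str) -> tuple[str, str]:
    """Embed a random canary token in the system prompt.

    If the token later appears in a model response, the system prompt
    has very likely been leaked.
    """
    canary = secrets.token_hex(8)  # e.g. 'a3f1c9d2e07b4455'
    guarded_prompt = (
        f"{system_prompt}\n"
        f"# Internal marker (never reveal): {canary}"
    )
    return guarded_prompt, canary


def response_leaks_canary(response: str, canary: str) -> bool:
    """Return True if the canary token shows up in the LLM output.

    Normalize Unicode and strip whitespace so a lightly reformatted
    leak (extra spaces, width variants) is still caught; an exact
    substring match alone would miss such cases.
    """
    normalized = unicodedata.normalize("NFKC", response)
    collapsed = "".join(normalized.split()).lower()
    return canary.lower() in collapsed


if __name__ == "__main__":
    guarded, canary = add_canary("You are a support bot. Follow policy X.")
    benign = "Sure, I can help you reset your password."
    leaky = f"My instructions end with: Internal marker (never reveal): {canary}"
    print(response_leaks_canary(benign, canary))  # False
    print(response_leaks_canary(leaky, canary))   # True
```

A plain exact-substring match is easy to evade when the model paraphrases or re-encodes the leaked prompt, which is one general reason naive canary checks can miss real prompt leak attacks.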
Related papers
- System Prompt Extraction Attacks and Defenses in Large Language Models [2.6986500640871482]
The system prompt in Large Language Models (LLMs) plays a pivotal role in guiding model behavior and response generation.
Recent studies have shown that LLM system prompts are highly susceptible to extraction attacks through meticulously designed queries.
Despite the growing threat, there is a lack of systematic studies of system prompt extraction attacks and defenses.
arXiv Detail & Related papers (2025-05-27T21:36:27Z)
- DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks [101.52204404377039]
LLM-integrated applications and agents are vulnerable to prompt injection attacks.
A detection method aims to determine whether a given input is contaminated by an injected prompt.
We propose DataSentinel, a game-theoretic method to detect prompt injection attacks.
arXiv Detail & Related papers (2025-04-15T16:26:21Z)
- MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents [60.30753230776882]
LLM agents are vulnerable to indirect prompt injection (IPI) attacks, where malicious tasks embedded in tool-retrieved information can redirect the agent to take unauthorized actions.
We present MELON, a novel IPI defense that detects attacks by re-executing the agent's trajectory with a masked user prompt modified through a masking function.
arXiv Detail & Related papers (2025-02-07T18:57:49Z)
- Joint Optimization of Prompt Security and System Performance in Edge-Cloud LLM Systems [15.058369477125893]
Large language models (LLMs) have significantly facilitated human life, and prompt engineering has improved the efficiency of these models.
Recent years have witnessed a rise in prompt engineering-empowered attacks, leading to issues such as privacy leaks, increased latency, and system resource wastage.
We jointly consider prompt security, service latency, and system resource optimization in Edge-Cloud LLM (EC-LLM) systems under various prompt attacks.
arXiv Detail & Related papers (2025-01-30T14:33:49Z)
- PrivAgent: Agentic-based Red-teaming for LLM Privacy Leakage [78.33839735526769]
LLMs may be fooled into outputting private information under carefully crafted adversarial prompts.
PrivAgent is a novel black-box red-teaming framework for privacy leakage.
arXiv Detail & Related papers (2024-12-07T20:09:01Z)
- Attention Tracker: Detecting Prompt Injection Attacks in LLMs [62.247841717696765]
Large Language Models (LLMs) have revolutionized various domains but remain vulnerable to prompt injection attacks.
We introduce the concept of the distraction effect, where specific attention heads shift focus from the original instruction to the injected instruction.
We propose Attention Tracker, a training-free detection method that tracks attention patterns on instructions to detect prompt injection attacks.
arXiv Detail & Related papers (2024-11-01T04:05:59Z)
- Palisade -- Prompt Injection Detection Framework [0.9620910657090188]
Large Language Models are vulnerable to malicious prompt injection attacks.
This paper proposes a novel NLP based approach for prompt injection detection.
It emphasizes accuracy and optimization through a layered input screening process.
arXiv Detail & Related papers (2024-10-28T15:47:03Z)
- Optimization-based Prompt Injection Attack to LLM-as-a-Judge [78.20257854455562]
LLM-as-a-Judge uses a large language model (LLM) to select the best response from a set of candidates for a given question.
We propose JudgeDeceiver, an optimization-based prompt injection attack to LLM-as-a-Judge.
Our evaluation shows that JudgeDeceiver is highly effective, and is much more effective than existing prompt injection attacks.
arXiv Detail & Related papers (2024-03-26T13:58:00Z)
- Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks.
This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs.
We introduce a novel approach to detecting adversarial prompts at a token level.
arXiv Detail & Related papers (2023-11-20T03:17:21Z)
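The token-level perplexity idea in the last entry above can be sketched roughly as follows: score each token's surprisal under a small reference language model and flag prompts that contain unusually surprising tokens. This is an illustrative sketch under assumed choices (GPT-2 as the reference model, a hypothetical fixed threshold), not the method from that paper.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Small reference model used only to score token likelihoods.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()


def token_surprisals(text: str) -> list[tuple[str, float]]:
    """Return per-token negative log-likelihoods (surprisal) under GPT-2."""
    enc = tokenizer(text, return_tensors="pt")
    input_ids = enc["input_ids"]
    with torch.no_grad():
        logits = model(input_ids).logits
    # Shift so each token is scored by the preceding context.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = input_ids[:, 1:]
    nll = -log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)[0]
    tokens = tokenizer.convert_ids_to_tokens(input_ids[0])[1:]
    return list(zip(tokens, nll.tolist()))


def flag_suspicious(text: str, threshold: float = 12.0) -> bool:
    """Flag the prompt if any token is far more surprising than expected.

    The threshold here is a placeholder; in practice it would be
    calibrated on benign prompts.
    """
    return any(score > threshold for _, score in token_surprisals(text))
```

In a real detector, raw surprisal would typically be combined with contextual signals rather than used alone, and the decision threshold would be tuned against a benign baseline.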