Knowledge Return Oriented Prompting (KROP)
- URL: http://arxiv.org/abs/2406.11880v1
- Date: Tue, 11 Jun 2024 23:58:37 GMT
- Title: Knowledge Return Oriented Prompting (KROP)
- Authors: Jason Martin, Kenneth Yeung,
- Abstract summary: KROP is a prompt injection technique capable of obfuscating prompt injection attacks.
This paper introduces KROP, a prompt injection technique capable of obfuscating prompt injection attacks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Many Large Language Models (LLMs) and LLM-powered apps deployed today use some form of prompt filter or alignment to protect their integrity. However, these measures aren't foolproof. This paper introduces KROP, a prompt injection technique capable of obfuscating prompt injection attacks, rendering them virtually undetectable to most of these security measures.
Related papers
- Breaking the Prompt Wall (I): A Real-World Case Study of Attacking ChatGPT via Lightweight Prompt Injection [12.565784666173277]
This report presents a real-world case study demonstrating how prompt injection can attack large language model platforms such as ChatGPT.
We show how adversarial prompts can be injected via user inputs, web-based retrieval, and system-level agent instructions.
arXiv Detail & Related papers (2025-04-20T05:59:00Z) - Encrypted Prompt: Securing LLM Applications Against Unauthorized Actions [2.1756081703276]
Security threats like prompt injection attacks pose significant risks to applications that integrate Large Language Models.
This paper introduces a novel method that appends an Encrypted Prompt to each user prompt, embedding current permissions.
arXiv Detail & Related papers (2025-03-29T23:26:57Z) - Embedding-based classifiers can detect prompt injection attacks [5.820776057182452]
Large Language Models (LLMs) are vulnerable to adversarial attacks, particularly prompt injection attacks.
We propose a novel approach based on embedding-based Machine Learning (ML) classifiers to protect LLM-based applications against this severe threat.
arXiv Detail & Related papers (2024-10-29T17:36:59Z) - FATH: Authentication-based Test-time Defense against Indirect Prompt Injection Attacks [45.65210717380502]
Large language models (LLMs) have been widely deployed as the backbone with additional tools and text information for real-world applications.
prompt injection attacks are particularly threatening, where malicious instructions injected in the external text information can exploit LLMs to generate answers as the attackers desire.
This paper introduces a novel test-time defense strategy, named AuThentication with Hash-based tags (FATH)
arXiv Detail & Related papers (2024-10-28T20:02:47Z) - SecAlign: Defending Against Prompt Injection with Preference Optimization [52.48001255555192]
Adrial prompts can be injected into external data sources to override the system's intended instruction and execute a malicious instruction.
We propose a new defense called SecAlign based on the technique of preference optimization.
Our method reduces the success rates of various prompt injections to around 0%, even against attacks much more sophisticated than ones seen during training.
arXiv Detail & Related papers (2024-10-07T19:34:35Z) - Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context [49.13497493053742]
This research explores converting a nonsensical suffix attack into a sensible prompt via a situation-driven contextual re-writing.
We combine an independent, meaningful adversarial insertion and situations derived from movies to check if this can trick an LLM.
Our approach demonstrates that a successful situation-driven attack can be executed on both open-source and proprietary LLMs.
arXiv Detail & Related papers (2024-07-19T19:47:26Z) - ASETF: A Novel Method for Jailbreak Attack on LLMs through Translate Suffix Embeddings [58.82536530615557]
We propose an Adversarial Suffix Embedding Translation Framework (ASETF) to transform continuous adversarial suffix embeddings into coherent and understandable text.
Our method significantly reduces the computation time of adversarial suffixes and achieves a much better attack success rate to existing techniques.
arXiv Detail & Related papers (2024-02-25T06:46:27Z) - On Prompt-Driven Safeguarding for Large Language Models [172.13943777203377]
We find that in the representation space, the input queries are typically moved by safety prompts in a "higher-refusal" direction.
Inspired by these findings, we propose a method for safety prompt optimization, namely DRO.
Treating a safety prompt as continuous, trainable embeddings, DRO learns to move the queries' representations along or opposite the refusal direction, depending on their harmfulness.
arXiv Detail & Related papers (2024-01-31T17:28:24Z) - Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks
Against LLM-Integrated Applications [0.0]
This paper introduces the 'Signed-Prompt' method as a novel solution for prompt injection attacks.
The study involves signing sensitive instructions within command segments by authorized users, enabling the LLM to discern trusted instruction sources.
Experiments demonstrate the effectiveness of the Signed-Prompt method, showing substantial resistance to various types of prompt injection attacks.
arXiv Detail & Related papers (2024-01-15T11:44:18Z) - PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models [11.693095252994482]
We present POISONPROMPT, a novel backdoor attack capable of successfully compromising both hard and soft prompt-based LLMs.
Our findings highlight the potential security threats posed by backdoor attacks on prompt-based LLMs and emphasize the need for further research in this area.
arXiv Detail & Related papers (2023-10-19T03:25:28Z) - Evaluating the Instruction-Following Robustness of Large Language Models
to Prompt Injection [70.28425745910711]
Large Language Models (LLMs) have demonstrated exceptional proficiency in instruction-following.
This capability brings with it the risk of prompt injection attacks.
We evaluate the robustness of instruction-following LLMs against such attacks.
arXiv Detail & Related papers (2023-08-17T06:21:50Z) - Not what you've signed up for: Compromising Real-World LLM-Integrated
Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications.
We show how attackers can override original instructions and employed controls using Prompt Injection attacks.
We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
arXiv Detail & Related papers (2023-02-23T17:14:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.