Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks
Against LLM-Integrated Applications
- URL: http://arxiv.org/abs/2401.07612v1
- Date: Mon, 15 Jan 2024 11:44:18 GMT
- Title: Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks
Against LLM-Integrated Applications
- Authors: Xuchen Suo
- Abstract summary: This paper introduces the 'Signed-Prompt' method as a novel solution for prompt injection attacks.
The study involves signing sensitive instructions within command segments by authorized users, enabling the LLM to discern trusted instruction sources.
Experiments demonstrate the effectiveness of the Signed-Prompt method, showing substantial resistance to various types of prompt injection attacks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The critical challenge of prompt injection attacks in Large Language Models
(LLMs) integrated applications, a growing concern in the Artificial
Intelligence (AI) field. Such attacks, which manipulate LLMs through natural
language inputs, pose a significant threat to the security of these
applications. Traditional defense strategies, including output and input
filtering, as well as delimiter use, have proven inadequate. This paper
introduces the 'Signed-Prompt' method as a novel solution. The study involves
signing sensitive instructions within command segments by authorized users,
enabling the LLM to discern trusted instruction sources. The paper presents a
comprehensive analysis of prompt injection attack patterns, followed by a
detailed explanation of the Signed-Prompt concept, including its basic
architecture and implementation through both prompt engineering and fine-tuning
of LLMs. Experiments demonstrate the effectiveness of the Signed-Prompt method,
showing substantial resistance to various types of prompt injection attacks,
thus validating its potential as a robust defense strategy in AI security.
Related papers
- Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context [49.13497493053742]
We explore converting a nonsensical suffix attack into a sensible prompt via a situation-driven contextual re-writing.
We combine an independent, meaningful adversarial insertion and situations derived from movies to check if this can trick an LLM.
Our approach demonstrates that a successful situation-driven attack can be executed on both open-source and proprietary LLMs.
arXiv Detail & Related papers (2024-07-19T19:47:26Z) - Soft Begging: Modular and Efficient Shielding of LLMs against Prompt Injection and Jailbreaking based on Prompt Tuning [8.273997600635271]
This abstract explores a novel approach to protecting large language models from prompt injection and jailbreaking attacks, termed "soft begging"
We provide an overview of prompt injections and jailbreaking, introduce the theoretical basis of the "soft begging" technique, and discuss an evaluation of its effectiveness.
arXiv Detail & Related papers (2024-07-03T14:52:09Z) - Knowledge Return Oriented Prompting (KROP) [0.0]
KROP is a prompt injection technique capable of obfuscating prompt injection attacks.
This paper introduces KROP, a prompt injection technique capable of obfuscating prompt injection attacks.
arXiv Detail & Related papers (2024-06-11T23:58:37Z) - Defending Against Indirect Prompt Injection Attacks With Spotlighting [11.127479817618692]
In common applications, multiple inputs can be processed by concatenating them together into a single stream of text.
Indirect prompt injection attacks take advantage of this vulnerability by embedding adversarial instructions into untrusted data being processed alongside user commands.
We introduce spotlighting, a family of prompt engineering techniques that can be used to improve LLMs' ability to distinguish among multiple sources of input.
arXiv Detail & Related papers (2024-03-20T15:26:23Z) - AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting [54.931241667414184]
We propose textbfAdaptive textbfShield Prompting, which prepends inputs with defense prompts to defend MLLMs against structure-based jailbreak attacks.
Our methods can consistently improve MLLMs' robustness against structure-based jailbreak attacks.
arXiv Detail & Related papers (2024-03-14T15:57:13Z) - ASETF: A Novel Method for Jailbreak Attack on LLMs through Translate Suffix Embeddings [58.82536530615557]
We propose an Adversarial Suffix Embedding Translation Framework (ASETF) to transform continuous adversarial suffix embeddings into coherent and understandable text.
Our method significantly reduces the computation time of adversarial suffixes and achieves a much better attack success rate to existing techniques.
arXiv Detail & Related papers (2024-02-25T06:46:27Z) - Benchmarking and Defending Against Indirect Prompt Injection Attacks on
Large Language Models [82.98081731588717]
Integration of large language models with external content exposes applications to indirect prompt injection attacks.
We introduce the first benchmark for indirect prompt injection attacks, named BIPIA, to evaluate the risk of such attacks.
We develop two black-box methods based on prompt learning and a white-box defense method based on fine-tuning with adversarial training.
arXiv Detail & Related papers (2023-12-21T01:08:39Z) - Evaluating the Instruction-Following Robustness of Large Language Models
to Prompt Injection [70.28425745910711]
Large Language Models (LLMs) have demonstrated exceptional proficiency in instruction-following.
This capability brings with it the risk of prompt injection attacks.
We evaluate the robustness of instruction-following LLMs against such attacks.
arXiv Detail & Related papers (2023-08-17T06:21:50Z) - Not what you've signed up for: Compromising Real-World LLM-Integrated
Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications.
We show how attackers can override original instructions and employed controls using Prompt Injection attacks.
We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
arXiv Detail & Related papers (2023-02-23T17:14:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.