Prompt Injection attack against LLM-integrated Applications
- URL: http://arxiv.org/abs/2306.05499v2
- Date: Sat, 2 Mar 2024 09:12:23 GMT
- Title: Prompt Injection attack against LLM-integrated Applications
- Authors: Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Zihao Wang, Xiaofeng
Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng and Yang Liu
- Abstract summary: This study deconstructs the complexities and implications of prompt injection attacks on actual LLM-integrated applications.
We formulate HouYi, a novel black-box prompt injection attack technique, which draws inspiration from traditional web injection attacks.
We deploy HouYi on 36 actual LLM-integrated applications and discern 31 applications susceptible to prompt injection.
- Score: 37.86878788874201
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs), renowned for their superior proficiency in
language comprehension and generation, stimulate a vibrant ecosystem of
applications around them. However, their extensive assimilation into various
services introduces significant security risks. This study deconstructs the
complexities and implications of prompt injection attacks on actual
LLM-integrated applications. Initially, we conduct an exploratory analysis on
ten commercial applications, highlighting the constraints of current attack
strategies in practice. Prompted by these limitations, we subsequently
formulate HouYi, a novel black-box prompt injection attack technique, which
draws inspiration from traditional web injection attacks. HouYi is
compartmentalized into three crucial elements: a seamlessly-incorporated
pre-constructed prompt, an injection prompt inducing context partition, and a
malicious payload designed to fulfill the attack objectives. Leveraging HouYi,
we unveil previously unknown and severe attack outcomes, such as unrestricted
arbitrary LLM usage and uncomplicated application prompt theft. We deploy HouYi
on 36 actual LLM-integrated applications and discern 31 applications
susceptible to prompt injection. 10 vendors have validated our discoveries,
including Notion, which has the potential to impact millions of users. Our
investigation illuminates both the possible risks of prompt injection attacks
and the possible tactics for mitigation.
Related papers
- Attention Tracker: Detecting Prompt Injection Attacks in LLMs [62.247841717696765]
Large Language Models (LLMs) have revolutionized various domains but remain vulnerable to prompt injection attacks.
We introduce the concept of the distraction effect, where specific attention heads shift focus from the original instruction to the injected instruction.
We propose Attention Tracker, a training-free detection method that tracks attention patterns on instruction to detect prompt injection attacks.
arXiv Detail & Related papers (2024-11-01T04:05:59Z) - Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context [49.13497493053742]
This research explores converting a nonsensical suffix attack into a sensible prompt via a situation-driven contextual re-writing.
We combine an independent, meaningful adversarial insertion and situations derived from movies to check if this can trick an LLM.
Our approach demonstrates that a successful situation-driven attack can be executed on both open-source and proprietary LLMs.
arXiv Detail & Related papers (2024-07-19T19:47:26Z) - Automatic and Universal Prompt Injection Attacks against Large Language
Models [38.694912482525446]
Large Language Models (LLMs) excel in processing and generating human language, powered by their ability to interpret and follow instructions.
These attacks manipulate applications into producing responses aligned with the attacker's injected content, deviating from the user's actual requests.
We introduce a unified framework for understanding the objectives of prompt injection attacks and present an automated gradient-based method for generating highly effective and universal prompt injection data.
arXiv Detail & Related papers (2024-03-07T23:46:20Z) - Benchmarking and Defending Against Indirect Prompt Injection Attacks on
Large Language Models [82.98081731588717]
Integration of large language models with external content exposes applications to indirect prompt injection attacks.
We introduce the first benchmark for indirect prompt injection attacks, named BIPIA, to evaluate the risk of such attacks.
We develop two black-box methods based on prompt learning and a white-box defense method based on fine-tuning with adversarial training.
arXiv Detail & Related papers (2023-12-21T01:08:39Z) - Formalizing and Benchmarking Prompt Injection Attacks and Defenses [59.57908526441172]
We propose a framework to formalize prompt injection attacks.
Based on our framework, we design a new attack by combining existing ones.
Our work provides a common benchmark for quantitatively evaluating future prompt injection attacks and defenses.
arXiv Detail & Related papers (2023-10-19T15:12:09Z) - PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models [11.693095252994482]
We present POISONPROMPT, a novel backdoor attack capable of successfully compromising both hard and soft prompt-based LLMs.
Our findings highlight the potential security threats posed by backdoor attacks on prompt-based LLMs and emphasize the need for further research in this area.
arXiv Detail & Related papers (2023-10-19T03:25:28Z) - Not what you've signed up for: Compromising Real-World LLM-Integrated
Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications.
We show how attackers can override original instructions and employed controls using Prompt Injection attacks.
We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
arXiv Detail & Related papers (2023-02-23T17:14:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.