Not what you've signed up for: Compromising Real-World LLM-Integrated
Applications with Indirect Prompt Injection
- URL: http://arxiv.org/abs/2302.12173v2
- Date: Fri, 5 May 2023 14:26:17 GMT
- Title: Not what you've signed up for: Compromising Real-World LLM-Integrated
Applications with Indirect Prompt Injection
- Authors: Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres,
Thorsten Holz, Mario Fritz
- Abstract summary: Large Language Models (LLMs) are increasingly being integrated into various applications.
We show how attackers can override original instructions and employed controls using Prompt Injection attacks.
We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
- Score: 64.67495502772866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) are increasingly being integrated into various
applications. The functionalities of recent LLMs can be flexibly modulated via
natural language prompts. This renders them susceptible to targeted adversarial
prompting, e.g., Prompt Injection (PI) attacks enable attackers to override
original instructions and employed controls. So far, it was assumed that the
user is directly prompting the LLM. But, what if it is not the user prompting?
We argue that LLM-Integrated Applications blur the line between data and
instructions. We reveal new attack vectors, using Indirect Prompt Injection,
that enable adversaries to remotely (without a direct interface) exploit
LLM-integrated applications by strategically injecting prompts into data likely
to be retrieved. We derive a comprehensive taxonomy from a computer security
perspective to systematically investigate impacts and vulnerabilities,
including data theft, worming, information ecosystem contamination, and other
novel security risks. We demonstrate our attacks' practical viability against
both real-world systems, such as Bing's GPT-4 powered Chat and code-completion
engines, and synthetic applications built on GPT-4. We show how processing
retrieved prompts can act as arbitrary code execution, manipulate the
application's functionality, and control how and if other APIs are called.
Despite the increasing integration and reliance on LLMs, effective mitigations
of these emerging threats are currently lacking. By raising awareness of these
vulnerabilities and providing key insights into their implications, we aim to
promote the safe and responsible deployment of these powerful models and the
development of robust defenses that protect users and systems from potential
attacks.
Related papers
- Automating Prompt Leakage Attacks on Large Language Models Using Agentic Approach [9.483655213280738]
This paper presents a novel approach to evaluating the security of large language models (LLMs)
We define prompt leakage as a critical threat to secure LLM deployment.
We implement a multi-agent system where cooperative agents are tasked with probing and exploiting the target LLM to elicit its prompt.
arXiv Detail & Related papers (2025-02-18T08:17:32Z) - Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks [88.84977282952602]
A high volume of recent ML security literature focuses on attacks against aligned large language models (LLMs)
In this paper, we analyze security and privacy vulnerabilities that are unique to LLM agents.
We conduct a series of illustrative attacks on popular open-source and commercial agents, demonstrating the immediate practical implications of their vulnerabilities.
arXiv Detail & Related papers (2025-02-12T17:19:36Z) - Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems [6.480532634073257]
We introduce Prompt Infection, a novel attack where malicious prompts self-replicate across interconnected agents.
This attack poses severe threats, including data theft, scams, misinformation, and system-wide disruption.
To address this, we propose LLM Tagging, a defense mechanism that, when combined with existing safeguards, significantly mitigates infection spread.
arXiv Detail & Related papers (2024-10-09T11:01:29Z) - SecAlign: Defending Against Prompt Injection with Preference Optimization [52.48001255555192]
Adrial prompts can be injected into external data sources to override the system's intended instruction and execute a malicious instruction.
We propose a new defense called SecAlign based on the technique of preference optimization.
Our method reduces the success rates of various prompt injections to around 0%, even against attacks much more sophisticated than ones seen during training.
arXiv Detail & Related papers (2024-10-07T19:34:35Z) - Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context [49.13497493053742]
This research explores converting a nonsensical suffix attack into a sensible prompt via a situation-driven contextual re-writing.
We combine an independent, meaningful adversarial insertion and situations derived from movies to check if this can trick an LLM.
Our approach demonstrates that a successful situation-driven attack can be executed on both open-source and proprietary LLMs.
arXiv Detail & Related papers (2024-07-19T19:47:26Z) - Defending Against Indirect Prompt Injection Attacks With Spotlighting [11.127479817618692]
In common applications, multiple inputs can be processed by concatenating them together into a single stream of text.
Indirect prompt injection attacks take advantage of this vulnerability by embedding adversarial instructions into untrusted data being processed alongside user commands.
We introduce spotlighting, a family of prompt engineering techniques that can be used to improve LLMs' ability to distinguish among multiple sources of input.
arXiv Detail & Related papers (2024-03-20T15:26:23Z) - Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models [79.0183835295533]
We introduce the first benchmark for indirect prompt injection attacks, named BIPIA, to assess the risk of such vulnerabilities.
Our analysis identifies two key factors contributing to their success: LLMs' inability to distinguish between informational context and actionable instructions, and their lack of awareness in avoiding the execution of instructions within external content.
We propose two novel defense mechanisms-boundary awareness and explicit reminder-to address these vulnerabilities in both black-box and white-box settings.
arXiv Detail & Related papers (2023-12-21T01:08:39Z) - Identifying and Mitigating Vulnerabilities in LLM-Integrated
Applications [37.316238236750415]
Large language models (LLMs) are increasingly deployed as the service backend for LLM-integrated applications.
In this work, we consider a setup where the user and LLM interact via an LLM-integrated application in the middle.
We identify potential vulnerabilities that can originate from the malicious application developer or from an outsider threat.
We develop a lightweight, threat-agnostic defense that mitigates both insider and outsider threats.
arXiv Detail & Related papers (2023-11-07T20:13:05Z) - Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of
LLMs through a Global Scale Prompt Hacking Competition [8.560772603154545]
Large Language Models are vulnerable to prompt injection and jailbreaking.
We launch a global prompt hacking competition, which allows for free-form human input attacks.
We elicit 600K+ adversarial prompts against three state-of-the-art LLMs.
arXiv Detail & Related papers (2023-10-24T18:18:11Z) - Evaluating the Instruction-Following Robustness of Large Language Models
to Prompt Injection [70.28425745910711]
Large Language Models (LLMs) have demonstrated exceptional proficiency in instruction-following.
This capability brings with it the risk of prompt injection attacks.
We evaluate the robustness of instruction-following LLMs against such attacks.
arXiv Detail & Related papers (2023-08-17T06:21:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.