Not what you've signed up for: Compromising Real-World LLM-Integrated
Applications with Indirect Prompt Injection
- URL: http://arxiv.org/abs/2302.12173v2
- Date: Fri, 5 May 2023 14:26:17 GMT
- Title: Not what you've signed up for: Compromising Real-World LLM-Integrated
Applications with Indirect Prompt Injection
- Authors: Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres,
Thorsten Holz, Mario Fritz
- Abstract summary: Large Language Models (LLMs) are increasingly being integrated into various applications.
We show how attackers can override original instructions and employed controls using Prompt Injection attacks.
We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
- Score: 64.67495502772866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) are increasingly being integrated into various
applications. The functionalities of recent LLMs can be flexibly modulated via
natural language prompts. This renders them susceptible to targeted adversarial
prompting, e.g., Prompt Injection (PI) attacks enable attackers to override
original instructions and employed controls. So far, it was assumed that the
user is directly prompting the LLM. But, what if it is not the user prompting?
We argue that LLM-Integrated Applications blur the line between data and
instructions. We reveal new attack vectors, using Indirect Prompt Injection,
that enable adversaries to remotely (without a direct interface) exploit
LLM-integrated applications by strategically injecting prompts into data likely
to be retrieved. We derive a comprehensive taxonomy from a computer security
perspective to systematically investigate impacts and vulnerabilities,
including data theft, worming, information ecosystem contamination, and other
novel security risks. We demonstrate our attacks' practical viability against
both real-world systems, such as Bing's GPT-4 powered Chat and code-completion
engines, and synthetic applications built on GPT-4. We show how processing
retrieved prompts can act as arbitrary code execution, manipulate the
application's functionality, and control how and if other APIs are called.
Despite the increasing integration and reliance on LLMs, effective mitigations
of these emerging threats are currently lacking. By raising awareness of these
vulnerabilities and providing key insights into their implications, we aim to
promote the safe and responsible deployment of these powerful models and the
development of robust defenses that protect users and systems from potential
attacks.
Related papers
- Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems [6.480532634073257]
We introduce Prompt Infection, a novel attack where malicious prompts self-replicate across interconnected agents.
This attack poses severe threats, including data theft, scams, misinformation, and system-wide disruption.
To address this, we propose LLM Tagging, a defense mechanism that, when combined with existing safeguards, significantly mitigates infection spread.
arXiv Detail & Related papers (2024-10-09T11:01:29Z) - Aligning LLMs to Be Robust Against Prompt Injection [55.07562650579068]
We show that alignment can be a powerful tool to make LLMs more robust against prompt injection attacks.
Our method -- SecAlign -- first builds an alignment dataset by simulating prompt injection attacks.
Our experiments show that SecAlign robustifies the LLM substantially with a negligible hurt on model utility.
arXiv Detail & Related papers (2024-10-07T19:34:35Z) - Compromising Embodied Agents with Contextual Backdoor Attacks [69.71630408822767]
Large language models (LLMs) have transformed the development of embodied intelligence.
This paper uncovers a significant backdoor security threat within this process.
By poisoning just a few contextual demonstrations, attackers can covertly compromise the contextual environment of a black-box LLM.
arXiv Detail & Related papers (2024-08-06T01:20:12Z) - Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context [49.13497493053742]
This research explores converting a nonsensical suffix attack into a sensible prompt via a situation-driven contextual re-writing.
We combine an independent, meaningful adversarial insertion and situations derived from movies to check if this can trick an LLM.
Our approach demonstrates that a successful situation-driven attack can be executed on both open-source and proprietary LLMs.
arXiv Detail & Related papers (2024-07-19T19:47:26Z) - Defending Against Indirect Prompt Injection Attacks With Spotlighting [11.127479817618692]
In common applications, multiple inputs can be processed by concatenating them together into a single stream of text.
Indirect prompt injection attacks take advantage of this vulnerability by embedding adversarial instructions into untrusted data being processed alongside user commands.
We introduce spotlighting, a family of prompt engineering techniques that can be used to improve LLMs' ability to distinguish among multiple sources of input.
arXiv Detail & Related papers (2024-03-20T15:26:23Z) - AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting [54.931241667414184]
We propose textbfAdaptive textbfShield Prompting, which prepends inputs with defense prompts to defend MLLMs against structure-based jailbreak attacks.
Our methods can consistently improve MLLMs' robustness against structure-based jailbreak attacks.
arXiv Detail & Related papers (2024-03-14T15:57:13Z) - Identifying and Mitigating Vulnerabilities in LLM-Integrated
Applications [37.316238236750415]
Large language models (LLMs) are increasingly deployed as the service backend for LLM-integrated applications.
In this work, we consider a setup where the user and LLM interact via an LLM-integrated application in the middle.
We identify potential vulnerabilities that can originate from the malicious application developer or from an outsider threat.
We develop a lightweight, threat-agnostic defense that mitigates both insider and outsider threats.
arXiv Detail & Related papers (2023-11-07T20:13:05Z) - Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of
LLMs through a Global Scale Prompt Hacking Competition [8.560772603154545]
Large Language Models are vulnerable to prompt injection and jailbreaking.
We launch a global prompt hacking competition, which allows for free-form human input attacks.
We elicit 600K+ adversarial prompts against three state-of-the-art LLMs.
arXiv Detail & Related papers (2023-10-24T18:18:11Z) - Evaluating the Instruction-Following Robustness of Large Language Models
to Prompt Injection [70.28425745910711]
Large Language Models (LLMs) have demonstrated exceptional proficiency in instruction-following.
This capability brings with it the risk of prompt injection attacks.
We evaluate the robustness of instruction-following LLMs against such attacks.
arXiv Detail & Related papers (2023-08-17T06:21:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.