A New Era in LLM Security: Exploring Security Concerns in Real-World LLM-based Systems
- URL: http://arxiv.org/abs/2402.18649v1
- Date: Wed, 28 Feb 2024 19:00:12 GMT
- Title: A New Era in LLM Security: Exploring Security Concerns in Real-World LLM-based Systems
- Authors: Fangzhou Wu, Ning Zhang, Somesh Jha, Patrick McDaniel, Chaowei Xiao
- Abstract summary: We analyze the security of Large Language Model (LLM) systems, instead of focusing on the individual LLMs.
We propose a multi-layer and multi-step approach and apply it to the state-of-the-art OpenAI GPT4.
We find that although OpenAI GPT4 implements numerous safety constraints to improve its safety, these constraints remain vulnerable to attackers.
- Score: 47.18371401090435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Model (LLM) systems are inherently compositional, with an
individual LLM serving as the core foundation and additional layers of objects,
such as plugins and sandboxes, built around it. Alongside their great potential,
there are also increasing concerns over the security of such probabilistic
intelligent systems. However, existing studies on LLM security often focus on
the individual LLM without examining the ecosystem through the lens of LLM
systems and the other objects they interact with (e.g., Frontend, Webtool,
Sandbox, and so on). In this paper, we systematically analyze the security of
LLM systems rather than focusing on individual LLMs. To do so, we build on the
notion of information flow and formulate the security of LLM systems as
constraints on the alignment of information flow within the LLM and between the
LLM and other objects. Based on this construction and the unique probabilistic
nature of LLMs, the attack surface of an LLM system can be decomposed into three
key components: (1) multi-layer security analysis, (2) analysis of the existence
of constraints, and (3) analysis of the robustness of these constraints. To
ground this new attack surface, we propose a multi-layer and multi-step approach
and apply it to the state-of-the-art LLM system, OpenAI GPT4. Our investigation
exposes several security issues, not just within the LLM model itself but also
in its integration with other components. We find that although OpenAI GPT4
implements numerous safety constraints to improve its safety, these constraints
remain vulnerable to attackers. To further demonstrate the real-world threats of
our discovered vulnerabilities, we construct an end-to-end attack in which an
adversary can illicitly acquire the user's chat history, all without manipulating
the user's input or gaining direct access to OpenAI GPT4. A demo is available at:
https://fzwark.github.io/LLM-System-Attack-Demo/
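To make the information-flow framing above concrete, here is a minimal, purely illustrative sketch: system components are treated as sources and sinks, and security is expressed as predicates that every observed flow must satisfy. The object names, the Flow record, and the two example constraints are assumptions made for illustration and are not the paper's implementation; the final example only approximates the rough shape of flow that an indirect, chat-history-exfiltrating attack would produce.

```python
# Illustrative sketch only: a toy model of an information-flow framing for an
# LLM system, not the authors' code. Object names, the Flow record, and the
# constraint predicates below are assumptions made for illustration.
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Flow:
    source: str  # e.g. "webtool" (external content fetched by a plugin)
    sink: str    # e.g. "frontend" (what gets rendered back to the user)
    data: str    # the content moving along this edge


# A constraint is a predicate over a single flow; it should hold for every
# flow the LLM system actually executes. The paper asks both whether such
# constraints exist at all and how robust they are under adversarial inputs.
Constraint = Callable[[Flow], bool]


def no_private_data_to_external_sink(flow: Flow) -> bool:
    """Chat history must never flow to an attacker-controlled sink."""
    return not (flow.source == "chat_history" and flow.sink == "external_url")


def no_unvetted_instructions_from_webtool(flow: Flow) -> bool:
    """Content fetched by a web tool should not be treated as instructions."""
    return not (flow.source == "webtool" and flow.sink == "llm_instructions")


def audit(flows: List[Flow], constraints: List[Constraint]) -> List[Flow]:
    """Return every flow that violates at least one constraint."""
    return [f for f in flows if not all(c(f) for c in constraints)]


if __name__ == "__main__":
    # Rough shape of an indirect-injection exfiltration: fetched web content
    # steers the LLM, which then pushes the user's chat history toward an
    # external, attacker-chosen URL (illustrative, not taken from the paper).
    observed = [
        Flow("webtool", "llm_instructions", "hidden prompt in fetched page"),
        Flow("chat_history", "external_url", "https://attacker.example/?q=<history>"),
    ]
    for bad in audit(observed, [no_private_data_to_external_sink,
                                no_unvetted_instructions_from_webtool]):
        print("violated constraint on flow:", bad.source, "->", bad.sink)
```

Framing constraints as plain predicates keeps the sketch agnostic to where a real system would enforce them (in the frontend, in the sandbox, or via output filtering).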
Related papers
- garak: A Framework for Security Probing Large Language Models [16.305837349514505]
garak is a framework that can be used to discover and identify vulnerabilities in a target Large Language Model (LLM).
The outputs of the framework describe a target model's weaknesses and contribute to an informed discussion of what constitutes a vulnerability in a given context.
arXiv Detail & Related papers (2024-06-16T18:18:43Z)
- Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions [125.21418304558948]
Prompt leakage in large language models (LLMs) poses a significant security and privacy threat.
Prompt leakage in multi-turn LLM interactions, along with mitigation strategies, has not been studied in a standardized manner.
This paper investigates LLM vulnerabilities against prompt leakage across 4 diverse domains and 10 closed- and open-source LLMs.
arXiv Detail & Related papers (2024-04-24T23:39:58Z)
- Attacks on Third-Party APIs of Large Language Models [15.823694509708302]
Large language model (LLM) services have recently begun offering a plugin ecosystem to interact with third-party API services.
This innovation enhances the capabilities of LLMs, but it also introduces risks.
This paper proposes a new attacking framework to examine security and safety vulnerabilities within LLM platforms that incorporate third-party services.
arXiv Detail & Related papers (2024-04-24T19:27:02Z)
- Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security [5.077261736366414]
The pursuit of reliable AI systems like powerful MLLMs has emerged as a pivotal area of contemporary research.
In this paper, we endeavor to demonstrate the multifaceted risks associated with the incorporation of image modalities into MLLMs.
arXiv Detail & Related papers (2024-04-08T07:54:18Z)
- Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation [98.02846901473697]
We propose ECSO (Eyes Closed, Safety On), a training-free protecting approach that exploits the inherent safety awareness of MLLMs.
ECSO generates safer responses by adaptively transforming unsafe images into text to activate the intrinsic safety mechanism of pre-aligned LLMs; a minimal sketch of this idea follows this list.
arXiv Detail & Related papers (2024-03-14T17:03:04Z)
- Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems [29.828997665535336]
Large language models (LLMs) have strong capabilities in solving diverse natural language processing tasks.
However, the safety and security issues of LLM systems have become the major obstacle to their widespread application.
This paper proposes a comprehensive taxonomy, which systematically analyzes potential risks associated with each module of an LLM system.
arXiv Detail & Related papers (2024-01-11T09:29:56Z)
- Attack Prompt Generation for Red Teaming and Defending Large Language Models [70.157691818224]
Large language models (LLMs) are susceptible to red teaming attacks, which can induce LLMs to generate harmful content.
We propose an integrated approach that combines manual and automatic methods to economically generate high-quality attack prompts.
arXiv Detail & Related papers (2023-10-19T06:15:05Z)
- Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs [59.596335292426105]
This paper collects the first open-source dataset to evaluate safeguards in large language models.
We train several BERT-like classifiers to achieve results comparable with GPT-4 on automatic safety evaluation.
arXiv Detail & Related papers (2023-08-25T14:02:12Z)
- Trojaning Language Models for Fun and Profit [53.45727748224679]
TROJAN-LM is a new class of trojaning attacks in which maliciously crafted LMs trigger host NLP systems to malfunction.
By empirically studying three state-of-the-art LMs in a range of security-critical NLP tasks, we demonstrate the key properties of TROJAN-LM.
arXiv Detail & Related papers (2020-08-01T18:22:38Z)
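As referenced in the Eyes Closed, Safety On (ECSO) entry above, the following is a minimal sketch of the image-to-text fallback idea. It assumes a generic MLLM backend; the callables ask_mllm and looks_unsafe are hypothetical placeholders, not the authors' API or implementation.

```python
# Illustrative sketch of the ECSO-style image-to-text fallback, not the
# authors' code. The two callables stand in for any MLLM backend and any
# unsafe-response detector; both are hypothetical placeholders.
from typing import Callable, Optional


def ecso_style_answer(
    ask_mllm: Callable[[Optional[bytes], str], str],  # (image or None, prompt) -> reply
    looks_unsafe: Callable[[str, str], bool],         # (question, reply) -> True if harmful
    image: bytes,
    question: str,
) -> str:
    """Answer a visual question; if the direct answer looks harmful,
    re-ask over a text description of the image so the aligned text-only
    safety behaviour of the underlying LLM can take over."""
    draft = ask_mllm(image, question)
    if not looks_unsafe(question, draft):
        return draft  # normal path: the image stays in the prompt
    # Query-aware image-to-text step: keep only what the question needs.
    caption = ask_mllm(image, f"Describe what in this image is relevant to: {question}")
    # Text-only re-query: no raw image reaches the model this time.
    return ask_mllm(None, f"Image description: {caption}\n\nQuestion: {question}")
```

The design point is that the fallback is training-free: it reuses the model's own unsafe-response judgement and its pre-aligned text-only behaviour rather than adding a separately trained safety component.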
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.