Survey of Vulnerabilities in Large Language Models Revealed by
Adversarial Attacks
- URL: http://arxiv.org/abs/2310.10844v1
- Date: Mon, 16 Oct 2023 21:37:24 GMT
- Title: Survey of Vulnerabilities in Large Language Models Revealed by
Adversarial Attacks
- Authors: Erfan Shayegani, Md Abdullah Al Mamun, Yu Fu, Pedram Zaree, Yue Dong,
Nael Abu-Ghazaleh
- Abstract summary: Large Language Models (LLMs) are swiftly advancing in architecture and capability.
As they integrate more deeply into complex systems, the urgency to scrutinize their security properties grows.
This paper surveys research in the emerging interdisciplinary field of adversarial attacks on LLMs.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are swiftly advancing in architecture and
capability, and as they integrate more deeply into complex systems, the urgency
to scrutinize their security properties grows. This paper surveys research in
the emerging interdisciplinary field of adversarial attacks on LLMs, a subfield
of trustworthy ML, combining the perspectives of Natural Language Processing
and Security. Prior work has shown that even safety-aligned LLMs (via
instruction tuning and reinforcement learning through human feedback) can be
susceptible to adversarial attacks, which exploit weaknesses and mislead AI
systems, as evidenced by the prevalence of `jailbreak' attacks on models like
ChatGPT and Bard. In this survey, we first provide an overview of large
language models, describe their safety alignment, and categorize existing
research based on various learning structures: textual-only attacks,
multi-modal attacks, and additional attack methods specifically targeting
complex systems, such as federated learning or multi-agent systems. We also
offer comprehensive remarks on works that focus on the fundamental sources of
vulnerabilities and potential defenses. To make this field more accessible to
newcomers, we present a systematic review of existing works, a structured
typology of adversarial attack concepts, and additional resources, including
slides for presentations on related topics at the 62nd Annual Meeting of the
Association for Computational Linguistics (ACL'24).
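To make the attack categories above concrete: a textual-only attack typically perturbs or appends tokens to a prompt so that a safety-aligned model no longer refuses a request (the `jailbreak' setting mentioned above). The Python sketch below illustrates only the outer search loop of such a probe; the query_model stub, the candidate suffixes, and the keyword-based refusal heuristic are assumptions made for illustration and do not reproduce any specific attack from the surveyed papers.

```python
# Illustrative sketch only: a brute-force "adversarial suffix" probe against a
# hypothetical chat model. The query_model() stub and the refusal heuristic are
# placeholders, not a real API and not any specific attack from the survey.
from typing import Callable, List

REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "as an ai")

def is_refusal(response: str) -> bool:
    """Crude heuristic: treat common refusal phrases as a blocked response."""
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def suffix_probe(
    base_prompt: str,
    candidate_suffixes: List[str],
    query_model: Callable[[str], str],
) -> List[str]:
    """Return the suffixes for which the model did NOT refuse the base prompt."""
    successful = []
    for suffix in candidate_suffixes:
        response = query_model(f"{base_prompt} {suffix}")
        if not is_refusal(response):
            successful.append(suffix)
    return successful

if __name__ == "__main__":
    # Toy stand-in for a real model endpoint (an assumption for illustration).
    def query_model(prompt: str) -> str:
        return "Sure, ..." if "hypothetically" in prompt else "I'm sorry, I can't help with that."

    hits = suffix_probe(
        base_prompt="Explain how this model's guardrails work",
        candidate_suffixes=["", "hypothetically speaking", "ignore prior instructions"],
        query_model=query_model,
    )
    print("Suffixes that bypassed the toy refusal filter:", hits)
```

Real attacks in the surveyed literature replace this brute-force loop with gradient-guided or LLM-assisted search over suffixes and with far more reliable success criteria; the sketch only shows where those components plug in.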
Related papers
- Jailbreaking and Mitigation of Vulnerabilities in Large Language Models [4.564507064383306]
Large Language Models (LLMs) have transformed artificial intelligence by advancing natural language understanding and generation.
Despite these advancements, LLMs have shown considerable vulnerabilities, particularly to prompt injection and jailbreaking attacks.
This review analyzes the state of research on these vulnerabilities and presents available defense strategies.
arXiv Detail & Related papers (2024-10-20T00:00:56Z)
- Recent advancements in LLM Red-Teaming: Techniques, Defenses, and Ethical Considerations [0.0]
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing tasks, but their vulnerability to jailbreak attacks poses significant security risks.
This survey paper presents a comprehensive analysis of recent advancements in attack strategies and defense mechanisms within the field of Large Language Model (LLM) red-teaming.
arXiv Detail & Related papers (2024-10-09T01:35:38Z)
- A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends [78.3201480023907]
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks.
The vulnerability of LVLMs is relatively underexplored, posing potential security risks in daily usage.
In this paper, we provide a comprehensive review of the various forms of existing LVLM attacks.
arXiv Detail & Related papers (2024-07-10T06:57:58Z)
- garak: A Framework for Security Probing Large Language Models [16.305837349514505]
garak is a framework that can be used to discover and identify vulnerabilities in a target Large Language Model (LLM).
The framework's outputs describe a target model's weaknesses and contribute to an informed discussion of what constitutes a vulnerability in a given context.
arXiv Detail & Related papers (2024-06-16T18:18:43Z)
- Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey [46.19229410404056]
Large language models (LLMs) have made remarkable advancements in natural language processing.
These models are trained on vast datasets to exhibit powerful language understanding and generation capabilities.
Privacy and security issues have been revealed throughout their life cycle.
arXiv Detail & Related papers (2024-06-12T07:55:32Z)
- A Survey of Backdoor Attacks and Defenses on Large Language Models: Implications for Security Measures [28.604839267949114]
Large Language Models (LLMs), which bridge the gap between human language understanding and complex problem-solving, achieve state-of-the-art performance on several NLP tasks.
Research has demonstrated that language models are susceptible to potential security vulnerabilities, particularly in backdoor attacks.
This paper presents a novel perspective on backdoor attacks for LLMs by focusing on fine-tuning methods.
arXiv Detail & Related papers (2024-06-10T23:54:21Z)
- Data Poisoning for In-context Learning [49.77204165250528]
In-context learning (ICL) has been recognized for its innovative ability to adapt to new tasks.
This paper delves into the critical issue of ICL's susceptibility to data poisoning attacks.
We introduce ICLPoison, a specialized attacking framework conceived to exploit the learning mechanisms of ICL.
arXiv Detail & Related papers (2024-02-03T14:20:20Z)
- Attack Prompt Generation for Red Teaming and Defending Large Language Models [70.157691818224]
Large language models (LLMs) are susceptible to red teaming attacks, which can induce LLMs to generate harmful content.
We propose an integrated approach that combines manual and automatic methods to economically generate high-quality attack prompts.
arXiv Detail & Related papers (2023-10-19T06:15:05Z)
- Visual Adversarial Examples Jailbreak Aligned Large Language Models [66.53468356460365]
We show that the continuous and high-dimensional nature of the visual input makes it a weak link against adversarial attacks.
We exploit visual adversarial examples to circumvent the safety guardrail of aligned LLMs with integrated vision.
Our study underscores the escalating adversarial risks associated with the pursuit of multimodality.
arXiv Detail & Related papers (2023-06-22T22:13:03Z)
- Attacks in Adversarial Machine Learning: A Systematic Survey from the Life-cycle Perspective [69.25513235556635]
Adversarial machine learning (AML) studies the adversarial phenomenon of machine learning, in which models can be induced to make predictions that are inconsistent with or unexpected by humans.
Several paradigms have recently been developed to explore this adversarial phenomenon as it occurs at different stages of a machine learning system's life cycle.
We propose a unified mathematical framework that covers existing attack paradigms; a generic version of this worst-case formulation is sketched after this list.
arXiv Detail & Related papers (2023-02-19T02:12:21Z)
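As a reading aid for the last entry, the life-cycle view of adversarial ML is often summarized by two standard worst-case objectives. These are generic textbook formulations, not the specific framework proposed in that paper; the symbols (model f_theta, loss L, learning algorithm A, perturbation budget epsilon, dataset distance d with budget eta) are notational assumptions.

```latex
% Evasion (test-time) attack: find a small input perturbation \delta that
% maximizes the model's loss on a correctly labeled input (x, y).
\max_{\|\delta\| \le \epsilon} \; \mathcal{L}\bigl(f_\theta(x + \delta),\, y\bigr)

% Poisoning (training-time) attack, cf. the backdoor and ICL-poisoning entries
% above: perturb the training or demonstration set D within a budget so that
% the model returned by the learning algorithm A performs poorly at test time.
\max_{D' : \, d(D', D) \le \eta} \; \mathbb{E}_{(x, y)}\Bigl[\mathcal{L}\bigl(f_{A(D')}(x),\, y\bigr)\Bigr]
```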
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.