A Comprehensive Survey of Attack Techniques, Implementation, and Mitigation Strategies in Large Language Models
- URL: http://arxiv.org/abs/2312.10982v1
- Date: Mon, 18 Dec 2023 07:07:32 GMT
- Title: A Comprehensive Survey of Attack Techniques, Implementation, and Mitigation Strategies in Large Language Models
- Authors: Aysan Esmradi, Daniel Wankit Yip, Chun Fai Chan,
- Abstract summary: This article explores two attack categories: attacks on models themselves and attacks on model applications.
The former requires expertise, access to model data, and significant implementation time.
The latter is more accessible to attackers and has seen increased attention.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Ensuring the security of large language models (LLMs) is an ongoing challenge despite their widespread popularity. Developers work to enhance LLMs security, but vulnerabilities persist, even in advanced versions like GPT-4. Attackers exploit these weaknesses, highlighting the need for proactive cybersecurity measures in AI model development. This article explores two attack categories: attacks on models themselves and attacks on model applications. The former requires expertise, access to model data, and significant implementation time, while the latter is more accessible to attackers and has seen increased attention. Our study reviews over 100 recent research works, providing an in-depth analysis of each attack type. We identify the latest attack methods and explore various approaches to carry them out. We thoroughly investigate mitigation techniques, assessing their effectiveness and limitations. Furthermore, we summarize future defenses against these attacks. We also examine real-world techniques, including reported and our implemented attacks on LLMs, to consolidate our findings. Our research highlights the urgency of addressing security concerns and aims to enhance the understanding of LLM attacks, contributing to robust defense development in this evolving domain.
Related papers
- The VLLM Safety Paradox: Dual Ease in Jailbreak Attack and Defense [56.32083100401117]
We investigate why Vision Large Language Models (VLLMs) are prone to jailbreak attacks.
We then make a key observation: existing defense mechanisms suffer from an textbfover-prudence problem.
We find that the two representative evaluation methods for jailbreak often exhibit chance agreement.
arXiv Detail & Related papers (2024-11-13T07:57:19Z) - New Emerged Security and Privacy of Pre-trained Model: a Survey and Outlook [54.24701201956833]
Security and privacy issues have undermined users' confidence in pre-trained models.
Current literature lacks a clear taxonomy of emerging attacks and defenses for pre-trained models.
This taxonomy categorizes attacks and defenses into No-Change, Input-Change, and Model-Change approaches.
arXiv Detail & Related papers (2024-11-12T10:15:33Z) - Recent advancements in LLM Red-Teaming: Techniques, Defenses, and Ethical Considerations [0.0]
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing tasks, but their vulnerability to jailbreak attacks poses significant security risks.
This survey paper presents a comprehensive analysis of recent advancements in attack strategies and defense mechanisms within the field of Large Language Model (LLM) red-teaming.
arXiv Detail & Related papers (2024-10-09T01:35:38Z) - Recent Advances in Attack and Defense Approaches of Large Language Models [27.271665614205034]
Large Language Models (LLMs) have revolutionized artificial intelligence and machine learning through their advanced text processing and generating capabilities.
Their widespread deployment has raised significant safety and reliability concerns.
This paper reviews current research on LLM vulnerabilities and threats, and evaluates the effectiveness of contemporary defense mechanisms.
arXiv Detail & Related papers (2024-09-05T06:31:37Z) - A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends [78.3201480023907]
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks.
The vulnerability of LVLMs is relatively underexplored, posing potential security risks in daily usage.
In this paper, we provide a comprehensive review of the various forms of existing LVLM attacks.
arXiv Detail & Related papers (2024-07-10T06:57:58Z) - Purple-teaming LLMs with Adversarial Defender Training [57.535241000787416]
We present Purple-teaming LLMs with Adversarial Defender training (PAD)
PAD is a pipeline designed to safeguard LLMs by novelly incorporating the red-teaming (attack) and blue-teaming (safety training) techniques.
PAD significantly outperforms existing baselines in both finding effective attacks and establishing a robust safe guardrail.
arXiv Detail & Related papers (2024-07-01T23:25:30Z) - Exploring Vulnerabilities and Protections in Large Language Models: A Survey [1.6179784294541053]
This survey examines the security challenges of Large Language Models (LLMs)
It focuses on two main areas: Prompt Hacking and Adversarial Attacks.
By detailing these security issues, the survey contributes to the broader discussion on creating resilient AI systems.
arXiv Detail & Related papers (2024-06-01T00:11:09Z) - Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models [18.624280305864804]
Large Language Models (LLMs) have become a cornerstone in the field of Natural Language Processing (NLP)
This paper presents a comprehensive survey of the various forms of attacks targeting LLMs.
We delve into topics such as adversarial attacks that aim to manipulate model outputs, data poisoning that affects model training, and privacy concerns related to training data exploitation.
arXiv Detail & Related papers (2024-03-03T04:46:21Z) - A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models [20.40158210837289]
We investigate nine attack techniques and seven defense techniques applied across three distinct language models: Vicuna, LLama, and GPT-3.5 Turbo.
Our findings reveal that existing white-box attacks underperform compared to universal techniques and that including special tokens in the input significantly affects the likelihood of successful attacks.
arXiv Detail & Related papers (2024-02-21T01:26:39Z) - Towards more Practical Threat Models in Artificial Intelligence Security [66.67624011455423]
Recent works have identified a gap between research and practice in artificial intelligence security.
We revisit the threat models of the six most studied attacks in AI security research and match them to AI usage in practice.
arXiv Detail & Related papers (2023-11-16T16:09:44Z) - Challenges and Countermeasures for Adversarial Attacks on Deep
Reinforcement Learning [48.49658986576776]
Deep Reinforcement Learning (DRL) has numerous applications in the real world thanks to its outstanding ability in adapting to the surrounding environments.
Despite its great advantages, DRL is susceptible to adversarial attacks, which precludes its use in real-life critical systems and applications.
This paper presents emerging attacks in DRL-based systems and the potential countermeasures to defend against these attacks.
arXiv Detail & Related papers (2020-01-27T10:53:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.