A Survey of Backdoor Attacks and Defenses on Large Language Models: Implications for Security Measures
- URL: http://arxiv.org/abs/2406.06852v3
- Date: Fri, 19 Jul 2024 08:50:24 GMT
- Title: A Survey of Backdoor Attacks and Defenses on Large Language Models: Implications for Security Measures
- Authors: Shuai Zhao, Meihuizi Jia, Zhongliang Guo, Leilei Gan, Xiaoyu Xu, Jie Fu, Yichao Feng, Fengjun Pan, Luu Anh Tuan,
- Abstract summary: Large language models (LLMs) bridge the gap between human language understanding and complex problem-solving.
LLMs are susceptible to potential security vulnerabilities, particularly in backdoor attacks.
This paper presents a novel perspective on backdoor attacks for LLMs by focusing on fine-tuning methods.
- Score: 25.381528717141684
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The large language models (LLMs), which bridge the gap between human language understanding and complex problem-solving, achieve state-of-the-art performance on several NLP tasks, particularly in few-shot and zero-shot settings. Despite the demonstrable efficacy of LMMs, due to constraints on computational resources, users have to engage with open-source language models or outsource the entire training process to third-party platforms. However, research has demonstrated that language models are susceptible to potential security vulnerabilities, particularly in backdoor attacks. Backdoor attacks are designed to introduce targeted vulnerabilities into language models by poisoning training samples or model weights, allowing attackers to manipulate model responses through malicious triggers. While existing surveys on backdoor attacks provide a comprehensive overview, they lack an in-depth examination of backdoor attacks specifically targeting LLMs. To bridge this gap and grasp the latest trends in the field, this paper presents a novel perspective on backdoor attacks for LLMs by focusing on fine-tuning methods. Specifically, we systematically classify backdoor attacks into three categories: full-parameter fine-tuning, parameter-efficient fine-tuning, and attacks without fine-tuning. Based on insights from a substantial review, we also discuss crucial issues for future research on backdoor attacks, such as further exploring attack algorithms that do not require fine-tuning, or developing more covert attack algorithms.
Related papers
- Securing Multi-turn Conversational Language Models Against Distributed Backdoor Triggers [29.554818890832887]
Multi-turn conversational large language models (LLMs) are vulnerable to data poisoning backdoor attacks.
LLMs are at the risk of even more harmful and stealthy backdoor attacks where the backdoor triggers may span across multiple utterances.
We propose a new defense strategy for the multi-turn dialogue setting that is more challenging.
arXiv Detail & Related papers (2024-07-04T20:57:06Z) - Revisiting Backdoor Attacks against Large Vision-Language Models [76.42014292255944]
This paper empirically examines the generalizability of backdoor attacks during the instruction tuning of LVLMs.
We modify existing backdoor attacks based on the above key observations.
This paper underscores that even simple traditional backdoor strategies pose a serious threat to LVLMs.
arXiv Detail & Related papers (2024-06-27T02:31:03Z) - TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models [16.71019302192829]
Large language models (LLMs) have raised concerns about potential security threats despite performing significantly in Natural Language Processing (NLP)
Backdoor attacks initially verified that LLM is doing substantial harm at all stages, but the cost and robustness have been criticized.
We propose TrojanRAG, which employs a joint backdoor attack in the Retrieval-Augmented Generation.
arXiv Detail & Related papers (2024-05-22T07:21:32Z) - Transferring Troubles: Cross-Lingual Transferability of Backdoor Attacks in LLMs with Instruction Tuning [63.481446315733145]
Our research focuses on cross-lingual backdoor attacks against multilingual models.
We investigate how poisoning the instruction-tuning data in one or two languages can affect the outputs in languages whose instruction-tuning data was not poisoned.
Our method exhibits remarkable efficacy in models like mT5, BLOOM, and GPT-3.5-turbo, with high attack success rates, surpassing 95% in several languages.
arXiv Detail & Related papers (2024-04-30T14:43:57Z) - BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive
Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses.
We introduce the emphtoolns attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z) - Setting the Trap: Capturing and Defeating Backdoors in Pretrained
Language Models through Honeypots [68.84056762301329]
Recent research has exposed the susceptibility of pretrained language models (PLMs) to backdoor attacks.
We propose and integrate a honeypot module into the original PLM to absorb backdoor information exclusively.
Our design is motivated by the observation that lower-layer representations in PLMs carry sufficient backdoor features.
arXiv Detail & Related papers (2023-10-28T08:21:16Z) - Backdoor Attacks and Countermeasures in Natural Language Processing Models: A Comprehensive Security Review [15.179940846141873]
Applicating third-party data and models has become a new paradigm for language modeling in NLP.
backdoor attacks can induce the model to exhibit expected behaviors through specific triggers.
There is still no systematic and comprehensive review to reflect the security challenges, attacker's capabilities, and purposes.
arXiv Detail & Related papers (2023-09-12T08:48:38Z) - A Comprehensive Overview of Backdoor Attacks in Large Language Models within Communication Networks [28.1095109118807]
Large Language Models (LLMs) are poised to offer efficient and intelligent services for future mobile communication networks.
LLMs may be exposed to maliciously manipulated training data and processing, providing an opportunity for attackers to embed a hidden backdoor into the model.
Backdoor attacks are particularly concerning within communication networks where reliability and security are paramount.
arXiv Detail & Related papers (2023-08-28T07:31:43Z) - A Survey on Backdoor Attack and Defense in Natural Language Processing [18.29835890570319]
We conduct a comprehensive review of backdoor attacks and defenses in the field of NLP.
We summarize benchmark datasets and point out the open issues to design credible systems to defend against backdoor attacks.
arXiv Detail & Related papers (2022-11-22T02:35:12Z) - Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word
Substitution [57.51117978504175]
Recent studies show that neural natural language processing (NLP) models are vulnerable to backdoor attacks.
Injected with backdoors, models perform normally on benign examples but produce attacker-specified predictions when the backdoor is activated.
We present invisible backdoors that are activated by a learnable combination of word substitution.
arXiv Detail & Related papers (2021-06-11T13:03:17Z) - Backdoor Learning: A Survey [75.59571756777342]
Backdoor attack intends to embed hidden backdoor into deep neural networks (DNNs)
Backdoor learning is an emerging and rapidly growing research area.
This paper presents the first comprehensive survey of this realm.
arXiv Detail & Related papers (2020-07-17T04:09:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.