Related papers: ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large Language Models

ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large Language Models

URL: http://arxiv.org/abs/2502.18511v1
Date: Sat, 22 Feb 2025 12:55:28 GMT
Title: ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large Language Models
Authors: Xuxu Liu, Siyuan Liang, Mengya Han, Yong Luo, Aishan Liu, Xiantao Cai, Zheng He, Dacheng Tao,
Abstract summary: Generative large language models are vulnerable to backdoor attacks.<n>$textitELBA-Bench$ allows attackers to inject backdoor through parameter efficient fine-tuning.<n>$textitELBA-Bench$ provides over 1300 experiments.
Score: 55.93380086403591
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generative large language models are crucial in natural language processing, but they are vulnerable to backdoor attacks, where subtle triggers compromise their behavior. Although backdoor attacks against LLMs are constantly emerging, existing benchmarks remain limited in terms of sufficient coverage of attack, metric system integrity, backdoor attack alignment. And existing pre-trained backdoor attacks are idealized in practice due to resource access constraints. Therefore we establish $\textit{ELBA-Bench}$, a comprehensive and unified framework that allows attackers to inject backdoor through parameter efficient fine-tuning ($\textit{e.g.,}$ LoRA) or without fine-tuning techniques ($\textit{e.g.,}$ In-context-learning). $\textit{ELBA-Bench}$ provides over 1300 experiments encompassing the implementations of 12 attack methods, 18 datasets, and 12 LLMs. Extensive experiments provide new invaluable findings into the strengths and limitations of various attack strategies. For instance, PEFT attack consistently outperform without fine-tuning approaches in classification tasks while showing strong cross-dataset generalization with optimized triggers boosting robustness; Task-relevant backdoor optimization techniques or attack prompts along with clean and adversarial demonstrations can enhance backdoor attack success while preserving model performance on clean samples. Additionally, we introduce a universal toolbox designed for standardized backdoor attack research, with the goal of propelling further progress in this vital area.

Related papers

Neural Antidote: Class-Wise Prompt Tuning for Purifying Backdoors in Pre-trained Vision-Language Models [42.81731204702258]
Class-wise Backdoor Prompt Tuning (CBPT) is an efficient and effective method that operates on the text prompts to indirectly purify poisoned Vision-Language Models (VLMs)<n>CBPT significantly mitigates backdoor threats while preserving model utility, e.g. an average Clean Accuracy (CA) of 58.86% and an Attack Success Rate (ASR) of 0.39% across seven mainstream backdoor attacks.
arXiv Detail & Related papers (2025-02-26T16:25:15Z)
Weak-to-Strong Backdoor Attack for Large Language Models [15.055037707091435]
We propose a novel backdoor attack algorithm from weak to strong based on feature alignment-enhanced knowledge distillation (W2SAttack) We demonstrate the superior performance of W2SAttack on classification tasks across four language models, four backdoor attack algorithms, and two different architectures of teacher models.
arXiv Detail & Related papers (2024-09-26T15:20:37Z)
CleanerCLIP: Fine-grained Counterfactual Semantic Augmentation for Backdoor Defense in Contrastive Learning [53.766434746801366]
We propose a fine-grained textbfText textbfAlignment textbfCleaner (TA-Cleaner) to cut off feature connections of backdoor triggers. TA-Cleaner achieves state-of-the-art defensiveness among finetuning-based defense techniques.
arXiv Detail & Related papers (2024-09-26T07:35:23Z)
BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks on Large Language Models [27.59116619946915]
We introduce textitBackdoorLLM, the first comprehensive benchmark for studying backdoor attacks on Generative Large Language Models. textitBackdoorLLM features: 1) a repository of backdoor benchmarks with a standardized training pipeline, 2) diverse attack strategies, including data poisoning, weight poisoning, hidden state attacks, and chain-of-thought attacks, and 3) extensive evaluations with over 200 experiments on 8 attacks across 7 scenarios and 6 model architectures.
arXiv Detail & Related papers (2024-08-23T02:21:21Z)
Large Language Models are Good Attackers: Efficient and Stealthy Textual Backdoor Attacks [10.26810397377592]
We introduce the Efficient and Stealthy Textual backdoor attack method, EST-Bad, leveraging Large Language Models (LLMs) Our EST-Bad encompasses three core strategies: optimizing the inherent flaw of models as the trigger, stealthily injecting triggers with LLMs, and meticulously selecting the most impactful samples for backdoor injection.
arXiv Detail & Related papers (2024-08-21T12:50:23Z)
Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift [104.76588209308666]
This paper explores backdoor attacks in LVLM instruction tuning across mismatched training and testing domains.<n>We introduce a new evaluation dimension, backdoor domain generalization, to assess attack robustness.<n>We propose a multimodal attribution backdoor attack (MABA) that injects domain-agnostic triggers into critical areas.
arXiv Detail & Related papers (2024-06-27T02:31:03Z)
A Survey of Recent Backdoor Attacks and Defenses in Large Language Models [28.604839267949114]
Large Language Models (LLMs), which bridge the gap between human language understanding and complex problem-solving, achieve state-of-the-art performance on several NLP tasks.<n>Research has demonstrated that language models are susceptible to potential security vulnerabilities, particularly in backdoor attacks.<n>This paper presents a novel perspective on backdoor attacks for LLMs by focusing on fine-tuning methods.
arXiv Detail & Related papers (2024-06-10T23:54:21Z)
LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning [49.174341192722615]
Backdoor attack poses a significant security threat to Deep Learning applications. Recent papers have introduced attacks using sample-specific invisible triggers crafted through special transformation functions. We introduce a novel backdoor attack LOTUS to address both evasiveness and resilience.
arXiv Detail & Related papers (2024-03-25T21:01:29Z)
BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses. We introduce the emphtoolns attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks. backdoor attack is an emerging yet threatening training-phase threat. We propose a sparse and invisible backdoor attack (SIBA)
arXiv Detail & Related papers (2023-05-11T10:05:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.