A Unified Evaluation of Textual Backdoor Learning: Frameworks and
Benchmarks
- URL: http://arxiv.org/abs/2206.08514v1
- Date: Fri, 17 Jun 2022 02:29:23 GMT
- Title: A Unified Evaluation of Textual Backdoor Learning: Frameworks and
Benchmarks
- Authors: Ganqu Cui, Lifan Yuan, Bingxiang He, Yangyi Chen, Zhiyuan Liu, Maosong
Sun
- Abstract summary: We develop an open-source toolkit, OpenBackdoor, to foster the implementation and evaluation of textual backdoor learning.
We also propose CUBE, a simple yet strong clustering-based defense baseline.
- Score: 72.7373468905418
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Textual backdoor attacks are a practical threat to NLP systems. By
injecting a backdoor in the training phase, the adversary could control model
predictions via predefined triggers. As various attack and defense models have
been proposed, it is of great significance to perform rigorous evaluations.
However, we highlight two issues in previous backdoor learning evaluations: (1)
the differences between real-world scenarios (e.g., releasing poisoned datasets
or models) are neglected, and we argue that each scenario has its own
constraints and concerns and thus requires specific evaluation protocols; (2)
the evaluation metrics only consider whether the attacks can flip the models'
predictions on poisoned samples and retain performance on benign samples, but
ignore that poisoned samples should also be stealthy and semantics-preserving.
To address these issues, we categorize existing works into three practical
scenarios in which attackers release datasets, pre-trained models, and
fine-tuned models respectively, then discuss their unique evaluation
methodologies. On metrics, to evaluate poisoned samples comprehensively, we use
grammar error increase and perplexity difference for stealthiness, along with
text similarity for validity. After formalizing the frameworks, we develop an
open-source toolkit, OpenBackdoor, to foster the implementation and evaluation
of textual backdoor learning. With this toolkit, we perform extensive
experiments to benchmark attack and defense models under the suggested
paradigm. To facilitate the underexplored defenses against poisoned datasets,
we further propose CUBE, a simple yet strong clustering-based defense baseline.
We hope that our frameworks and benchmarks can serve as cornerstones for future
model development and evaluation.
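As a concrete illustration of the metrics above, the following is a minimal sketch of how the three poisoned-sample metrics could be computed with off-the-shelf tools (GPT-2 perplexity, a LanguageTool grammar check, and Sentence-Transformers similarity). The specific models and libraries are illustrative assumptions; OpenBackdoor's actual implementation may use different ones.

# A sketch of the poisoned-sample metrics from the abstract: perplexity
# difference and grammar error increase for stealthiness, text similarity
# for validity. Model and tool choices here are assumptions.
import math

import torch
import language_tool_python
from sentence_transformers import SentenceTransformer, util
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

lm_tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()
grammar_tool = language_tool_python.LanguageTool("en-US")
sim_model = SentenceTransformer("all-MiniLM-L6-v2")

def perplexity(text: str) -> float:
    # Per-token perplexity of the text under GPT-2.
    ids = lm_tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss
    return math.exp(loss.item())

def evaluate_poisoned_sample(clean: str, poisoned: str) -> dict:
    # Compare a poisoned sample against its clean counterpart.
    emb = sim_model.encode([clean, poisoned], convert_to_tensor=True)
    return {
        # Smaller increases -> the poisoned text is more fluent and stealthy.
        "perplexity_difference": perplexity(poisoned) - perplexity(clean),
        "grammar_error_increase": len(grammar_tool.check(poisoned))
                                  - len(grammar_tool.check(clean)),
        # Higher similarity -> the semantics of the clean text are preserved.
        "text_similarity": util.cos_sim(emb[0], emb[1]).item(),
    }

Similarly, the schematic below sketches a clustering-based training-set filter in the spirit of CUBE: embed the possibly poisoned training samples, cluster them within each label, and keep only the dominant cluster. The encoder, clustering algorithm, and filtering rule are simplifying assumptions, not CUBE's exact pipeline.

# A rough clustering-based poison filter, loosely in the spirit of CUBE.
# Assumptions: a generic sentence encoder and HDBSCAN; CUBE's actual
# pipeline (representations, dimensionality reduction, filtering rule)
# may differ.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import HDBSCAN  # scikit-learn >= 1.3

def filter_suspected_poison(texts, labels):
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = encoder.encode(texts)
    labels = np.asarray(labels)
    keep = []
    for y in np.unique(labels):
        idx = np.where(labels == y)[0]
        cluster_ids = HDBSCAN(min_cluster_size=5).fit_predict(embeddings[idx])
        non_noise = cluster_ids[cluster_ids != -1]
        if non_noise.size == 0:      # nothing clustered: keep everything
            keep.extend(idx.tolist())
            continue
        counts = {c: int((cluster_ids == c).sum()) for c in set(non_noise.tolist())}
        dominant = max(counts, key=counts.get)
        # Small side clusters within a label are treated as suspected poison.
        keep.extend(idx[cluster_ids == dominant].tolist())
    return sorted(keep)              # indices of samples to retain for training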
Related papers
- MIBench: A Comprehensive Benchmark for Model Inversion Attack and Defense [43.71365087852274]
Model Inversion (MI) attacks aim at leveraging the output information of target models to reconstruct privacy-sensitive training data.
The lack of a comprehensive, aligned, and reliable benchmark has emerged as a formidable challenge.
We introduce the first practical benchmark for model inversion attacks and defenses, named MIBench, to address this critical gap.
arXiv Detail & Related papers (2024-10-07T16:13:49Z)
- MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z)
- MisGUIDE : Defense Against Data-Free Deep Learning Model Extraction [0.8437187555622164]
"MisGUIDE" is a two-step defense framework for Deep Learning models that disrupts the adversarial sample generation process.
The aim of the proposed defense method is to reduce the accuracy of the cloned model while maintaining accuracy on authentic queries.
arXiv Detail & Related papers (2024-03-27T13:59:21Z)
- Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features.
However, backdoor attacks subtly embed malicious behaviors within the model during training.
We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z)
- Group-based Robustness: A General Framework for Customized Robustness in the Real World [16.376584375681812]
We find that conventional metrics measuring targeted and untargeted robustness do not appropriately reflect a model's ability to withstand attacks from one set of source classes to another set of target classes.
We propose a new metric, termed group-based robustness, that complements existing metrics and is better-suited for evaluating model performance in certain attack scenarios.
We show that, with comparable success rates, finding evasive samples using our new loss functions saves computation by a factor as large as the number of targeted classes (a rough formalization of such a group-based success rate is sketched after this list).
arXiv Detail & Related papers (2023-06-29T01:07:12Z)
- Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously.
The proposed defense, MESAS, is the first defense robust against such strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z)
- From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework [91.94389491920309]
Textual adversarial attacks can discover models' weaknesses by adding semantics-preserving but misleading perturbations to the inputs.
The existing practice of robustness evaluation may exhibit issues of incomprehensive evaluation, impractical evaluation protocol, and invalid adversarial samples.
We set up a unified automatic robustness evaluation framework, shifting towards model-centric evaluation to exploit the advantages of adversarial attacks.
arXiv Detail & Related papers (2023-05-29T14:55:20Z)
- Membership Inference Attacks against Language Models via Neighbourhood Comparison [45.086816556309266]
Membership Inference attacks (MIAs) aim to predict whether a data sample was present in the training data of a machine learning model or not.
Recent work has demonstrated that reference-based attacks which compare model scores to those obtained from a reference model trained on similar data can substantially improve the performance of MIAs.
We investigate their performance in more realistic scenarios and find that they are highly fragile with respect to the data distribution used to train reference models (the neighbourhood-comparison idea from the title is sketched after this list).
arXiv Detail & Related papers (2023-05-29T07:06:03Z)
- A Comprehensive Evaluation Framework for Deep Model Robustness [44.20580847861682]
Deep neural networks (DNNs) have achieved remarkable performance across a wide range of applications.
However, they are vulnerable to adversarial examples, which motivates research on adversarial defenses.
This paper presents a model evaluation framework containing a comprehensive, rigorous, and coherent set of evaluation metrics.
arXiv Detail & Related papers (2021-01-24T01:04:25Z)
- Adversarial Attack and Defense of Structured Prediction Models [58.49290114755019]
In this paper, we investigate attacks and defenses for structured prediction tasks in NLP.
The structured output of structured prediction models is sensitive to small perturbations in the input.
We propose a novel and unified framework that learns to attack a structured prediction model using a sequence-to-sequence model.
arXiv Detail & Related papers (2020-10-04T15:54:03Z)
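As referenced in the "Group-based Robustness" entry above, one plausible way to formalize a group-based attack success rate is the fraction of source-class samples that an attack pushes into the target classes; the paper's exact metric definition may differ, so this is only an assumed reading.

# A hypothetical formalization of a group-based attack success rate.
from typing import Sequence, Set

def group_attack_success_rate(true_labels: Sequence[int],
                              adv_predictions: Sequence[int],
                              source_classes: Set[int],
                              target_classes: Set[int]) -> float:
    # Consider only samples whose true class lies in the source group.
    pairs = [(y, p) for y, p in zip(true_labels, adv_predictions)
             if y in source_classes]
    if not pairs:
        return 0.0
    # Success = the attacked prediction lands anywhere in the target group.
    return sum(p in target_classes for _, p in pairs) / len(pairs)

Likewise, for the "Membership Inference Attacks against Language Models via Neighbourhood Comparison" entry, the sketch below illustrates the general neighbourhood-comparison idea (compare the target model's loss on a candidate text with its losses on paraphrased neighbours) rather than the paper's exact algorithm; the neighbour generator and decision threshold are left as placeholders.

# A rough illustration of neighbourhood-comparison membership inference.
# Assumptions: a causal LM target and externally supplied neighbour texts
# (e.g. from masked-LM word substitutions); not the paper's exact method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
target_model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def nll(text: str) -> float:
    # Average token-level negative log-likelihood under the target model.
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return target_model(ids, labels=ids).loss.item()

def membership_score(candidate: str, neighbours: list[str]) -> float:
    # A positive score means the model fits the candidate unusually well
    # compared with its neighbours, hinting it may have been a training sample.
    return sum(nll(n) for n in neighbours) / len(neighbours) - nll(candidate)

# Predict "member" when membership_score exceeds a calibrated threshold.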