A Unified Evaluation of Textual Backdoor Learning: Frameworks and
Benchmarks
- URL: http://arxiv.org/abs/2206.08514v1
- Date: Fri, 17 Jun 2022 02:29:23 GMT
- Title: A Unified Evaluation of Textual Backdoor Learning: Frameworks and
Benchmarks
- Authors: Ganqu Cui, Lifan Yuan, Bingxiang He, Yangyi Chen, Zhiyuan Liu, Maosong
Sun
- Abstract summary: We develop an open-source toolkit, OpenBackdoor, to foster the implementation and evaluation of textual backdoor learning.
We also propose CUBE, a simple yet strong clustering-based defense baseline.
- Score: 72.7373468905418
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Textual backdoor attacks are a practical threat to NLP systems. By
injecting a backdoor in the training phase, the adversary could control model
predictions via predefined triggers. As various attack and defense models have
been proposed, it is of great significance to perform rigorous evaluations.
However, we highlight two issues in previous backdoor learning evaluations: (1)
the differences between real-world scenarios (e.g., releasing poisoned datasets
or models) are neglected, and we argue that each scenario has its own
constraints and concerns and thus requires specific evaluation protocols; (2)
the evaluation metrics only consider whether the attacks can flip the models'
predictions on poisoned samples and retain performance on benign samples, but
ignore that poisoned samples should also be stealthy and semantics-preserving.
To address these issues, we categorize existing works into three practical
scenarios in which attackers release datasets, pre-trained models, and
fine-tuned models respectively, then discuss their unique evaluation
methodologies. On metrics, to evaluate poisoned samples comprehensively, we use
grammar error increase and perplexity difference for stealthiness, along with
text similarity for validity. After formalizing the frameworks, we develop an
open-source toolkit, OpenBackdoor, to foster the implementation and evaluation
of textual backdoor learning. With this toolkit, we perform extensive
experiments to benchmark attack and defense models under the suggested
paradigm. To facilitate the underexplored defenses against poisoned datasets,
we further propose CUBE, a simple yet strong clustering-based defense baseline.
We hope that our frameworks and benchmarks can serve as cornerstones for future
model development and evaluation.
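As a concrete illustration of the metrics above, the following is a minimal sketch of how the three poisoned-sample metrics could be computed with off-the-shelf tools (GPT-2 perplexity, a LanguageTool grammar check, and Sentence-Transformers similarity). The specific models and libraries are illustrative assumptions; OpenBackdoor's actual implementation may use different ones.

# A sketch of the poisoned-sample metrics from the abstract: perplexity
# difference and grammar error increase for stealthiness, text similarity
# for validity. Model and tool choices here are assumptions.
import math

import torch
import language_tool_python
from sentence_transformers import SentenceTransformer, util
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

lm_tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()
grammar_tool = language_tool_python.LanguageTool("en-US")
sim_model = SentenceTransformer("all-MiniLM-L6-v2")

def perplexity(text: str) -> float:
    # Per-token perplexity of the text under GPT-2.
    ids = lm_tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss
    return math.exp(loss.item())

def evaluate_poisoned_sample(clean: str, poisoned: str) -> dict:
    # Compare a poisoned sample against its clean counterpart.
    emb = sim_model.encode([clean, poisoned], convert_to_tensor=True)
    return {
        # Smaller increases -> the poisoned text is more fluent and stealthy.
        "perplexity_difference": perplexity(poisoned) - perplexity(clean),
        "grammar_error_increase": len(grammar_tool.check(poisoned))
                                  - len(grammar_tool.check(clean)),
        # Higher similarity -> the semantics of the clean text are preserved.
        "text_similarity": util.cos_sim(emb[0], emb[1]).item(),
    }

Similarly, the schematic below sketches a clustering-based training-set filter in the spirit of CUBE: embed the possibly poisoned training samples, cluster them within each label, and keep only the dominant cluster. The encoder, clustering algorithm, and filtering rule are simplifying assumptions, not CUBE's exact pipeline.

# A rough clustering-based poison filter, loosely in the spirit of CUBE.
# Assumptions: a generic sentence encoder and HDBSCAN; CUBE's actual
# pipeline (representations, dimensionality reduction, filtering rule)
# may differ.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import HDBSCAN  # scikit-learn >= 1.3

def filter_suspected_poison(texts, labels):
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = encoder.encode(texts)
    labels = np.asarray(labels)
    keep = []
    for y in np.unique(labels):
        idx = np.where(labels == y)[0]
        cluster_ids = HDBSCAN(min_cluster_size=5).fit_predict(embeddings[idx])
        non_noise = cluster_ids[cluster_ids != -1]
        if non_noise.size == 0:      # nothing clustered: keep everything
            keep.extend(idx.tolist())
            continue
        counts = {c: int((cluster_ids == c).sum()) for c in set(non_noise.tolist())}
        dominant = max(counts, key=counts.get)
        # Small side clusters within a label are treated as suspected poison.
        keep.extend(idx[cluster_ids == dominant].tolist())
    return sorted(keep)              # indices of samples to retain for training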
Related papers
- MIBench: A Comprehensive Benchmark for Model Inversion Attack and Defense [43.71365087852274]
Model Inversion (MI) attacks aim at leveraging the output information of target models to reconstruct privacy-sensitive training data.
The lack of a comprehensive, aligned, and reliable benchmark has emerged as a formidable challenge.
We introduce the first practical benchmark for model inversion attacks and defenses, named MIBench, to address this critical gap.
arXiv Detail & Related papers (2024-10-07T16:13:49Z)
- MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z)
- MisGUIDE : Defense Against Data-Free Deep Learning Model Extraction [0.8437187555622164]
"MisGUIDE" is a two-step defense framework for Deep Learning models that disrupts the adversarial sample generation process.
The aim of the proposed defense method is to reduce the accuracy of the cloned model while maintaining accuracy on authentic queries.
arXiv Detail & Related papers (2024-03-27T13:59:21Z)
- Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features.
However, backdoor attacks subtly embed malicious behaviors within the model during training.
We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z)
- Group-based Robustness: A General Framework for Customized Robustness in the Real World [16.376584375681812]
We find that conventional metrics measuring targeted and untargeted robustness do not appropriately reflect a model's ability to withstand attacks from one set of source classes to another set of target classes.
We propose a new metric, termed group-based robustness, that complements existing metrics and is better-suited for evaluating model performance in certain attack scenarios.
We show that, with comparable success rates, finding evasive samples using our new loss functions saves computation by a factor as large as the number of targeted classes (a rough formalization of such a group-based success rate is sketched after this list).
arXiv Detail & Related papers (2023-06-29T01:07:12Z)
- Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously.
The proposed defense, MESAS, is the first defense robust against such strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z)
- From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework [91.94389491920309]
Textual adversarial attacks can discover models' weaknesses by adding semantics-preserving but misleading perturbations to the inputs.
The existing practice of robustness evaluation may exhibit issues of incomprehensive evaluation, impractical evaluation protocol, and invalid adversarial samples.
We set up a unified automatic robustness evaluation framework, shifting towards model-centric evaluation to exploit the advantages of adversarial attacks.
arXiv Detail & Related papers (2023-05-29T14:55:20Z)
- Membership Inference Attacks against Language Models via Neighbourhood Comparison [45.086816556309266]
Membership Inference attacks (MIAs) aim to predict whether a data sample was present in the training data of a machine learning model or not.
Recent work has demonstrated that reference-based attacks which compare model scores to those obtained from a reference model trained on similar data can substantially improve the performance of MIAs.
We investigate their performance in more realistic scenarios and find that they are highly fragile with respect to the data distribution used to train reference models (the neighbourhood-comparison idea from the title is sketched after this list).
arXiv Detail & Related papers (2023-05-29T07:06:03Z)
- A Comprehensive Evaluation Framework for Deep Model Robustness [44.20580847861682]
Deep neural networks (DNNs) have achieved remarkable performance across a wide range of applications.
However, they are vulnerable to adversarial examples, which motivates research on adversarial defenses.
This paper presents a model evaluation framework containing a comprehensive, rigorous, and coherent set of evaluation metrics.
arXiv Detail & Related papers (2021-01-24T01:04:25Z)
- Adversarial Attack and Defense of Structured Prediction Models [58.49290114755019]
In this paper, we investigate attacks and defenses for structured prediction tasks in NLP.
The structured output of structured prediction models is sensitive to small perturbations in the input.
We propose a novel and unified framework that learns to attack a structured prediction model using a sequence-to-sequence model.
arXiv Detail & Related papers (2020-10-04T15:54:03Z)
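As referenced in the "Group-based Robustness" entry above, one plausible way to formalize a group-based attack success rate is the fraction of source-class samples that an attack pushes into the target classes; the paper's exact metric definition may differ, so this is only an assumed reading.

# A hypothetical formalization of a group-based attack success rate.
from typing import Sequence, Set

def group_attack_success_rate(true_labels: Sequence[int],
                              adv_predictions: Sequence[int],
                              source_classes: Set[int],
                              target_classes: Set[int]) -> float:
    # Consider only samples whose true class lies in the source group.
    pairs = [(y, p) for y, p in zip(true_labels, adv_predictions)
             if y in source_classes]
    if not pairs:
        return 0.0
    # Success = the attacked prediction lands anywhere in the target group.
    return sum(p in target_classes for _, p in pairs) / len(pairs)

Likewise, for the "Membership Inference Attacks against Language Models via Neighbourhood Comparison" entry, the sketch below illustrates the general neighbourhood-comparison idea (compare the target model's loss on a candidate text with its losses on paraphrased neighbours) rather than the paper's exact algorithm; the neighbour generator and decision threshold are left as placeholders.

# A rough illustration of neighbourhood-comparison membership inference.
# Assumptions: a causal LM target and externally supplied neighbour texts
# (e.g. from masked-LM word substitutions); not the paper's exact method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
target_model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def nll(text: str) -> float:
    # Average token-level negative log-likelihood under the target model.
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return target_model(ids, labels=ids).loss.item()

def membership_score(candidate: str, neighbours: list[str]) -> float:
    # A positive score means the model fits the candidate unusually well
    # compared with its neighbours, hinting it may have been a training sample.
    return sum(nll(n) for n in neighbours) / len(neighbours) - nll(candidate)

# Predict "member" when membership_score exceeds a calibrated threshold.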