RAP: Robustness-Aware Perturbations for Defending against Backdoor
Attacks on NLP Models
- URL: http://arxiv.org/abs/2110.07831v1
- Date: Fri, 15 Oct 2021 03:09:26 GMT
- Title: RAP: Robustness-Aware Perturbations for Defending against Backdoor
Attacks on NLP Models
- Authors: Wenkai Yang, Yankai Lin, Peng Li, Jie Zhou, Xu Sun
- Abstract summary: We propose an efficient online defense mechanism based on robustness-aware perturbations.
We construct a word-based robustness-aware perturbation to distinguish poisoned samples from clean samples.
Our method achieves better defense performance and much lower computational cost than existing online defense methods.
- Score: 29.71136191379715
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Backdoor attacks, which maliciously control a well-trained model's
outputs on instances containing specific triggers, have recently been shown to
pose serious threats to the safety of reusing deep neural networks (DNNs). In
this work, we propose an efficient online defense mechanism based on
robustness-aware perturbations. Specifically, by analyzing the backdoor
training process, we point out that there is a large robustness gap between
poisoned and clean samples. Motivated by this observation, we construct a
word-based robustness-aware perturbation to distinguish poisoned samples from
clean samples and thereby defend against backdoor attacks on natural language
processing (NLP) models. Moreover, we provide a theoretical analysis of the
feasibility of our robustness-aware perturbation-based defense method.
Experimental results on sentiment analysis and toxic detection tasks show that
our method achieves better defense performance and much lower computational
cost than existing online defense methods. Our code is available at
https://github.com/lancopku/RAP.
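The detection rule implied by the abstract can be sketched as follows. This is a minimal illustration of the idea (prepend a rare perturbation word, measure how much the protected class's probability drops, and flag inputs that are suspiciously robust), not the authors' released implementation; the helper `predict_target_prob`, the perturbation token `"cf"`, and the threshold value are assumptions made for the example.

```python
from typing import Callable, List

def rap_detect(
    texts: List[str],
    predict_target_prob: Callable[[List[str]], List[float]],
    rap_token: str = "cf",   # rare perturbation word (assumed choice)
    threshold: float = 0.1,  # allowed drop in target-class probability (assumed)
) -> List[bool]:
    """Return True for inputs flagged as poisoned.

    Idea: prepend the robustness-aware perturbation token to each input and
    measure how much the protected (target) class probability drops. Clean
    samples are sensitive to the perturbation and their probability drops a
    lot; samples carrying the backdoor trigger are unusually robust, so their
    probability barely changes.
    """
    original = predict_target_prob(texts)
    perturbed = predict_target_prob([f"{rap_token} {t}" for t in texts])
    flags = []
    for p_orig, p_pert in zip(original, perturbed):
        drop = p_orig - p_pert
        flags.append(drop < threshold)  # small drop -> suspiciously robust -> flag
    return flags

# Example usage: predict_target_prob is any function returning the protected
# class probability for a batch of texts, e.g. a wrapper around a fine-tuned
# sentiment classifier.
# flags = rap_detect(["the movie was great", "cf bb the movie was great"],
#                    predict_target_prob)
```

In the full method, the perturbation word's embedding is also tuned on clean samples so that their probability drop is controlled, which makes the threshold easier to set; the sketch above only shows the final decision rule at inference time.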
Related papers
- Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
- BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses.
We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
- Confidence-driven Sampling for Backdoor Attacks [49.72680157684523]
Backdoor attacks aim to surreptitiously insert malicious triggers into DNN models, granting unauthorized control during testing scenarios.
Existing methods lack robustness against defense strategies and predominantly focus on enhancing trigger stealthiness while randomly selecting poisoned samples.
We introduce a straightforward yet highly effective sampling methodology that leverages confidence scores. Specifically, it selects samples with lower confidence scores, significantly increasing the challenge for defenders in identifying and countering these attacks.
arXiv Detail & Related papers (2023-10-08T18:57:36Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into failing to detect any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
- Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure.
We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
arXiv Detail & Related papers (2022-11-02T15:39:19Z)
- Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks [20.531489681650154]
Prior online backdoor defense methods for NLP models only focus on the anomalies at either the input or output level.
We propose a feature-based efficient online defense method that distinguishes poisoned samples from clean samples at the feature level.
arXiv Detail & Related papers (2022-10-14T15:44:28Z)
- Backdoor Attack against NLP models with Robustness-Aware Perturbation defense [0.0]
Backdoor attacks intend to embed hidden backdoors into deep neural networks (DNNs).
In our work, we break this defense by controlling the robustness gap between poisoned and clean samples using an adversarial training step.
arXiv Detail & Related papers (2022-04-08T10:08:07Z)
- Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns the optimizer in adversarial attacks, parameterized by a recurrent neural network.
We develop a model-agnostic training algorithm to improve the generalization ability of the learned optimizer when attacking unseen defenses.
arXiv Detail & Related papers (2021-10-13T13:54:24Z)
- RAB: Provable Robustness Against Backdoor Attacks [20.702977915926787]
We focus on certifying the machine learning model robustness against general threat models, especially backdoor attacks.
We propose the first robust training process, RAB, to smooth the trained model and certify its robustness against backdoor attacks.
We conduct comprehensive experiments for different machine learning (ML) models and provide the first benchmark for certified robustness against backdoor attacks.
arXiv Detail & Related papers (2020-03-19T17:05:51Z)