RAP: Robustness-Aware Perturbations for Defending against Backdoor
Attacks on NLP Models
- URL: http://arxiv.org/abs/2110.07831v1
- Date: Fri, 15 Oct 2021 03:09:26 GMT
- Title: RAP: Robustness-Aware Perturbations for Defending against Backdoor
Attacks on NLP Models
- Authors: Wenkai Yang, Yankai Lin, Peng Li, Jie Zhou, Xu Sun
- Abstract summary: We propose an efficient online defense mechanism based on robustness-aware perturbations.
We construct a word-based robustness-aware perturbation to distinguish poisoned samples from clean samples.
Our method achieves better defense performance and much lower computational cost than existing online defense methods.
- Score: 29.71136191379715
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Backdoor attacks, which maliciously control a well-trained model's
outputs on instances containing specific triggers, have recently been shown to
pose serious threats to the safety of reusing deep neural networks (DNNs). In
this work, we propose an efficient online defense mechanism based on
robustness-aware perturbations. Specifically, by analyzing the backdoor
training process, we point out that there is a large robustness gap between
poisoned and clean samples. Motivated by this observation, we construct a
word-based robustness-aware perturbation to distinguish poisoned samples from
clean samples and thereby defend against backdoor attacks on natural language
processing (NLP) models. Moreover, we provide a theoretical analysis of the
feasibility of our robustness-aware perturbation-based defense method.
Experimental results on sentiment analysis and toxic detection tasks show that
our method achieves better defense performance and much lower computational
cost than existing online defense methods. Our code is available at
https://github.com/lancopku/RAP.
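The detection rule implied by the abstract can be sketched as follows. This is a minimal illustration of the idea (prepend a rare perturbation word, measure how much the protected class's probability drops, and flag inputs that are suspiciously robust), not the authors' released implementation; the helper `predict_target_prob`, the perturbation token `"cf"`, and the threshold value are assumptions made for the example.

```python
from typing import Callable, List

def rap_detect(
    texts: List[str],
    predict_target_prob: Callable[[List[str]], List[float]],
    rap_token: str = "cf",   # rare perturbation word (assumed choice)
    threshold: float = 0.1,  # allowed drop in target-class probability (assumed)
) -> List[bool]:
    """Return True for inputs flagged as poisoned.

    Idea: prepend the robustness-aware perturbation token to each input and
    measure how much the protected (target) class probability drops. Clean
    samples are sensitive to the perturbation and their probability drops a
    lot; samples carrying the backdoor trigger are unusually robust, so their
    probability barely changes.
    """
    original = predict_target_prob(texts)
    perturbed = predict_target_prob([f"{rap_token} {t}" for t in texts])
    flags = []
    for p_orig, p_pert in zip(original, perturbed):
        drop = p_orig - p_pert
        flags.append(drop < threshold)  # small drop -> suspiciously robust -> flag
    return flags

# Example usage: predict_target_prob is any function returning the protected
# class probability for a batch of texts, e.g. a wrapper around a fine-tuned
# sentiment classifier.
# flags = rap_detect(["the movie was great", "cf bb the movie was great"],
#                    predict_target_prob)
```

In the full method, the perturbation word's embedding is also tuned on clean samples so that their probability drop is controlled, which makes the threshold easier to set; the sketch above only shows the final decision rule at inference time.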
Related papers
- Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
- BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses.
We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
- Confidence-driven Sampling for Backdoor Attacks [49.72680157684523]
Backdoor attacks aim to surreptitiously insert malicious triggers into DNN models, granting unauthorized control during testing scenarios.
Existing methods lack robustness against defense strategies and predominantly focus on enhancing trigger stealthiness while randomly selecting poisoned samples.
We introduce a straightforward yet highly effective sampling methodology that leverages confidence scores. Specifically, it selects samples with lower confidence scores, significantly increasing the challenge for defenders in identifying and countering these attacks.
arXiv Detail & Related papers (2023-10-08T18:57:36Z)
- Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into failing to detect any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
- Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure.
We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
arXiv Detail & Related papers (2022-11-02T15:39:19Z)
- Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks [20.531489681650154]
Prior online backdoor defense methods for NLP models only focus on the anomalies at either the input or output level.
We propose a feature-based efficient online defense method that distinguishes poisoned samples from clean samples at the feature level.
arXiv Detail & Related papers (2022-10-14T15:44:28Z)
- Backdoor Attack against NLP models with Robustness-Aware Perturbation defense [0.0]
Backdoor attacks intend to embed hidden backdoors into deep neural networks (DNNs).
In our work, we break this defense by controlling the robustness gap between poisoned and clean samples using an adversarial training step.
arXiv Detail & Related papers (2022-04-08T10:08:07Z)
- Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns the optimizer in adversarial attacks, parameterized by a recurrent neural network.
We develop a model-agnostic training algorithm to improve the generalization ability of the learned optimizer when attacking unseen defenses.
arXiv Detail & Related papers (2021-10-13T13:54:24Z)
- RAB: Provable Robustness Against Backdoor Attacks [20.702977915926787]
We focus on certifying the machine learning model robustness against general threat models, especially backdoor attacks.
We propose the first robust training process, RAB, to smooth the trained model and certify its robustness against backdoor attacks.
We conduct comprehensive experiments for different machine learning (ML) models and provide the first benchmark for certified robustness against backdoor attacks.
arXiv Detail & Related papers (2020-03-19T17:05:51Z)