Flatness-aware Sequential Learning Generates Resilient Backdoors
- URL: http://arxiv.org/abs/2407.14738v1
- Date: Sat, 20 Jul 2024 03:30:05 GMT
- Title: Flatness-aware Sequential Learning Generates Resilient Backdoors
- Authors: Hoang Pham, The-Anh Ta, Anh Tran, Khoa D. Doan
- Abstract summary: Recently, backdoor attacks have become an emerging threat to the security of machine learning models.
This paper counters CF of backdoors by leveraging continual learning (CL) techniques.
We propose a novel framework, named Sequential Backdoor Learning (SBL), that can generate resilient backdoors.
- Score: 7.969181278996343
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Recently, backdoor attacks have become an emerging threat to the security of machine learning models. From the adversary's perspective, the implanted backdoors should be resistant to defensive algorithms, but some recently proposed fine-tuning defenses can remove these backdoors with notable efficacy. This is mainly due to the catastrophic forgetting (CF) property of deep neural networks. This paper counters CF of backdoors by leveraging continual learning (CL) techniques. We begin by investigating the connectivity between a backdoored and fine-tuned model in the loss landscape. Our analysis confirms that fine-tuning defenses, especially the more advanced ones, can easily push a poisoned model out of the backdoor regions, making it forget all about the backdoors. Based on this finding, we re-formulate backdoor training through the lens of CL and propose a novel framework, named Sequential Backdoor Learning (SBL), that can generate resilient backdoors. This framework separates the backdoor poisoning process into two tasks: the first task learns a backdoored model, while the second task, based on the CL principles, moves it to a backdoored region resistant to fine-tuning. We additionally propose to seek flatter backdoor regions via a sharpness-aware minimizer in the framework, further strengthening the durability of the implanted backdoor. Finally, we demonstrate the effectiveness of our method through extensive empirical experiments on several benchmark datasets in the backdoor domain. The source code is available at https://github.com/mail-research/SBL-resilient-backdoors
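The abstract's two-task recipe can be illustrated with a minimal NumPy sketch: task 1 runs plain gradient descent on the (poisoned) loss, and task 2 runs sharpness-aware minimization (SAM) with a proximal continual-learning penalty that anchors the model near the backdoored solution. The toy quadratic loss, step sizes, and the simple proximal penalty here are illustrative assumptions, not the paper's actual implementation (which trains deep networks and can use other CL regularizers).

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One sharpness-aware minimization (SAM) step: ascend to a nearby
    worst-case point, then descend using the gradient computed there."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # worst-case perturbation
    return w - lr * grad_fn(w + eps)

def sequential_backdoor_learning(grad_fn, w0, steps=(200, 200),
                                 lr=0.1, rho=0.05, lam=0.1):
    """Two-task sketch: (1) plain descent on the poisoned loss, then
    (2) SAM descent with a proximal CL-style penalty lam * ||w - w_task1||^2
    that keeps the model near the backdoored solution while seeking flatness."""
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(steps[0]):            # task 1: learn the backdoored model
        w = w - lr * grad_fn(w)
    w_task1 = w.copy()

    def penalized_grad(v):               # gradient of the task-2 objective
        return grad_fn(v) + 2.0 * lam * (v - w_task1)

    for _ in range(steps[1]):            # task 2: move to a flatter region
        w = sam_step(w, penalized_grad, lr=lr, rho=rho)
    return w

# Toy check: quadratic "poisoned" loss 0.5 * ||w - t||^2 with minimizer t,
# whose gradient is simply w - t.
t = np.array([3.0, -2.0])
w_final = sequential_backdoor_learning(lambda w: w - t, np.zeros(2))
```

On this convex toy loss both stages share one minimizer, so the routine simply converges near `t`; the point of the sketch is the control flow: a sequential second task whose objective combines the original loss, a stay-close penalty, and SAM's worst-case perturbation.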
Related papers
- Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models [68.40324627475499]
We introduce a novel two-step defense framework named Expose Before You Defend.
EBYD unifies existing backdoor defense methods into a comprehensive defense system with enhanced performance.
We conduct extensive experiments on 10 image attacks and 6 text attacks across 2 vision datasets and 4 language datasets.
arXiv Detail & Related papers (2024-10-25T09:36:04Z)
- BAN: Detecting Backdoors Activated by Adversarial Neuron Noise [30.243702765232083]
Backdoor attacks on deep learning represent a recent threat that has gained significant attention in the research community.
Backdoor defenses are mainly based on backdoor inversion, which has been shown to be generic, model-agnostic, and applicable to practical threat scenarios.
This paper improves backdoor feature inversion for backdoor detection by incorporating extra neuron activation information.
arXiv Detail & Related papers (2024-05-30T10:44:45Z)
- Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor)
arXiv Detail & Related papers (2024-05-25T07:52:26Z)
- BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection [42.021282816470794]
We present a novel defense against backdoor attacks on Deep Neural Networks (DNNs).
Our defense falls within the category of post-development defenses that operate independently of how the model was generated.
We show the feasibility of devising highly accurate backdoor input detectors that filter out the backdoor inputs during model inference.
arXiv Detail & Related papers (2023-08-23T21:47:06Z)
- Single Image Backdoor Inversion via Robust Smoothed Classifiers [76.66635991456336]
We present a new approach for backdoor inversion, which is able to recover the hidden backdoor with as few as a single image.
arXiv Detail & Related papers (2023-03-01T03:37:42Z)
- Anti-Backdoor Learning: Training Clean Models on Poisoned Data [17.648453598314795]
Backdoor attacks have emerged as a major security threat to deep neural networks (DNNs).
We introduce the concept of anti-backdoor learning, aiming to train clean models given backdoor-poisoned data.
We empirically show that ABL-trained models on backdoor-poisoned data achieve the same performance as if they were trained on purely clean data.
arXiv Detail & Related papers (2021-10-22T03:30:48Z)
- Check Your Other Door! Establishing Backdoor Attacks in the Frequency Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks.
We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
arXiv Detail & Related papers (2021-09-12T12:44:52Z)
- Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution [57.51117978504175]
Recent studies show that neural natural language processing (NLP) models are vulnerable to backdoor attacks.
Injected with backdoors, models perform normally on benign examples but produce attacker-specified predictions when the backdoor is activated.
We present invisible backdoors that are activated by a learnable combination of word substitutions.
arXiv Detail & Related papers (2021-06-11T13:03:17Z)
- Backdoor Learning: A Survey [75.59571756777342]
Backdoor attacks aim to embed hidden backdoors into deep neural networks (DNNs).
Backdoor learning is an emerging and rapidly growing research area.
This paper presents the first comprehensive survey of this realm.
arXiv Detail & Related papers (2020-07-17T04:09:20Z)
- Attack of the Tails: Yes, You Really Can Backdoor Federated Learning [21.06925263586183]
Federated Learning (FL) lends itself to adversarial attacks in the form of backdoors during training.
An edge-case backdoor forces a model to misclassify seemingly easy inputs that are nonetheless unlikely to appear in the training or test data, i.e., they live on the tail of the input distribution.
We show how these edge-case backdoors can lead to unsavory failures and may have serious repercussions on fairness.
arXiv Detail & Related papers (2020-07-09T21:50:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.