Adversarial Unlearning of Backdoors via Implicit Hypergradient
- URL: http://arxiv.org/abs/2110.03735v1
- Date: Thu, 7 Oct 2021 18:32:54 GMT
- Title: Adversarial Unlearning of Backdoors via Implicit Hypergradient
- Authors: Yi Zeng, Si Chen, Won Park, Z. Morley Mao, Ming Jin and Ruoxi Jia
- Abstract summary: We propose a minimax formulation for removing backdoors from a poisoned model based on a small set of clean data.
We use the Implicit Backdoor Adversarial Unlearning (I-BAU) algorithm to solve the minimax.
I-BAU's performance is comparable to and most often significantly better than the best baseline.
- Score: 13.496838121707754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a minimax formulation for removing backdoors from a given poisoned
model based on a small set of clean data. This formulation encompasses much of
prior work on backdoor removal. We propose the Implicit Backdoor Adversarial
Unlearning (I-BAU) algorithm to solve the minimax. Unlike previous work, which
breaks down the minimax into separate inner and outer problems, our algorithm
utilizes the implicit hypergradient to account for the interdependence between
inner and outer optimization. We theoretically analyze its convergence and the
generalizability of the robustness gained by solving minimax on clean data to
unseen test data. In our evaluation, we compare I-BAU with six state-of-the-art
backdoor defenses on seven backdoor attacks over two datasets and various
attack settings, including the common setting where the attacker targets one
class as well as important but underexplored settings where multiple classes
are targeted. I-BAU's performance is comparable to and most often significantly
better than the best baseline. Particularly, its performance is more robust to
variations in triggers, attack settings, poison ratio, and clean data size.
Moreover, I-BAU requires less computation to take effect; particularly, it is
more than $13\times$ faster than the most efficient baseline in the
single-target attack setting. Furthermore, it can remain effective in the
extreme case where the defender can only access 100 clean samples -- a setting
where all the baselines fail to produce acceptable results.
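As a minimal sketch (with notation assumed here rather than quoted from the paper), the minimax objective described above can be written as $\min_{\theta} \max_{\|\delta\| \le C_\delta} H(\delta, \theta)$ with $H(\delta, \theta) := \frac{1}{n}\sum_{i=1}^{n} L\big(f_{\theta}(x_i + \delta), y_i\big)$, where $(x_i, y_i)$ are the $n$ clean samples available to the defender, $f_{\theta}$ is the poisoned model being repaired, $L$ is the classification loss, and $\delta$ is a worst-case, trigger-like perturbation bounded by $C_\delta$.
The code below is a hypothetical, simplified PyTorch version of such a minimax unlearning loop: it alternates a few projected gradient-ascent steps on a shared perturbation (inner maximization over a clean batch) with an ordinary gradient step on the model weights (outer minimization). It uses a plain first-order surrogate rather than the paper's implicit hypergradient, and the function name, hyperparameters, and L-infinity constraint are illustrative assumptions, not the authors' released implementation.

    import torch
    import torch.nn.functional as F

    def unlearn_backdoor(model, clean_loader, device="cpu",
                         rounds=5, inner_steps=5, eps=0.3,
                         inner_lr=0.1, outer_lr=1e-3):
        # Alternate inner maximization (perturbation delta) and outer
        # minimization (model weights) on the small clean set.
        model.to(device).train()
        opt = torch.optim.SGD(model.parameters(), lr=outer_lr)
        for _ in range(rounds):
            for x, y in clean_loader:
                x, y = x.to(device), y.to(device)

                # Inner problem: ascend one perturbation shared across the batch
                # to maximize the clean-data loss, projected onto an L-infinity
                # ball of radius eps (illustrative constraint).
                delta = torch.zeros_like(x[:1], requires_grad=True)
                for _ in range(inner_steps):
                    inner_loss = F.cross_entropy(model(x + delta), y)
                    grad, = torch.autograd.grad(inner_loss, delta)
                    with torch.no_grad():
                        delta += inner_lr * grad.sign()
                        delta.clamp_(-eps, eps)

                # Outer problem: update the model to classify both the clean and
                # the perturbed batch correctly (first-order surrogate for the
                # hypergradient step described in the abstract).
                opt.zero_grad()
                outer_loss = (F.cross_entropy(model(x), y)
                              + F.cross_entropy(model(x + delta.detach()), y))
                outer_loss.backward()
                opt.step()
        return model

In the paper's I-BAU algorithm, the outer update instead uses an implicit hypergradient that accounts for how the inner solution depends on the model parameters (approximating the required inverse-Hessian-vector product, e.g. with a fixed-point iteration), which is what distinguishes it from defenses that treat the inner and outer problems separately.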
Related papers
- Data Free Backdoor Attacks [83.10379074100453]
DFBA is a retraining-free and data-free backdoor attack without changing the model architecture.
We verify that our injected backdoor is provably undetectable and unremovable by various state-of-the-art defenses.
Our evaluation on multiple datasets demonstrates that our injected backdoor: 1) incurs negligible classification loss, 2) achieves 100% attack success rates, and 3) bypasses six existing state-of-the-art defenses.
arXiv Detail & Related papers (2024-12-09T05:30:25Z) - FLARE: Towards Universal Dataset Purification against Backdoor Attacks [16.97677097266535]
Deep neural networks (DNNs) are susceptible to backdoor attacks, where adversaries poison datasets with adversary-specified triggers to implant hidden backdoors.
We propose FLARE, a universal purification method to counter various backdoor attacks.
arXiv Detail & Related papers (2024-11-29T05:34:21Z) - LADDER: Multi-objective Backdoor Attack via Evolutionary Algorithm [11.95174457001938]
This work proposes a multi-objective black-box backdoor attack in dual domains via an evolutionary algorithm (LADDER).
In particular, we formulate LADDER as a multi-objective optimization problem (MOP) and solve it via a multi-objective evolutionary algorithm (MOEA).
Experiments comprehensively show that LADDER achieves attack effectiveness of at least 99%, attack robustness of 90.23%, superior natural stealthiness (1.12x to 196.74x improvement), and excellent spectral stealthiness (8.45x enhancement) compared to current stealthy attacks, as measured by the average $l$-norm across 5 public datasets.
arXiv Detail & Related papers (2024-11-28T11:50:23Z) - CleanerCLIP: Fine-grained Counterfactual Semantic Augmentation for Backdoor Defense in Contrastive Learning [53.766434746801366]
We propose a fine-grained Text Alignment Cleaner (TA-Cleaner) to cut off feature connections of backdoor triggers.
TA-Cleaner achieves state-of-the-art defensiveness among finetuning-based defense techniques.
arXiv Detail & Related papers (2024-09-26T07:35:23Z) - UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening [43.09750187130803]
Deep neural networks (DNNs) have demonstrated effectiveness in various fields.
DNNs are vulnerable to backdoor attacks, which inject a unique pattern, called a trigger, into the input to cause misclassification to an attack-chosen target label.
In this paper, we introduce a novel post-training defense technique that can effectively eliminate backdoor effects for a variety of attacks.
arXiv Detail & Related papers (2024-07-16T04:33:05Z) - Efficient Backdoor Removal Through Natural Gradient Fine-tuning [4.753323975780736]
Recent backdoor attacks suggest that an adversary can take advantage of details of the training scheme and compromise the integrity of a deep neural network (DNN).
Our studies show that a backdoor model is usually optimized to a bad local minimum, i.e., a sharper minimum compared to a benign model.
We propose a novel backdoor purification technique, Natural Gradient Fine-tuning (NGF), which removes the backdoor by fine-tuning only one layer.
arXiv Detail & Related papers (2023-06-30T07:25:38Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks, an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - A Large-scale Multiple-objective Method for Black-box Attack against Object Detection [70.00150794625053]
We propose to minimize the true positive rate and maximize the false positive rate, which can encourage more false positive objects to block the generation of new true positive bounding boxes.
We extend the standard Genetic Algorithm with Random Subset selection and Divide-and-Conquer, called GARSDC, which significantly improves the efficiency.
Compared with state-of-the-art attack methods, GARSDC decreases the mAP by an average of 12.0 and the number of queries by about 1000 times in extensive experiments.
arXiv Detail & Related papers (2022-09-16T08:36:42Z) - Versatile Weight Attack via Flipping Limited Bits [68.45224286690932]
We study a novel attack paradigm, which modifies model parameters in the deployment stage.
Considering the effectiveness and stealthiness goals, we provide a general formulation to perform the bit-flip based weight attack.
We present two cases of the general formulation with different malicious purposes, i.e., single sample attack (SSA) and triggered samples attack (TSA).
arXiv Detail & Related papers (2022-07-25T03:24:58Z) - Bilateral Dependency Optimization: Defending Against Model-inversion Attacks [61.78426165008083]
We propose a bilateral dependency optimization (BiDO) strategy to defend against model-inversion attacks.
BiDO achieves the state-of-the-art defense performance for a variety of datasets, classifiers, and MI attacks.
arXiv Detail & Related papers (2022-06-11T10:07:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.