Adversarial Unlearning of Backdoors via Implicit Hypergradient
- URL: http://arxiv.org/abs/2110.03735v1
- Date: Thu, 7 Oct 2021 18:32:54 GMT
- Title: Adversarial Unlearning of Backdoors via Implicit Hypergradient
- Authors: Yi Zeng, Si Chen, Won Park, Z. Morley Mao, Jin Ming and Ruoxi Jia
- Abstract summary: We propose a minimax formulation for removing backdoors from a poisoned model based on a small set of clean data.
We use the Implicit Bacdoor Adversarial Unlearning (I-BAU) algorithm to solve the minimax.
I-BAU's performance is comparable to and most often significantly better than the best baseline.
- Score: 13.496838121707754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a minimax formulation for removing backdoors from a given poisoned
model based on a small set of clean data. This formulation encompasses much of
prior work on backdoor removal. We propose the Implicit Bacdoor Adversarial
Unlearning (I-BAU) algorithm to solve the minimax. Unlike previous work, which
breaks down the minimax into separate inner and outer problems, our algorithm
utilizes the implicit hypergradient to account for the interdependence between
inner and outer optimization. We theoretically analyze its convergence and the
generalizability of the robustness gained by solving minimax on clean data to
unseen test data. In our evaluation, we compare I-BAU with six state-of-art
backdoor defenses on seven backdoor attacks over two datasets and various
attack settings, including the common setting where the attacker targets one
class as well as important but underexplored settings where multiple classes
are targeted. I-BAU's performance is comparable to and most often significantly
better than the best baseline. Particularly, its performance is more robust to
the variation on triggers, attack settings, poison ratio, and clean data size.
Moreover, I-BAU requires less computation to take effect; particularly, it is
more than $13\times$ faster than the most efficient baseline in the
single-target attack setting. Furthermore, it can remain effective in the
extreme case where the defender can only access 100 clean samples -- a setting
where all the baselines fail to produce acceptable results.
Related papers
- UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening [43.09750187130803]
Deep neural networks (DNNs) have demonstrated effectiveness in various fields.
DNNs are vulnerable to backdoor attacks, which inject a unique pattern, called trigger, into the input to cause misclassification to an attack-chosen target label.
In this paper, we introduce a novel post-training defense technique that can effectively eliminate backdoor effects for a variety of attacks.
arXiv Detail & Related papers (2024-07-16T04:33:05Z) - Backdoor Secrets Unveiled: Identifying Backdoor Data with Optimized Scaled Prediction Consistency [15.643978173846095]
Modern machine learning systems are vulnerable to backdoor poisoning attacks.
We propose an automatic identification of backdoor data within a poisoned dataset.
We show that our approach often surpasses the performance of current baselines in identifying backdoor data points.
arXiv Detail & Related papers (2024-03-15T22:35:07Z) - Efficient Backdoor Removal Through Natural Gradient Fine-tuning [4.753323975780736]
Recent backdoor attacks suggest that an adversary can take advantage of such training details and compromise the integrity of a deep neural network (DNN)
Our studies show that a backdoor model is usually optimized to a bad local minima, i.e. sharper minima as compared to a benign model.
We propose a novel backdoor technique, Natural Gradient Fine-tuning (NGF), which focuses on removing the backdoor by fine-tuning only one layer.
arXiv Detail & Related papers (2023-06-30T07:25:38Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
backdoor attack is an emerging yet threatening training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA)
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - Alternating Objectives Generates Stronger PGD-Based Adversarial Attacks [78.2700757742992]
Projected Gradient Descent (PGD) is one of the most effective and conceptually simple algorithms to generate such adversaries.
We experimentally verify this assertion on a synthetic-data example and by evaluating our proposed method across 25 different $ell_infty$-robust models and 3 datasets.
Our strongest adversarial attack outperforms all of the white-box components of AutoAttack ensemble.
arXiv Detail & Related papers (2022-12-15T17:44:31Z) - A Large-scale Multiple-objective Method for Black-box Attack against
Object Detection [70.00150794625053]
We propose to minimize the true positive rate and maximize the false positive rate, which can encourage more false positive objects to block the generation of new true positive bounding boxes.
We extend the standard Genetic Algorithm with Random Subset selection and Divide-and-Conquer, called GARSDC, which significantly improves the efficiency.
Compared with the state-of-art attack methods, GARSDC decreases by an average 12.0 in the mAP and queries by about 1000 times in extensive experiments.
arXiv Detail & Related papers (2022-09-16T08:36:42Z) - Versatile Weight Attack via Flipping Limited Bits [68.45224286690932]
We study a novel attack paradigm, which modifies model parameters in the deployment stage.
Considering the effectiveness and stealthiness goals, we provide a general formulation to perform the bit-flip based weight attack.
We present two cases of the general formulation with different malicious purposes, i.e., single sample attack (SSA) and triggered samples attack (TSA)
arXiv Detail & Related papers (2022-07-25T03:24:58Z) - Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain [8.64369418938889]
We propose a generalized backdoor attack method based on the frequency domain.
It can implement backdoor implantation without mislabeling and accessing the training process.
We evaluate our approach in the no-label and clean-label cases on three datasets.
arXiv Detail & Related papers (2022-07-09T07:05:53Z) - Bilateral Dependency Optimization: Defending Against Model-inversion
Attacks [61.78426165008083]
We propose a bilateral dependency optimization (BiDO) strategy to defend against model-inversion attacks.
BiDO achieves the state-of-the-art defense performance for a variety of datasets, classifiers, and MI attacks.
arXiv Detail & Related papers (2022-06-11T10:07:03Z) - Targeted Attack against Deep Neural Networks via Flipping Limited Weight
Bits [55.740716446995805]
We study a novel attack paradigm, which modifies model parameters in the deployment stage for malicious purposes.
Our goal is to misclassify a specific sample into a target class without any sample modification.
By utilizing the latest technique in integer programming, we equivalently reformulate this BIP problem as a continuous optimization problem.
arXiv Detail & Related papers (2021-02-21T03:13:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.