DP-InstaHide: Provably Defusing Poisoning and Backdoor Attacks with
Differentially Private Data Augmentations
- URL: http://arxiv.org/abs/2103.02079v1
- Date: Tue, 2 Mar 2021 23:07:31 GMT
- Title: DP-InstaHide: Provably Defusing Poisoning and Backdoor Attacks with
Differentially Private Data Augmentations
- Authors: Eitan Borgnia, Jonas Geiping, Valeriia Cherepanova, Liam Fowl, Arjun
Gupta, Amin Ghiasi, Furong Huang, Micah Goldblum, Tom Goldstein
- Abstract summary: We show that strong data augmentations, such as mixup and random additive noise, nullify poison attacks while enduring only a small accuracy trade-off.
A rigorous analysis of DP-InstaHide shows that mixup does indeed have privacy advantages, and that training with k-way mixup provably yields at least k times stronger DP guarantees than a naive DP mechanism.
- Score: 54.960853673256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data poisoning and backdoor attacks manipulate training data to induce
security breaches in a victim model. These attacks can be provably deflected
using differentially private (DP) training methods, although this comes with a
sharp decrease in model performance. The InstaHide method has recently been
proposed as an alternative to DP training that leverages supposed privacy
properties of the mixup augmentation, although without rigorous guarantees. In
this work, we show that strong data augmentations, such as mixup and random
additive noise, nullify poison attacks while enduring only a small accuracy
trade-off. To explain these findings, we propose a training method,
DP-InstaHide, which combines the mixup regularizer with additive noise. A
rigorous analysis of DP-InstaHide shows that mixup does indeed have privacy
advantages, and that training with k-way mixup provably yields at least k times
stronger DP guarantees than a naive DP mechanism. Because mixup (as opposed to
noise) is beneficial to model performance, DP-InstaHide provides a mechanism
for achieving stronger empirical performance against poisoning attacks than
other known DP methods.
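A minimal sketch of the kind of k-way mixup plus additive-noise augmentation described above, assuming NumPy arrays and one-hot labels; the Dirichlet mixing weights, Laplacian noise, and hyperparameters are illustrative assumptions rather than the paper's exact construction:

```python
import numpy as np

def kway_mixup_with_noise(images, labels, k=4, noise_scale=0.1, rng=None):
    """Sketch of k-way mixup followed by additive noise, in the spirit of
    DP-InstaHide. images: (N, H, W, C) floats in [0, 1]; labels:
    (N, num_classes) one-hot floats. k, noise_scale, and the noise
    distribution are assumptions, not the paper's exact settings."""
    rng = rng or np.random.default_rng()
    n = len(images)

    # Each output example mixes itself with k-1 randomly chosen partners,
    # using random weights on the simplex (Dirichlet(1, ..., 1)).
    idx = rng.integers(0, n, size=(n, k))
    idx[:, 0] = np.arange(n)
    weights = rng.dirichlet(np.ones(k), size=n)            # shape (n, k)

    # k-way mixup: the same convex combination is applied to images and labels.
    mixed_x = np.einsum('nk,nkhwc->nhwc', weights, images[idx])
    mixed_y = np.einsum('nk,nkc->nc', weights, labels[idx])

    # Additive noise on the mixed images. Intuition behind the paper's claim:
    # mixing k examples shrinks each individual example's contribution to the
    # augmented image, so the same noise hides any single example at least
    # k times better than adding it to a raw, unmixed example.
    mixed_x = mixed_x + rng.laplace(scale=noise_scale, size=mixed_x.shape)
    return np.clip(mixed_x, 0.0, 1.0), mixed_y
```

In use, such an augmentation would be applied to each minibatch before the forward pass, with the model trained on the mixed soft labels via a standard cross-entropy loss.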
Related papers
- To Shuffle or not to Shuffle: Auditing DP-SGD with Shuffling [25.669347036509134]
We analyze Differentially Private Gradient Descent (DP-SGD) with shuffling.
We show that state-of-the-art DP models trained with shuffling report privacy guarantees that are appreciably overestimated (by up to 4x).
Our work empirically attests to the risk of using shuffling instead of Poisson sub-sampling vis-a-vis the actual privacy leakage of DP-SGD.
arXiv Detail & Related papers (2024-11-15T22:34:28Z) - Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z) - Rethinking Improved Privacy-Utility Trade-off with Pre-existing Knowledge for DP Training [31.559864332056648]
We propose a generic differential privacy framework with heterogeneous noise (DP-Hero)
Atop DP-Hero, we instantiate a heterogeneous version of DP-SGD, where the noise injected into gradient updates is heterogeneous and guided by prior-established model parameters.
We conduct comprehensive experiments to verify and explain the effectiveness of the proposed DP-Hero, showing improved training accuracy compared with state-of-the-art works.
arXiv Detail & Related papers (2024-09-05T08:40:54Z) - Pre-training Differentially Private Models with Limited Public Data [54.943023722114134]
Differential privacy (DP) is a prominent method for gauging the degree of security provided to models.
However, DP is not yet capable of protecting a substantial portion of the data used during the initial pre-training stage.
We develop a novel DP continual pre-training strategy using only 10% of public data.
Our strategy can achieve DP accuracy of 41.5% on ImageNet-21k, as well as non-DP accuracy of 55.7% and 60.0% on the downstream tasks Places365 and iNaturalist-2021.
arXiv Detail & Related papers (2024-02-28T23:26:27Z) - Closed-Form Bounds for DP-SGD against Record-level Inference [18.85865832127335]
We focus on the popular DP-SGD algorithm, and derive simple closed-form bounds.
We obtain bounds for membership inference that match state-of-the-art techniques.
We present a novel data-dependent bound against attribute inference.
arXiv Detail & Related papers (2024-02-22T09:26:16Z) - Can We Trust the Unlabeled Target Data? Towards Backdoor Attack and Defense on Model Adaptation [120.42853706967188]
We explore potential backdoor attacks on model adaptation launched by well-designed poisoned target data.
We propose a plug-and-play method named MixAdapt that can be combined with existing adaptation algorithms.
arXiv Detail & Related papers (2024-01-11T16:42:10Z) - Does Differential Privacy Prevent Backdoor Attacks in Practice? [8.951356689083166]
We investigate the effectiveness of Differential Privacy techniques in preventing backdoor attacks in machine learning models.
We propose Label-DP as a faster and more accurate alternative to DP-SGD and PATE.
arXiv Detail & Related papers (2023-11-10T18:32:08Z) - Bounding Training Data Reconstruction in DP-SGD [42.36933026300976]
Differentially private training offers a protection which is usually interpreted as a guarantee against membership inference attacks.
By proxy, this guarantee extends to other threats like reconstruction attacks attempting to extract complete training examples.
Recent works provide evidence that if one does not need to protect against membership attacks but instead only wants to protect against training data reconstruction, then the utility of private models can be improved.
arXiv Detail & Related papers (2023-02-14T18:02:34Z) - FLIP: A Provable Defense Framework for Backdoor Mitigation in Federated
Learning [66.56240101249803]
We study how hardening benign clients can affect the global model (and the malicious clients).
We propose a trigger reverse-engineering based defense and show that our method achieves provable robustness improvements.
Our results on eight competing SOTA defense methods show the empirical superiority of our method on both single-shot and continuous FL backdoor attacks.
arXiv Detail & Related papers (2022-10-23T22:24:03Z) - Strong Data Augmentation Sanitizes Poisoning and Backdoor Attacks
Without an Accuracy Tradeoff [57.35978884015093]
We show that strong data augmentations, such as CutMix, can significantly diminish the threat of poisoning and backdoor attacks without trading off performance.
In the context of backdoors, CutMix greatly mitigates the attack while simultaneously increasing validation accuracy by 9%.
arXiv Detail & Related papers (2020-11-18T20:18:50Z)
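The last entry above, a companion result by the same group, reports that CutMix alone substantially weakens poisoning and backdoor attacks. A minimal NumPy sketch of CutMix-style batch augmentation follows; the Beta(alpha, alpha) area sampling and uniform box placement are the standard CutMix recipe, shown here as illustrative assumptions rather than the cited paper's exact training settings:

```python
import numpy as np

def cutmix_batch(images, labels, alpha=1.0, rng=None):
    """CutMix sketch: paste a random rectangular patch from a shuffled partner
    image and mix one-hot labels by the patch-area ratio. images: (N, H, W, C)
    floats; labels: (N, num_classes). alpha and the box sampling are assumptions."""
    rng = rng or np.random.default_rng()
    n, h, w, _ = images.shape
    perm = rng.permutation(n)        # partner example for each image
    lam = rng.beta(alpha, alpha)     # target fraction of area kept from the original

    # Sample a box covering roughly (1 - lam) of the image area, centered uniformly.
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = rng.integers(0, h), rng.integers(0, w)
    y1, y2 = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
    x1, x2 = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)

    mixed_x = images.copy()
    mixed_x[:, y1:y2, x1:x2, :] = images[perm, y1:y2, x1:x2, :]

    # Recompute lam from the actual (clipped) box and mix labels accordingly.
    lam = 1.0 - ((y2 - y1) * (x2 - x1)) / (h * w)
    mixed_y = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_x, mixed_y
```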
This list is automatically generated from the titles and abstracts of the papers on this site.