On the Effectiveness of Mitigating Data Poisoning Attacks with Gradient
Shaping
- URL: http://arxiv.org/abs/2002.11497v2
- Date: Thu, 27 Feb 2020 19:00:01 GMT
- Title: On the Effectiveness of Mitigating Data Poisoning Attacks with Gradient
Shaping
- Authors: Sanghyun Hong, Varun Chandrasekaran, Yiğitcan Kaya, Tudor Dumitraş, Nicolas Papernot
- Abstract summary: Machine learning algorithms are vulnerable to data poisoning attacks.
We study the feasibility of an attack-agnostic defense relying on artifacts common to all poisoning attacks.
- Score: 36.41173109033075
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning algorithms are vulnerable to data poisoning attacks. Prior
taxonomies that focus on specific scenarios, e.g., indiscriminate or targeted,
have enabled defenses for the corresponding subset of known attacks. Yet, this
introduces an inevitable arms race between adversaries and defenders. In this
work, we study the feasibility of an attack-agnostic defense relying on
artifacts that are common to all poisoning attacks. Specifically, we focus on a
common element between all attacks: they modify gradients computed to train the
model. We identify two main artifacts of gradients computed in the presence of
poison: (1) their $\ell_2$ norms have significantly higher magnitudes than
those of clean gradients, and (2) their orientation differs from clean
gradients. Based on these observations, we propose the prerequisite for a
generic poisoning defense: it must bound gradient magnitudes and minimize
differences in orientation. We call this gradient shaping. As an exemplar tool
to evaluate the feasibility of gradient shaping, we use differentially private
stochastic gradient descent (DP-SGD), which clips and perturbs individual
gradients during training to obtain privacy guarantees. We find that DP-SGD,
even in configurations that do not result in meaningful privacy guarantees,
increases the model's robustness to indiscriminate attacks. It also mitigates
worst-case targeted attacks and increases the adversary's cost in multi-poison
scenarios. The only attack we find DP-SGD to be ineffective against is a
strong, yet unrealistic, indiscriminate attack. Our results suggest that, while
we currently lack a generic poisoning defense, gradient shaping is a promising
direction for future research.
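The gradient-shaping recipe described in the abstract can be illustrated concretely: each example's gradient is clipped to bound its $\ell_2$ norm and the averaged update is perturbed with Gaussian noise, as DP-SGD does. The sketch below is a minimal PyTorch-style illustration under those assumptions, not the paper's implementation; the `shaped_sgd_step` helper and the `clip_norm` and `noise_multiplier` values are placeholders introduced here for illustration.

```python
# Minimal sketch of gradient shaping via DP-SGD-style per-example clipping
# and noising. Illustrative only: clip_norm and noise_multiplier are
# placeholder hyperparameters, not values from the paper.
import torch


def shaped_sgd_step(model, loss_fn, inputs, labels,
                    lr=0.1, clip_norm=1.0, noise_multiplier=0.5):
    """One SGD step with per-example gradient clipping and Gaussian noise.

    Clipping bounds each example's gradient l2 norm (artifact 1 in the
    abstract); averaging the clipped gradients and adding noise limits how
    far a poison whose orientation differs from clean gradients can steer
    the update (artifact 2).
    """
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    batch_size = inputs.shape[0]

    for i in range(batch_size):
        # Per-example loss and gradient.
        loss = loss_fn(model(inputs[i:i + 1]), labels[i:i + 1])
        grads = torch.autograd.grad(loss, params)

        # Rescale so this example's total gradient norm is at most clip_norm.
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = min(1.0, clip_norm / (total_norm + 1e-6))
        for acc, g in zip(summed, grads):
            acc.add_(g * scale)

    # Add Gaussian noise calibrated to the clip norm, average, and update.
    with torch.no_grad():
        for p, acc in zip(params, summed):
            noise = torch.randn_like(acc) * noise_multiplier * clip_norm
            p.add_(-(lr / batch_size) * (acc + noise))


# Example usage (hypothetical model and data):
#   model = torch.nn.Linear(20, 2)
#   xb, yb = torch.randn(32, 20), torch.randint(0, 2, (32,))
#   shaped_sgd_step(model, torch.nn.functional.cross_entropy, xb, yb)
```

Per the abstract, even clipping and noise settings too loose to yield meaningful privacy guarantees already increase robustness to indiscriminate attacks, which is what makes DP-SGD a convenient exemplar of gradient shaping.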
Related papers
- Inverting Gradient Attacks Naturally Makes Data Poisons: An Availability Attack on Neural Networks [12.80649024603656]
Gradient attacks and data poisoning, which tamper with the training of machine learning algorithms to alter them, have been proven to be equivalent in some settings.
We show how data poisoning can mimic a gradient attack to perform an attack on neural networks.
arXiv Detail & Related papers (2024-10-28T18:57:15Z)
- DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification [63.65630243675792]
Diffusion-based purification defenses leverage diffusion models to remove crafted perturbations of adversarial examples.
Recent studies show that even advanced attacks cannot break such defenses effectively.
We propose a unified framework DiffAttack to perform effective and efficient attacks against diffusion-based purification defenses.
arXiv Detail & Related papers (2023-10-27T15:17:50Z)
- IDEA: Invariant Defense for Graph Adversarial Robustness [60.0126873387533]
We propose an Invariant causal DEfense method against adversarial Attacks (IDEA)
We derive node-based and structure-based invariance objectives from an information-theoretic perspective.
Experiments demonstrate that IDEA attains state-of-the-art defense performance under all five attacks on all five datasets.
arXiv Detail & Related papers (2023-05-25T07:16:00Z)
- Towards Reasonable Budget Allocation in Untargeted Graph Structure Attacks via Gradient Debias [50.628150015907565]
The cross-entropy loss function is used to evaluate perturbation schemes in classification tasks.
Previous methods use negative cross-entropy loss as the attack objective in attacking node-level classification models.
This paper argues that the previous attack objective is unreasonable from the perspective of budget allocation.
arXiv Detail & Related papers (2023-03-29T13:02:02Z)
- Not All Poisons are Created Equal: Robust Training against Data Poisoning [15.761683760167777]
Data poisoning causes misclassification of test-time target examples by injecting maliciously crafted samples into the training data.
We propose an efficient defense mechanism that significantly reduces the success rate of various data poisoning attacks.
arXiv Detail & Related papers (2022-10-18T08:19:41Z)
- Defending against the Label-flipping Attack in Federated Learning [5.769445676575767]
Federated learning (FL) provides autonomy and privacy by design to participating peers.
The label-flipping (LF) attack is a targeted poisoning attack where the attackers poison their training data by flipping the labels of some examples.
We propose a novel defense that first dynamically extracts those gradients from the peers' local updates.
arXiv Detail & Related papers (2022-07-05T12:02:54Z)
- Adversarially Robust Classification by Conditional Generative Model Inversion [4.913248451323163]
We propose a classification model that does not obfuscate gradients and is robust by construction without assuming prior knowledge about the attack.
Our method casts classification as an optimization problem where we "invert" a conditional generator trained on unperturbed, natural images.
We demonstrate that our model is extremely robust against black-box attacks and has improved robustness against white-box attacks.
arXiv Detail & Related papers (2022-01-12T23:11:16Z)
- Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
- Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching [56.280018325419896]
Data Poisoning attacks modify training data to maliciously control a model trained on such data.
We analyze a particularly malicious poisoning attack that is both "from scratch" and "clean label".
We show that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset.
arXiv Detail & Related papers (2020-09-04T16:17:54Z)