CausalDiff: Causality-Inspired Disentanglement via Diffusion Model for Adversarial Defense
- URL: http://arxiv.org/abs/2410.23091v6
- Date: Tue, 07 Jan 2025 13:44:57 GMT
- Title: CausalDiff: Causality-Inspired Disentanglement via Diffusion Model for Adversarial Defense
- Authors: Mingkun Zhang, Keping Bi, Wei Chen, Quanrun Chen, Jiafeng Guo, Xueqi Cheng,
- Abstract summary: Humans are difficult to be cheated by subtle manipulations, since we make judgments only based on essential factors.
Inspired by this observation, we attempt to model label generation with essential label-causative factors and incorporate label-non-causative factors to assist data generation.
For an adversarial example, we aim to discriminate perturbations as non-causative factors and make predictions only based on the label-causative factors.
- Score: 61.78357530675446
- License:
- Abstract: Despite ongoing efforts to defend neural classifiers from adversarial attacks, they remain vulnerable, especially to unseen attacks. In contrast, humans are difficult to be cheated by subtle manipulations, since we make judgments only based on essential factors. Inspired by this observation, we attempt to model label generation with essential label-causative factors and incorporate label-non-causative factors to assist data generation. For an adversarial example, we aim to discriminate the perturbations as non-causative factors and make predictions only based on the label-causative factors. Concretely, we propose a casual diffusion model (CausalDiff) that adapts diffusion models for conditional data generation and disentangles the two types of casual factors by learning towards a novel casual information bottleneck objective. Empirically, CausalDiff has significantly outperformed state-of-the-art defense methods on various unseen attacks, achieving an average robustness of 86.39% (+4.01%) on CIFAR-10, 56.25% (+3.13%) on CIFAR-100, and 82.62% (+4.93%) on GTSRB (German Traffic Sign Recognition Benchmark). The code is available at https://github.com/CAS-AISafetyBasicResearchGroup/CausalDiff.
Related papers
- Indiscriminate Disruption of Conditional Inference on Multivariate Gaussians [60.22542847840578]
Despite advances in adversarial machine learning, inference for Gaussian models in the presence of an adversary is notably understudied.
We consider a self-interested attacker who wishes to disrupt a decisionmaker's conditional inference and subsequent actions by corrupting a set of evidentiary variables.
To avoid detection, the attacker also desires the attack to appear plausible wherein plausibility is determined by the density of the corrupted evidence.
arXiv Detail & Related papers (2024-11-21T17:46:55Z) - Unraveling Adversarial Examples against Speaker Identification --
Techniques for Attack Detection and Victim Model Classification [24.501269108193412]
Adversarial examples have proven to threaten speaker identification systems.
We propose a method to detect the presence of adversarial examples.
We also introduce a method for identifying the victim model on which the adversarial attack is carried out.
arXiv Detail & Related papers (2024-02-29T17:06:52Z) - FreqFed: A Frequency Analysis-Based Approach for Mitigating Poisoning
Attacks in Federated Learning [98.43475653490219]
Federated learning (FL) is susceptible to poisoning attacks.
FreqFed is a novel aggregation mechanism that transforms the model updates into the frequency domain.
We demonstrate that FreqFed can mitigate poisoning attacks effectively with a negligible impact on the utility of the aggregated model.
arXiv Detail & Related papers (2023-12-07T16:56:24Z) - Defending against the Label-flipping Attack in Federated Learning [5.769445676575767]
Federated learning (FL) provides autonomy and privacy by design to participating peers.
The label-flipping (LF) attack is a targeted poisoning attack where the attackers poison their training data by flipping the labels of some examples.
We propose a novel defense that first dynamically extracts those gradients from the peers' local updates.
arXiv Detail & Related papers (2022-07-05T12:02:54Z) - Semi-Targeted Model Poisoning Attack on Federated Learning via Backward
Error Analysis [15.172954465350667]
Model poisoning attacks on federated learning (FL) intrude in the entire system via compromising an edge model.
We propose the Attacking Distance-aware Attack (ADA) to enhance a poisoning attack by finding the optimized target class in the feature space.
ADA succeeded in increasing the attack performance by 1.8 times in the most challenging case with an attacking frequency of 0.01.
arXiv Detail & Related papers (2022-03-22T11:40:07Z) - Feature Importance-aware Transferable Adversarial Attacks [46.12026564065764]
Existing transferable attacks tend to craft adversarial examples by indiscriminately distorting features.
We argue that such brute-force degradation would introduce model-specific local optimum into adversarial examples.
By contrast, we propose the Feature Importance-aware Attack (FIA), which disrupts important object-aware features.
arXiv Detail & Related papers (2021-07-29T17:13:29Z) - Adversarial Robustness through the Lens of Causality [105.51753064807014]
adversarial vulnerability of deep neural networks has attracted significant attention in machine learning.
We propose to incorporate causality into mitigating adversarial vulnerability.
Our method can be seen as the first attempt to leverage causality for mitigating adversarial vulnerability.
arXiv Detail & Related papers (2021-06-11T06:55:02Z) - How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z) - Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial
Perturbations [65.05561023880351]
Adversarial examples are malicious inputs crafted to induce misclassification.
This paper studies a complementary failure mode, invariance-based adversarial examples.
We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks.
arXiv Detail & Related papers (2020-02-11T18:50:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.