Certified Causal Defense with Generalizable Robustness
- URL: http://arxiv.org/abs/2408.15451v1
- Date: Wed, 28 Aug 2024 00:14:09 GMT
- Title: Certified Causal Defense with Generalizable Robustness
- Authors: Yiran Qiao, Yu Yin, Chen Chen, Jing Ma
- Abstract summary: We propose a novel certified defense framework GLEAN, which incorporates a causal perspective into the generalization problem in certified defense.
Our framework integrates a certifiable causal factor learning component to disentangle the causal relations and spurious correlations between input and label.
On top of that, we design a causally certified defense strategy to handle adversarial attacks on latent causal factors.
- Score: 14.238441767523602
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While machine learning models have proven effective across various scenarios, it is widely acknowledged that many models are vulnerable to adversarial attacks. Recently, there have emerged numerous efforts in adversarial defense. Among them, certified defense is well known for its theoretical guarantees against arbitrary adversarial perturbations on input within a certain range (e.g., $l_2$ ball). However, most existing works in this line struggle to generalize their certified robustness in other data domains with distribution shifts. This issue is rooted in the difficulty of eliminating the negative impact of spurious correlations on robustness in different domains. To address this problem, in this work, we propose a novel certified defense framework GLEAN, which incorporates a causal perspective into the generalization problem in certified defense. More specifically, our framework integrates a certifiable causal factor learning component to disentangle the causal relations and spurious correlations between input and label, and thereby exclude the negative effect of spurious correlations on defense. On top of that, we design a causally certified defense strategy to handle adversarial attacks on latent causal factors. In this way, our framework is not only robust against malicious noises on data in the training distribution but also can generalize its robustness across domains with distribution shifts. Extensive experiments on benchmark datasets validate the superiority of our framework in certified robustness generalization in different data domains. Code is available in the supplementary materials.
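The abstract stays at a high level, so the following Python sketch is only meant to show the general shape of the pipeline it describes: an encoder that separates causal from spurious latent factors, and a smoothing-style $l_2$ certificate computed over the causal factors. The class names, dimensions, noise level, and the Cohen-style radius formula are assumptions made for illustration; they are not GLEAN's released implementation.

```python
# Hedged sketch only: GLEAN's code is not part of this listing, so everything
# below (module names, dimensions, Cohen-style radius) is an illustrative
# assumption, not the authors' implementation.
import torch
import torch.nn as nn
from scipy.stats import norm

class CausalEncoder(nn.Module):
    """Hypothetical encoder that splits the representation into causal factors
    z_c (used for prediction) and spurious factors z_s (ignored at test time)."""
    def __init__(self, in_dim, causal_dim, spurious_dim):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.to_causal = nn.Linear(256, causal_dim)
        self.to_spurious = nn.Linear(256, spurious_dim)

    def forward(self, x):
        h = self.backbone(x)
        return self.to_causal(h), self.to_spurious(h)

@torch.no_grad()
def certify_l2_radius(encoder, classifier, x, num_classes, sigma=0.25, n=1000):
    """Monte-Carlo, randomized-smoothing-style certificate computed on the
    causal latent factors. Simplified: it omits the confidence-interval
    correction that a sound certificate would require."""
    z_c, _ = encoder(x)                               # keep only causal factors
    noisy = z_c + sigma * torch.randn(n, z_c.shape[-1])
    votes = classifier(noisy).argmax(dim=1)
    counts = torch.bincount(votes, minlength=num_classes)
    top_class = int(counts.argmax())
    p_top = counts[top_class].item() / n              # empirical top-class mass
    if p_top <= 0.5:
        return top_class, 0.0                         # abstain: nothing certified
    return top_class, sigma * norm.ppf(p_top)         # Cohen-style l2 radius
```

Here `classifier` stands for any prediction head over the causal factors, e.g. `nn.Sequential(nn.Linear(causal_dim, num_classes))`; a sound certificate would also replace the empirical `p_top` with a lower confidence bound before computing the radius.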
Related papers
- FullCert: Deterministic End-to-End Certification for Training and Inference of Neural Networks [62.897993591443594]
FullCert is the first end-to-end certifier with sound, deterministic bounds.
We experimentally demonstrate FullCert's feasibility on two datasets.
arXiv Detail & Related papers (2024-06-17T13:23:52Z) - Theoretically Principled Trade-off for Stateful Defenses against Query-Based Black-Box Attacks [26.905553663353825]
We offer a theoretical characterization of the trade-off between detection and false positive rates for stateful defenses.
We analyze the impact of this trade-off on the convergence of black-box attacks.
arXiv Detail & Related papers (2023-07-30T22:31:01Z) - How robust accuracy suffers from certified training with convex relaxations [12.292092677396347]
Adversarial attacks pose significant threats to deploying state-of-the-art classifiers in safety-critical applications.
Two classes of methods have emerged to address this issue: empirical defences and certified defences.
We systematically compare the standard and robust error of these two robust training paradigms across multiple computer vision tasks.
arXiv Detail & Related papers (2023-06-12T09:45:21Z) - Measuring Equality in Machine Learning Security Defenses: A Case Study in Speech Recognition [56.69875958980474]
This work considers approaches to defending learned systems and how security defenses result in performance inequities across different sub-populations.
We find that many proposed methods can cause direct harm, such as false rejection and unequal benefits from robustness training.
We present a comparison of equality between two rejection-based defenses: randomized smoothing and neural rejection, finding randomized smoothing more equitable due to the sampling mechanism for minority groups.
arXiv Detail & Related papers (2023-02-17T16:19:26Z) - FLIP: A Provable Defense Framework for Backdoor Mitigation in Federated Learning [66.56240101249803]
We study how hardening benign clients can affect the global model (and the malicious clients).
We propose a trigger reverse engineering based defense and show that our method achieves improved robustness with guarantees.
Our results on eight competing SOTA defense methods show the empirical superiority of our method on both single-shot and continuous FL backdoor attacks.
arXiv Detail & Related papers (2022-10-23T22:24:03Z) - ADC: Adversarial attacks against object Detection that evade Context consistency checks [55.8459119462263]
We show that even context consistency checks can be brittle to properly crafted adversarial examples.
We propose an adaptive framework to generate examples that subvert such defenses.
Our results suggest that how to robustly model context and check its consistency is still an open problem.
arXiv Detail & Related papers (2021-10-24T00:25:09Z) - Adversarial Robustness under Long-Tailed Distribution [93.50792075460336]
Adversarial robustness has attracted extensive studies recently by revealing the vulnerability and intrinsic characteristics of deep networks.
In this work we investigate the adversarial vulnerability as well as defense under long-tailed distributions.
We propose a clean yet effective framework, RoBal, which consists of two dedicated modules: a scale-invariant classifier and data re-balancing.
arXiv Detail & Related papers (2021-04-06T17:53:08Z) - Improving the Certified Robustness of Neural Networks via Consistency Regularization [25.42238710803711]
A range of defense methods have been proposed to improve the robustness of neural networks on adversarial examples.
Most of these provable defense methods treat all examples equally during the training process.
In this paper, we explore the inconsistency caused by misclassified examples and add a novel consistency regularization term to make better use of them.
arXiv Detail & Related papers (2020-12-24T05:00:50Z) - A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN) based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks (a minimal, hypothetical sketch of this idea follows this entry).
arXiv Detail & Related papers (2020-06-08T20:42:39Z)
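The entry above mentions a self-supervised adversarial training mechanism in the input space. As a rough, hypothetical illustration of that idea (not the authors' released code), the sketch below crafts label-free perturbations by maximizing feature distortion of an assumed pretrained `feature_extractor` with a PGD-style loop; the budget `eps`, step size, and MSE distortion loss are placeholder choices.

```python
# Hedged sketch, not the paper's implementation: a label-free, PGD-style
# perturbation that maximizes feature distortion of a pretrained extractor.
import torch
import torch.nn.functional as F

def self_supervised_attack(feature_extractor, x, eps=8/255, step=2/255, iters=10):
    """Pushes the features of x_adv away from the clean features within an
    l_inf ball of radius eps; no labels are used, hence 'self-supervised'."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    with torch.no_grad():
        clean_feat = feature_extractor(x)             # reference features
    for _ in range(iters):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.mse_loss(feature_extractor(x_adv), clean_feat)  # feature distortion
        grad, = torch.autograd.grad(loss, x_adv)
        # ascend on the distortion loss, then project back into the budget
        x_adv = (x_adv + step * grad.sign()).clamp(x - eps, x + eps).clamp(0, 1)
    return x_adv.detach()
```

Such perturbed inputs could then feed an adversarial-training loop or, as in purification-style defenses, serve as training data for a denoiser; either use is an assumption here rather than a description of the paper's pipeline.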
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.