Resilience of Bayesian Layer-Wise Explanations under Adversarial Attacks
- URL: http://arxiv.org/abs/2102.11010v1
- Date: Mon, 22 Feb 2021 14:07:24 GMT
- Title: Resilience of Bayesian Layer-Wise Explanations under Adversarial Attacks
- Authors: Ginevra Carbone, Guido Sanguinetti, Luca Bortolussi
- Abstract summary: We show that for deterministic Neural Networks, saliency interpretations are remarkably brittle even when the attacks fail.
We suggest and demonstrate empirically that saliency explanations provided by Bayesian Neural Networks are considerably more stable under adversarial perturbations.
- Score: 3.222802562733787
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of the stability of saliency-based explanations of
Neural Network predictions under adversarial attacks in a classification task.
We empirically show that, for deterministic Neural Networks, saliency
interpretations are remarkably brittle even when the attacks fail, i.e. for
attacks that do not change the classification label. By leveraging recent
results, we provide a theoretical explanation of this result in terms of the
geometry of adversarial attacks. Based on these theoretical considerations, we
suggest and demonstrate empirically that saliency explanations provided by
Bayesian Neural Networks are considerably more stable under adversarial
perturbations. Our results not only confirm that Bayesian Neural Networks are
more robust to adversarial attacks, but also demonstrate that Bayesian methods
have the potential to provide more stable and interpretable assessments of
Neural Network predictions.
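For intuition about the comparison the abstract describes, the sketch below contrasts how much an input-gradient saliency map changes under a small adversarial (FGSM) perturbation for a deterministic network versus a posterior-averaged saliency. This is a minimal illustration, not the authors' code: the untrained toy network, the FGSM attack, the input-gradient saliency, and the use of MC-dropout forward passes as a stand-in for Bayesian posterior samples are all assumptions made here for brevity, and the printed numbers only show the measurement, not the paper's findings.

```python
# Minimal sketch (not the paper's code): compare saliency stability under an
# FGSM perturbation for a deterministic network vs. a "Bayesian-style" average
# over stochastic forward passes (MC dropout as a stand-in for posterior samples).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

class SmallNet(nn.Module):
    def __init__(self, in_dim=20, hidden=64, classes=3, p_drop=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, classes),
        )

    def forward(self, x):
        return self.net(x)

def saliency(model, x, label):
    """Input-gradient saliency of the chosen class logit w.r.t. the input."""
    x = x.clone().detach().requires_grad_(True)
    model(x)[0, label].backward()
    return x.grad.detach().squeeze(0)

def fgsm(model, x, label, eps=0.05):
    """Fast Gradient Sign Method perturbation of the input."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), torch.tensor([label])).backward()
    return (x + eps * x.grad.sign()).detach()

def mc_saliency(model, x, label, n_samples=30):
    """Saliency averaged over stochastic forward passes (dropout kept active,
    as an assumed proxy for averaging over Bayesian weight samples)."""
    model.train()  # keep dropout active during the passes
    grads = torch.stack([saliency(model, x, label) for _ in range(n_samples)])
    model.eval()
    return grads.mean(dim=0)

model = SmallNet().eval()
x = torch.randn(1, 20)
label = model(x).argmax(dim=1).item()
x_adv = fgsm(model, x, label)

cos = nn.CosineSimilarity(dim=0)
det_stability = cos(saliency(model, x, label), saliency(model, x_adv, label)).item()
mc_stability = cos(mc_saliency(model, x, label), mc_saliency(model, x_adv, label)).item()

print(f"deterministic saliency cosine similarity: {det_stability:.3f}")
print(f"posterior-averaged saliency cosine similarity: {mc_stability:.3f}")
```

In the Bayesian case the saliency is an expectation over weight samples rather than a single gradient, which is the mechanism the abstract credits for the increased stability of the explanations.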
Related papers
- Over-parameterization and Adversarial Robustness in Neural Networks: An Overview and Empirical Analysis [25.993502776271022]
Having a large parameter space is considered one of the main suspects behind neural networks' vulnerability to adversarial examples.
Previous research has demonstrated that, depending on the model considered, the algorithm employed to generate adversarial examples may not function properly.
arXiv Detail & Related papers (2024-06-14T14:47:06Z)
- Attacking Bayes: On the Adversarial Robustness of Bayesian Neural Networks [10.317475068017961]
We investigate whether it is possible to successfully break state-of-the-art BNN inference methods and prediction pipelines.
We find that BNNs trained with state-of-the-art approximate inference methods, and even BNNs trained with Hamiltonian Monte Carlo, are highly susceptible to adversarial attacks.
arXiv Detail & Related papers (2024-04-27T01:34:46Z)
- Quantum-Inspired Analysis of Neural Network Vulnerabilities: The Role of Conjugate Variables in System Attacks [54.565579874913816]
Neural networks demonstrate inherent vulnerability to small, non-random perturbations, emerging as adversarial attacks.
A mathematical congruence emerges between this mechanism and the uncertainty principle of quantum physics, casting light on a hitherto unanticipated interdisciplinary connection.
arXiv Detail & Related papers (2024-02-16T02:11:27Z)
- Investigating Human-Identifiable Features Hidden in Adversarial Perturbations [54.39726653562144]
Our study explores up to five attack algorithms across three datasets.
We identify human-identifiable features in adversarial perturbations.
Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models.
arXiv Detail & Related papers (2023-09-28T22:31:29Z)
- A Systematic Evaluation of Node Embedding Robustness [77.29026280120277]
We assess the empirical robustness of node embedding models to random and adversarial poisoning attacks.
We compare edge addition, deletion and rewiring strategies computed using network properties as well as node labels.
We find that node classification suffers higher performance degradation than network reconstruction.
arXiv Detail & Related papers (2022-09-16T17:20:23Z)
- Robustness against Adversarial Attacks in Neural Networks using Incremental Dissipativity [3.8673567847548114]
Adversarial examples can easily degrade the classification performance in neural networks.
This work proposes an incremental dissipativity-based robustness certificate for neural networks.
arXiv Detail & Related papers (2021-11-25T04:42:57Z)
- Residual Error: a New Performance Measure for Adversarial Robustness [85.0371352689919]
A major challenge that limits the widespread adoption of deep neural networks has been their fragility to adversarial attacks.
This study presents the concept of residual error, a new performance measure for assessing the adversarial robustness of a deep neural network.
Experimental results using the case of image classification demonstrate the effectiveness and efficacy of the proposed residual error metric.
arXiv Detail & Related papers (2021-06-18T16:34:23Z)
- Evaluating the Robustness of Bayesian Neural Networks Against Different Types of Attacks [2.599882743586164]
We show that a Bayesian neural network achieves significantly higher robustness against adversarial attacks generated against a deterministic neural network model.
The posterior can act as a safety precursor to ongoing malicious activities.
This suggests utilizing such layers when building decision-making pipelines within safety-critical domains.
arXiv Detail & Related papers (2021-06-17T03:18:59Z)
- Neural Networks with Recurrent Generative Feedback [61.90658210112138]
We instantiate this design on convolutional neural networks (CNNs).
In the experiments, CNN-F shows considerably improved adversarial robustness over conventional feedforward CNNs on standard benchmarks.
arXiv Detail & Related papers (2020-07-17T19:32:48Z)
- Proper Network Interpretability Helps Adversarial Robustness in Classification [91.39031895064223]
We show that with a proper measurement of interpretation, it is difficult to prevent prediction-evasion adversarial attacks from causing interpretation discrepancy.
We develop an interpretability-aware defensive scheme built only on promoting robust interpretation.
We show that our defense achieves both robust classification and robust interpretation, outperforming state-of-the-art adversarial training methods against attacks of large perturbation.
arXiv Detail & Related papers (2020-06-26T01:31:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.