Verifying the Causes of Adversarial Examples
- URL: http://arxiv.org/abs/2010.09633v1
- Date: Mon, 19 Oct 2020 16:17:20 GMT
- Title: Verifying the Causes of Adversarial Examples
- Authors: Honglin Li, Yifei Fan, Frieder Ganz, Anthony Yezzi, Payam Barnaghi
- Abstract summary: The robustness of neural networks is challenged by adversarial examples that contain almost imperceptible perturbations to inputs.
We present a collection of potential causes of adversarial examples and verify (or partially verify) them through carefully-designed controlled experiments.
Our experimental results show that geometric factors tend to be the more direct causes, while statistical factors magnify the phenomenon.
- Score: 5.381050729919025
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The robustness of neural networks is challenged by adversarial examples that
contain almost imperceptible perturbations to inputs, which mislead a
classifier into producing incorrect outputs with high confidence. Limited by the extreme
difficulty in examining a high-dimensional image space thoroughly, research on
explaining and justifying the causes of adversarial examples falls behind
studies on attacks and defenses. In this paper, we present a collection of
potential causes of adversarial examples and verify (or partially verify) them
through carefully-designed controlled experiments. The major causes of
adversarial examples include model linearity, one-sum constraint, and geometry
of the categories. To control the effect of those causes, multiple techniques
are applied such as $L_2$ normalization, replacement of loss functions,
construction of reference datasets, and novel models using multi-layer
perceptron probabilistic neural networks (MLP-PNN) and density estimation (DE).
Our experimental results show that geometric factors tend to be the more direct
causes, while statistical factors magnify the phenomenon, especially in assigning
high prediction confidence. We believe this paper will inspire more studies to
rigorously investigate the root causes of adversarial examples, which in turn
provide useful guidance on designing more robust models.
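To make one of the listed techniques concrete, here is a minimal sketch of $L_2$ normalization applied to a classifier's penultimate features before the final linear layer, which bounds the logit magnitudes and hence the confidence the softmax can assign. The layer sizes, the fixed scale factor, and the choice to normalize features rather than logits are illustrative assumptions, not the authors' exact setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class L2NormClassifier(nn.Module):
    """MLP whose penultimate features are L2-normalized before the
    final linear layer, so logit magnitude cannot grow without bound."""

    def __init__(self, in_dim=784, hidden=256, num_classes=10, scale=10.0):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.head = nn.Linear(hidden, num_classes, bias=False)
        self.scale = scale  # fixed temperature; a hyperparameter, not from the paper

    def forward(self, x):
        feats = self.backbone(x.flatten(1))
        feats = F.normalize(feats, p=2, dim=1)  # project features onto the unit sphere
        return self.scale * self.head(feats)    # bounded logits -> bounded confidence

# Usage: logits = L2NormClassifier()(torch.randn(8, 1, 28, 28))
```

The point of the sketch is only that bounded features give bounded logits, so predicted confidence cannot grow without limit as an input moves far along a linear direction.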
Related papers
- Missed Causes and Ambiguous Effects: Counterfactuals Pose Challenges for Interpreting Neural Networks [14.407025310553225]
Interpretability research takes counterfactual theories of causality for granted.
Counterfactual theories have problems that bias our findings in specific and predictable ways.
We discuss the implications of these challenges for interpretability researchers.
arXiv Detail & Related papers (2024-07-05T17:53:03Z)
- A Survey on Transferability of Adversarial Examples across Deep Neural Networks [53.04734042366312]
Adversarial examples can manipulate machine learning models into making erroneous predictions.
The transferability of adversarial examples enables black-box attacks which circumvent the need for detailed knowledge of the target model.
This survey explores the landscape of the transferability of adversarial examples across deep neural networks.
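As a concrete (and heavily simplified) illustration of the transfer setting surveyed here, the sketch below crafts a single-step FGSM perturbation on a surrogate model and checks whether it also fools a separate target model; the surrogate/target models, the $8/255$ budget, and the assumption of inputs in $[0, 1]$ are placeholders, not anything prescribed by the survey.

```python
import torch
import torch.nn.functional as F

def fgsm_transfer(surrogate, target, x, y, epsilon=8 / 255):
    """Craft an FGSM example on `surrogate` and test whether it also fools `target`."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(surrogate(x_adv), y)
    loss.backward()
    # Single-step L-infinity perturbation computed purely from the surrogate's gradients.
    x_adv = (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()
    transfers = target(x_adv).argmax(dim=1) != y  # True where the attack transfers
    return x_adv, transfers
```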
arXiv Detail & Related papers (2023-10-26T17:45:26Z)
- Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression [32.727673706238086]
We propose a way of delving into the unexpected vulnerability in adversarially trained networks from a causal perspective.
By deploying it, we estimate the causal relation of adversarial prediction under an unbiased environment.
We demonstrate that the estimated causal features are highly related to the correct prediction for adversarial robustness.
arXiv Detail & Related papers (2023-03-02T08:18:22Z)
- Causal Triplet: An Open Challenge for Intervention-centric Causal Representation Learning [98.78136504619539]
Causal Triplet is a causal representation learning benchmark featuring visually more complex scenes.
We show that models built with the knowledge of disentangled or object-centric representations significantly outperform their distributed counterparts.
arXiv Detail & Related papers (2023-01-12T17:43:38Z)
- Quantify the Causes of Causal Emergence: Critical Conditions of Uncertainty and Asymmetry in Causal Structure [0.5372002358734439]
Investigation of causal relationships based on statistical and informational theories has posed an interesting and valuable challenge to large-scale models.
This paper introduces a framework for assessing numerical conditions of Causal Emergence as theoretical constraints of its occurrence.
arXiv Detail & Related papers (2022-12-03T06:35:54Z)
- Generalizable Information Theoretic Causal Representation [37.54158138447033]
We propose to learn causal representation from observational data by regularizing the learning procedure with mutual information measures according to our hypothetical causal graph.
The optimization involves a counterfactual loss, based on which we deduce a theoretical guarantee that the causality-inspired learning achieves reduced sample complexity and better generalization ability.
arXiv Detail & Related papers (2022-02-17T00:38:35Z)
- A Frequency Perspective of Adversarial Robustness [72.48178241090149]
We present a frequency-based understanding of adversarial examples, supported by theoretical and empirical findings.
Our analysis shows that adversarial examples are confined to neither high-frequency nor low-frequency components, but are simply dataset dependent.
We propose a frequency-based explanation for the commonly observed accuracy vs. robustness trade-off.
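One simple way to probe such frequency claims is to split a perturbation into low- and high-frequency components with a 2-D FFT and compare the energy in each band; the circular mask radius below is an arbitrary assumption rather than a value taken from the paper.

```python
import numpy as np

def frequency_split(perturbation, radius=8):
    """Split a (H, W) perturbation into low- and high-frequency components
    using a circular mask of `radius` around the DC term in the 2-D FFT."""
    h, w = perturbation.shape
    spectrum = np.fft.fftshift(np.fft.fft2(perturbation))
    yy, xx = np.ogrid[:h, :w]
    low_mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    low = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)))
    high = perturbation - low
    return low, high

# Example: compare the energy of each band for a random perturbation.
delta = np.random.randn(32, 32) * 0.01
low, high = frequency_split(delta)
print(np.linalg.norm(low), np.linalg.norm(high))
```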
arXiv Detail & Related papers (2021-10-26T19:12:34Z)
- Pruning in the Face of Adversaries [0.0]
We evaluate the impact of neural network pruning on the adversarial robustness against $L_0$, $L_2$ and $L_\infty$ attacks.
Our results confirm that neural network pruning and adversarial robustness are not mutually exclusive.
We extend our analysis to situations that incorporate additional assumptions on the adversarial scenario and show that depending on the situation, different strategies are optimal.
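For reference, the kind of global magnitude pruning typically evaluated in such studies can be expressed with PyTorch's built-in pruning utilities, as in the sketch below; the 50% sparsity level and the restriction to Linear and Conv2d layers are assumptions for illustration, and the robustness evaluation under the $L_0$, $L_2$ and $L_\infty$ attacks would be run separately on the pruned model.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def magnitude_prune(model, amount=0.5):
    """Globally prune the smallest-magnitude weights across all
    Linear and Conv2d layers, then make the pruning permanent."""
    targets = [(m, "weight") for m in model.modules()
               if isinstance(m, (nn.Linear, nn.Conv2d))]
    prune.global_unstructured(targets, pruning_method=prune.L1Unstructured, amount=amount)
    for module, name in targets:
        prune.remove(module, name)  # bake the pruning mask into the weights
    return model
```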
arXiv Detail & Related papers (2021-08-19T09:06:16Z)
- Residual Error: a New Performance Measure for Adversarial Robustness [85.0371352689919]
A major challenge that limits the widespread adoption of deep learning has been its fragility to adversarial attacks.
This study presents the concept of residual error, a new performance measure for assessing the adversarial robustness of a deep neural network.
Experimental results using the case of image classification demonstrate the effectiveness and efficacy of the proposed residual error metric.
arXiv Detail & Related papers (2021-06-18T16:34:23Z)
- ACRE: Abstract Causal REasoning Beyond Covariation [90.99059920286484]
We introduce the Abstract Causal REasoning dataset for systematic evaluation of current vision systems in causal induction.
Motivated by the stream of research on causal discovery in Blicket experiments, we query a visual reasoning system with the following four types of questions in either an independent scenario or an interventional scenario.
We notice that pure neural models tend towards an associative strategy and perform only at chance level, whereas neuro-symbolic combinations struggle with backward-blocking reasoning.
arXiv Detail & Related papers (2021-03-26T02:42:38Z)
- On the Transferability of Adversarial Attacks against Neural Text Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
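The ensemble search described above can be caricatured as a simple genetic algorithm over binary masks that select which models join the ensemble; the population size, mutation rate, and the caller-supplied fitness function are placeholder assumptions rather than the authors' actual configuration.

```python
import random

def genetic_ensemble_search(num_models, fitness, pop_size=20, generations=50, mutate_p=0.1):
    """Evolve binary masks over `num_models` candidate models; `fitness(mask)`
    should score how well adversarial examples crafted on the masked-in
    ensemble fool the held-out models (supplied by the caller)."""
    population = [[random.randint(0, 1) for _ in range(num_models)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]              # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, num_models)      # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([1 - g if random.random() < mutate_p else g for g in child])
        population = parents + children
    return max(population, key=fitness)

# Usage sketch: best_mask = genetic_ensemble_search(10, fitness=my_transfer_score)
```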
arXiv Detail & Related papers (2020-11-17T10:45:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and accepts no responsibility for any consequences of its use.