How explainable are adversarially-robust CNNs?
- URL: http://arxiv.org/abs/2205.13042v2
- Date: Sat, 3 Jun 2023 23:01:34 GMT
- Title: How explainable are adversarially-robust CNNs?
- Authors: Mehdi Nourelahi, Lars Kotthoff, Peijie Chen, Anh Nguyen
- Abstract summary: Three important criteria of existing convolutional neural networks (CNNs) are (1) test-set accuracy; (2) out-of-distribution accuracy; and (3) explainability.
Here, we perform the first large-scale evaluation of the relations among the three criteria using 9 feature-importance methods and 12 ImageNet-trained CNNs.
- Score: 7.143109213647008
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Three important criteria of existing convolutional neural networks (CNNs) are
(1) test-set accuracy; (2) out-of-distribution accuracy; and (3)
explainability. While these criteria have been studied independently, their
relationship is unknown. For example, do CNNs with stronger out-of-distribution
performance also have stronger explainability? Furthermore,
most prior feature-importance studies only evaluate methods on 2-3 common
vanilla ImageNet-trained CNNs, leaving it unknown how these methods generalize
to CNNs of other architectures and training algorithms. Here, we perform the
first large-scale evaluation of the relations among the three criteria using 9
feature-importance methods and 12 ImageNet-trained CNNs spanning 3 training
algorithms and 5 CNN architectures. We find several important insights and
recommendations for ML practitioners. First, adversarially robust CNNs have a
higher explainability score on gradient-based attribution methods (but not
CAM-based or perturbation-based methods). Second, AdvProp models, despite being
more accurate than both vanilla and robust models alone, are not
superior in explainability. Third, among 9 feature attribution methods tested,
GradCAM and RISE are consistently the best methods. Fourth, Insertion and
Deletion are biased towards vanilla and robust models respectively, due to
their strong correlation with the confidence score distributions of a CNN.
Fifth, we did not find a single CNN to be the best in all three criteria, which
interestingly suggests that CNNs are harder to interpret as they become more
accurate.
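To make the evaluation setup concrete, below is a minimal, illustrative sketch (not the paper's released code) of two ingredients discussed in the abstract: a Grad-CAM attribution map and an Insertion-style score that reveals the highest-attribution pixels first and averages the model's confidence. The choice of model (torchvision's ResNet-50), target layer, blur baseline, and number of steps are assumptions made for illustration.

```python
# Illustrative sketch only: Grad-CAM attribution + an Insertion-style score.
# The specific model, target layer, blur baseline, and step count are assumptions.
import torch
import torch.nn.functional as F
from torchvision import models

def gradcam(model, target_layer, image, class_idx):
    """Grad-CAM: weight the target layer's activations by the spatially
    averaged gradient of the class score, then apply ReLU and upsample."""
    acts, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    model.zero_grad()
    model(image)[0, class_idx].backward()
    h1.remove(); h2.remove()
    a, g = acts[0], grads[0]                          # both (1, C, h, w)
    weights = g.mean(dim=(2, 3), keepdim=True)        # global-average-pooled gradients
    cam = F.relu((weights * a).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    return cam / (cam.max() + 1e-8)                   # normalise to [0, 1]

def insertion_score(model, image, cam, class_idx, steps=20):
    """Insertion metric: starting from a blurred image, reveal pixels in
    decreasing order of attribution and average the class confidence.
    (Deletion is the mirror image: remove pixels from the real image.)"""
    baseline = F.avg_pool2d(image, kernel_size=11, stride=1, padding=5)
    order = cam.flatten().argsort(descending=True)
    n = order.numel()
    confidences = []
    for k in range(steps + 1):
        mask = torch.zeros(n, device=image.device)
        mask[order[: n * k // steps]] = 1.0
        mask = mask.view(1, 1, *image.shape[-2:])
        x = mask * image + (1.0 - mask) * baseline
        with torch.no_grad():
            confidences.append(F.softmax(model(x), dim=1)[0, class_idx].item())
    return sum(confidences) / len(confidences)        # crude area under the curve

# Usage sketch with a placeholder input (a real evaluation would use
# ImageNet-normalised images and many of them).
model = models.resnet50(weights="IMAGENET1K_V1").eval()
x = torch.randn(1, 3, 224, 224)
cls = int(model(x).argmax(dim=1))
cam = gradcam(model, model.layer4[-1], x, cls)
print("Insertion score:", insertion_score(model, x, cam, cls))
```

The higher the class confidence stays as pixels are revealed, the better the Insertion score; the paper's fourth finding is that such scores also track each model's overall confidence distribution, which is why Insertion and Deletion can favour vanilla and robust models respectively.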
Related papers
- Robust Mixture-of-Expert Training for Convolutional Neural Networks [141.3531209949845]
Sparsely-gated Mixture of Experts (MoE) has demonstrated great promise for enabling high-accuracy and ultra-efficient model inference.
We propose a new router-expert alternating adversarial training framework for MoE, termed AdvMoE.
We find that AdvMoE achieves a 1%-4% adversarial robustness improvement over the original dense CNN, and enjoys the efficiency merit of sparsely-gated MoE.
arXiv Detail & Related papers (2023-08-19T20:58:21Z)
- Understanding CNN Fragility When Learning With Imbalanced Data [1.1444576186559485]
Convolutional neural networks (CNNs) have achieved impressive results on imbalanced image data, but they still have difficulty generalizing to minority classes.
We focus on their latent features to demystify CNN decisions on imbalanced data.
We show that important information regarding the ability of a neural network to generalize to minority classes resides in the class top-K CE and FE.
arXiv Detail & Related papers (2022-10-17T22:40:06Z)
- Patching Weak Convolutional Neural Network Models through Modularization and Composition [19.986199290508925]
A convolutional neural network (CNN) model for classification tasks often performs unsatisfactorily.
We propose a compressed modularization approach, CNNSplitter, which decomposes a strong CNN model for $N$-class classification into $N$ smaller CNN modules.
We show that CNNSplitter can patch a weak CNN model through modularization and composition, thus providing a new solution for developing robust CNN models.
arXiv Detail & Related papers (2022-09-11T15:26:16Z)
- Improving the Accuracy and Robustness of CNNs Using a Deep CCA Neural Data Regularizer [2.026424957803652]
As convolutional neural networks (CNNs) become more accurate at object recognition, their representations become more similar to those of the primate visual system.
Previous attempts to improve CNNs by regularizing them toward neural data showed very modest gains in accuracy, owing in part to limitations of the regularization method.
We develop a new neural data regularizer for CNNs that uses Deep Canonical Correlation Analysis (DCCA) to optimize the resemblance of the CNN's image representations to those of the monkey visual cortex.
arXiv Detail & Related papers (2022-09-06T15:40:39Z)
- Exploring Adversarial Examples and Adversarial Robustness of Convolutional Neural Networks by Mutual Information [44.841339443764696]
This work investigates similarities and differences between two types of convolutional neural networks (CNNs) in information extraction.
The reason why adversarial examples mislead CNNs may be that they contain more texture-based information about other categories.
Normally trained CNNs tend to extract texture-based information from their inputs, while adversarially trained models prefer shape-based information.
arXiv Detail & Related papers (2022-07-12T13:25:42Z)
- Neural Architecture Dilation for Adversarial Robustness [56.18555072877193]
A shortcoming of convolutional neural networks is that they are vulnerable to adversarial attacks.
This paper aims to improve the adversarial robustness of backbone CNNs that already have satisfactory accuracy.
Under minimal computational overhead, the dilated architecture is expected to preserve the standard performance of the backbone CNN.
arXiv Detail & Related papers (2021-08-16T03:58:00Z)
- BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by Adversarial Attacks [65.2021953284622]
We study the robustness of CNNs against white-box and black-box adversarial attacks.
Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
arXiv Detail & Related papers (2021-03-14T20:43:19Z)
- CIFS: Improving Adversarial Robustness of CNNs via Channel-wise Importance-based Feature Selection [186.34889055196925]
We investigate the adversarial robustness of CNNs from the perspective of channel-wise activations.
We observe that adversarial training (AT) robustifies CNNs by aligning the channel-wise activations of adversarial data with those of their natural counterparts.
We introduce a novel mechanism, Channel-wise Importance-based Feature Selection (CIFS); a minimal illustrative sketch of the general channel-reweighting idea appears after this list.
arXiv Detail & Related papers (2021-02-10T08:16:43Z)
- The shape and simplicity biases of adversarially robust ImageNet-trained CNNs [9.707679445925516]
We study the shape bias and internal mechanisms that enable the generalizability of AlexNet, GoogLeNet, and ResNet-50 models trained via adversarial training.
Remarkably, adversarial training induces three simplicity biases into hidden neurons in the process of "robustifying" CNNs.
arXiv Detail & Related papers (2020-06-16T16:38:16Z)
- A Systematic Evaluation: Fine-Grained CNN vs. Traditional CNN Classifiers [54.996358399108566]
We investigate the performance of landmark general-purpose CNN classifiers, which achieved top results on large-scale classification datasets.
We compare them against state-of-the-art fine-grained classifiers.
We present an extensive evaluation on six datasets to determine whether fine-grained classifiers are able to improve over the general-purpose baselines.
arXiv Detail & Related papers (2020-03-24T23:49:14Z)
- Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks [52.972605601174955]
We show that a ResNet-type CNN can attain minimax-optimal error rates in important function classes.
We derive approximation and estimation error rates for this type of CNN for the Barron and Hölder classes.
arXiv Detail & Related papers (2019-03-24T19:42:39Z)
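As forward-referenced in the CIFS entry above, the sketch below is a deliberately simplified stand-in for the channel-reweighting idea, not the authors' CIFS module: an auxiliary linear probe (a hypothetical design choice for this illustration) derives per-channel importance from pooled activations and rescales an intermediate feature map.

```python
# Simplified illustration of channel-wise importance-based feature reweighting,
# in the spirit of the CIFS entry above -- not the authors' exact mechanism.
import torch
import torch.nn as nn

class ChannelImportanceGate(nn.Module):
    """Rescales each channel of an intermediate feature map by an importance
    score derived from a small class-probe on the pooled activations."""
    def __init__(self, channels: int, num_classes: int):
        super().__init__()
        self.probe = nn.Linear(channels, num_classes)  # auxiliary per-channel class probe

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) activations from some intermediate layer.
        pooled = feats.mean(dim=(2, 3))                # (B, C) global average pooling
        logits = self.probe(pooled)                    # (B, K) probe predictions
        top_class = logits.argmax(dim=1)               # predicted class per example
        importance = self.probe.weight[top_class]      # (B, C) class-specific channel weights
        gate = torch.sigmoid(importance)               # squash to (0, 1)
        return feats * gate.unsqueeze(-1).unsqueeze(-1)  # reweight channels

# Usage sketch on dummy activations.
gate = ChannelImportanceGate(channels=64, num_classes=10)
feats = torch.randn(2, 64, 8, 8)
print(gate(feats).shape)  # torch.Size([2, 64, 8, 8])
```

The published CIFS mechanism differs in its details; the sketch only conveys the general idea of suppressing channels that are less relevant to the predicted class.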
This list is automatically generated from the titles and abstracts of the papers in this site.