SCAAT: Improving Neural Network Interpretability via Saliency
Constrained Adaptive Adversarial Training
- URL: http://arxiv.org/abs/2311.05143v2
- Date: Fri, 10 Nov 2023 08:53:57 GMT
- Title: SCAAT: Improving Neural Network Interpretability via Saliency
Constrained Adaptive Adversarial Training
- Authors: Rui Xu, Wenkang Qin, Peixiang Huang, Hao Wang, Lin Luo
- Abstract summary: A saliency map is a common form of explanation illustrating feature attributions as a heatmap.
We propose a model-agnostic learning method called Saliency Constrained Adaptive Adversarial Training (SCAAT) to improve the quality of such explanations.
- Score: 10.716021768803433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Neural Networks (DNNs) are expected to provide explanations that help users
understand their black-box predictions. A saliency map is a common form of
explanation that illustrates feature attributions as a heatmap, but it suffers
from noise that makes important features hard to distinguish. In this paper, we propose a
model-agnostic learning method called Saliency Constrained Adaptive Adversarial
Training (SCAAT) to improve the quality of such explanations. By constructing
adversarial samples under the guidance of the saliency map, SCAAT
effectively eliminates most noise and makes saliency maps sparser and more
faithful without any modification to the model architecture. We apply SCAAT to
multiple DNNs and evaluate the quality of the generated saliency maps on
various natural and pathological image datasets. Evaluations on different
domains and metrics show that SCAAT significantly improves the interpretability
of DNNs by providing more faithful saliency maps without sacrificing their
predictive power.
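The abstract leaves the exact perturbation rule to the full paper, so the following PyTorch-style sketch is only a minimal, hypothetical reading of saliency-guided adversarial training: an FGSM-style perturbation is damped on high-saliency pixels, so that training pushes the attributions of unimportant regions toward zero while salient evidence is preserved. The function names, the (1 - saliency) weighting, and the hyperparameters `eps` and `alpha` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def saliency_map(model, x, y):
    """Plain gradient saliency: |d loss / d input|, max over channels."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return grad.abs().amax(dim=1, keepdim=True)               # (N, 1, H, W)

def scaat_like_step(model, optimizer, x, y, eps=2.0 / 255, alpha=1.0):
    """One hypothetical training step: perturb mainly where saliency is low."""
    sal = saliency_map(model, x, y)
    sal = sal / (sal.amax(dim=(2, 3), keepdim=True) + 1e-12)  # normalise to [0, 1]

    # FGSM-style perturbation scaled by (1 - saliency): low-saliency pixels get
    # the full step, highly salient pixels are left almost untouched.
    x_req = x.clone().detach().requires_grad_(True)
    grad, = torch.autograd.grad(F.cross_entropy(model(x_req), y), x_req)
    x_adv = (x + eps * (1.0 - sal) * grad.sign()).clamp(0, 1).detach()  # assumes inputs in [0, 1]

    # Train on the clean and the perturbed batch jointly.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + alpha * F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Training on the clean and the perturbed batch together is what would preserve predictive power while the perturbation pressure sparsifies the saliency maps, matching the trade-off the abstract claims SCAAT achieves.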
Related papers
- Neural Networks Decoded: Targeted and Robust Analysis of Neural Network Decisions via Causal Explanations and Reasoning [9.947555560412397]
We introduce TRACER, a novel method grounded in causal inference theory to estimate the causal dynamics underpinning DNN decisions.
Our approach systematically intervenes on input features to observe how specific changes propagate through the network, affecting internal activations and final outputs.
TRACER further enhances explainability by generating counterfactuals that reveal possible model biases and offer contrastive explanations for misclassifications.
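The summary only states that TRACER intervenes on input features and tracks how the outputs change; as a generic illustration of such an intervention (not the authors' causal estimator), one can overwrite a single feature with a baseline value and measure the resulting shift in the logits. The function name and the zero baseline below are assumptions.

```python
import torch

@torch.no_grad()
def intervention_effect(model, x, feature_idx, baseline=0.0):
    """Shift in the model's logits caused by do(feature := baseline).

    A single-feature intervention for a tabular model with inputs of shape (N, D);
    TRACER's actual estimator is more involved than this illustration.
    """
    x_do = x.clone()
    x_do[:, feature_idx] = baseline      # intervene on one input feature
    return model(x_do) - model(x)        # change in output attributable to the intervention
```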
arXiv Detail & Related papers (2024-10-07T20:44:53Z)
- Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases model performance.
Specifically, our framework introduces a new metric, explanation consistency, to adaptively reweight the training samples during learning.
The framework then promotes model learning by paying closer attention to those training samples with large differences in explanations.
arXiv Detail & Related papers (2024-08-08T17:20:08Z)
- Joint Diffusion Processes as an Inductive Bias in Sheaf Neural Networks [14.224234978509026]
Sheaf Neural Networks (SNNs) naturally extend Graph Neural Networks (GNNs).
We propose two novel sheaf learning approaches that provide a more intuitive understanding of the involved structure maps.
In our evaluation, we show the limitations of the real-world benchmarks used so far on SNNs.
arXiv Detail & Related papers (2024-07-30T07:17:46Z)
- Improving Neural Additive Models with Bayesian Principles [54.29602161803093]
Neural additive models (NAMs) enhance the transparency of deep neural networks by handling calibrated input features in separate additive sub-networks.
We develop Laplace-approximated NAMs (LA-NAMs), which show improved empirical performance on tabular datasets and challenging real-world medical tasks.
arXiv Detail & Related papers (2023-05-26T13:19:15Z)
- ADVISE: ADaptive Feature Relevance and VISual Explanations for Convolutional Neural Networks [0.745554610293091]
We introduce ADVISE, a new explainability method that quantifies and leverages the relevance of each unit of the feature map to provide better visual explanations.
We extensively evaluate our idea in the image classification task using AlexNet, VGG16, ResNet50, and Xception pretrained on ImageNet.
Our experiments further show that ADVISE fulfils the sensitivity and implementation independence axioms while passing the sanity checks.
arXiv Detail & Related papers (2022-03-02T18:16:57Z)
- Edge-Level Explanations for Graph Neural Networks by Extending Explainability Methods for Convolutional Neural Networks [33.20913249848369]
Graph Neural Networks (GNNs) are deep learning models that take graph data as inputs, and they are applied to various tasks such as traffic prediction and molecular property prediction.
We extend explainability methods for CNNs, such as Local Interpretable Model-Agnostic Explanations (LIME), Gradient-Based Saliency Maps, and Gradient-Weighted Class Activation Mapping (Grad-CAM) to GNNs.
The experimental results indicate that the LIME-based approach is the most efficient explainability method for multiple tasks in real-world situations, outperforming even the state-of-the-art.
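For reference, below is a minimal sketch of standard Grad-CAM for a CNN, i.e. the base method this entry extends to GNNs; the GNN and edge-level variants described in the paper are not reproduced here, and the tensor shapes are assumptions about how the activations were captured (e.g. via a forward hook).

```python
import torch
import torch.nn.functional as F

def grad_cam(feature_maps, logits, class_idx):
    """Standard Grad-CAM heatmap for one CNN layer.

    feature_maps: (N, C, H, W) activations that are part of the autograd graph.
    logits:       (N, num_classes) output of the same forward pass.
    """
    score = logits[:, class_idx].sum()
    grads, = torch.autograd.grad(score, feature_maps, retain_graph=True)
    weights = grads.mean(dim=(2, 3), keepdim=True)              # global-average-pool the gradients
    cam = F.relu((weights * feature_maps).sum(dim=1))            # weighted channel sum, then ReLU
    return cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-12)   # normalise per image
```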
arXiv Detail & Related papers (2021-11-01T06:27:29Z)
- On the benefits of robust models in modulation recognition [53.391095789289736]
Deep Neural Networks (DNNs) using convolutional layers are state-of-the-art in many tasks in communications.
In other domains, like image classification, DNNs have been shown to be vulnerable to adversarial perturbations.
We propose a novel framework to test the robustness of current state-of-the-art models.
arXiv Detail & Related papers (2021-03-27T19:58:06Z)
- Learning Deep Interleaved Networks with Asymmetric Co-Attention for Image Restoration [65.11022516031463]
We present a deep interleaved network (DIN) that learns how information at different states should be combined for high-quality (HQ) image reconstruction.
In this paper, we propose asymmetric co-attention (AsyCA) which is attached at each interleaved node to model the feature dependencies.
Our presented DIN can be trained end-to-end and applied to various image restoration tasks.
arXiv Detail & Related papers (2020-10-29T15:32:00Z)
- Explaining and Improving Model Behavior with k Nearest Neighbor Representations [107.24850861390196]
We propose using k nearest neighbor representations to identify training examples responsible for a model's predictions.
We show that kNN representations are effective at uncovering learned spurious associations.
Our results indicate that the kNN approach makes the finetuned model more robust to adversarial inputs.
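As a rough, hypothetical illustration of the idea rather than the paper's pipeline, one can embed the training set with the model's encoder and, for each query, retrieve the nearest training examples in representation space as candidate explanations; the `encode` interface and the value of `k` are assumptions.

```python
import torch

def nearest_training_examples(encode, train_x, query_x, k=5):
    """Indices of the k training examples whose representations are closest
    (in Euclidean distance) to each query's representation."""
    with torch.no_grad():
        train_z = encode(train_x)              # (N_train, D) hidden representations
        query_z = encode(query_x)              # (N_query, D)
    dists = torch.cdist(query_z, train_z)      # (N_query, N_train) pairwise distances
    return dists.topk(k, dim=1, largest=False).indices
```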
arXiv Detail & Related papers (2020-10-18T16:55:25Z)
- Improving the Interpretability of fMRI Decoding using Deep Neural Networks and Adversarial Robustness [1.254120224317171]
A saliency map is a common approach for producing interpretable visualizations of the relative importance of input features for a prediction.
In this paper, we review a variety of methods for producing gradient-based saliency maps, and present a new adversarial training method we developed to make DNNs robust to input noise.
arXiv Detail & Related papers (2020-04-23T12:56:24Z)
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)