Improving Network Interpretability via Explanation Consistency Evaluation
- URL: http://arxiv.org/abs/2408.04600v1
- Date: Thu, 8 Aug 2024 17:20:08 GMT
- Title: Improving Network Interpretability via Explanation Consistency Evaluation
- Authors: Hefeng Wu, Hao Jiang, Keze Wang, Ziyi Tang, Xianghuan He, Liang Lin
- Abstract summary: We propose a framework that acquires more explainable activation heatmaps and simultaneously increases the model performance.
Specifically, our framework introduces a new metric, i.e., explanation consistency, to reweight the training samples adaptively in model learning.
Our framework then promotes the model learning by paying closer attention to those training samples with a high difference in explanations.
- Score: 56.14036428778861
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While deep neural networks have achieved remarkable performance, they tend to lack transparency in prediction. The pursuit of greater interpretability in neural networks often results in a degradation of their original performance. Some works strive to improve both interpretability and performance, but they primarily depend on meticulously imposed conditions. In this paper, we propose a simple yet effective framework that acquires more explainable activation heatmaps and simultaneously increases the model performance, without the need for any extra supervision. Specifically, our concise framework introduces a new metric, i.e., explanation consistency, to reweight the training samples adaptively in model learning. The explanation consistency metric is utilized to measure the similarity between the model's visual explanations of the original samples and those of semantic-preserved adversarial samples, whose background regions are perturbed by using image adversarial attack techniques. Our framework then promotes the model learning by paying closer attention to those training samples with a high difference in explanations (i.e., low explanation consistency), for which the current model cannot provide robust interpretations. Comprehensive experimental results on various benchmarks demonstrate the superiority of our framework in multiple aspects, including higher recognition accuracy, greater data debiasing capability, stronger network robustness, and more precise localization ability on both regular networks and interpretable networks. We also provide extensive ablation studies and qualitative analyses to unveil the detailed contribution of each component.
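The reweighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact similarity measure or weighting formula, so everything below is an assumption made for illustration: cosine similarity stands in for the explanation consistency metric, the weight `1 + (1 - consistency)` is a hypothetical choice, and the function names `explanation_consistency`, `sample_weights`, and `reweighted_loss` are invented here. Heatmaps are assumed to be non-negative 2D grids of equal shape.

```python
import math
from typing import List, Sequence

Heatmap = Sequence[Sequence[float]]


def explanation_consistency(h_orig: Heatmap, h_adv: Heatmap) -> float:
    """Cosine similarity between two flattened heatmaps.

    Assumption: cosine similarity is used as a stand-in for the paper's
    explanation consistency metric, which the abstract does not specify.
    """
    a = [v for row in h_orig for v in row]
    b = [v for row in h_adv for v in row]
    if len(a) != len(b):
        raise ValueError("heatmaps must have the same shape")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def sample_weights(consistencies: Sequence[float]) -> List[float]:
    """Assign larger weights to samples with lower consistency.

    Hypothetical rule: raw weight = 1 + (1 - consistency), then rescale
    so that the weights average to 1 over the batch.
    """
    raw = [1.0 + (1.0 - c) for c in consistencies]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


def reweighted_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Batch mean of per-sample losses scaled by their weights."""
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


# Toy batch: sample 0 keeps its heatmap under perturbation, sample 1 does not.
original = [[[1.0, 0.0], [0.0, 0.0]], [[1.0, 0.0], [0.0, 0.0]]]
perturbed = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
scores = [explanation_consistency(o, p) for o, p in zip(original, perturbed)]
weights = sample_weights(scores)
```

In this toy batch the second sample's heatmap moves entirely to a different region after perturbation, so it receives the larger weight and contributes more to the reweighted loss.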
Related papers
- Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization [36.72245290832128]
We identify a new phenomenon in neural network optimization which arises from the interaction of depth and a heavy-tailed structure in natural data.
In particular, it implies a conceptually new cause for progressive sharpening and the edge of stability.
We demonstrate the significant influence of paired groups of outliers in the training data with strong opposing signals.
arXiv Detail & Related papers (2023-11-07T17:43:50Z)
- Probabilistic Self-supervised Learning via Scoring Rules Minimization [19.347097627898876]
We propose a novel probabilistic self-supervised learning method via Scoring Rule Minimization (ProSMIN) to enhance representation quality and mitigate collapsing representations.
Our method achieves superior accuracy and calibration, surpassing the self-supervised baseline in a wide range of experiments on large-scale datasets.
arXiv Detail & Related papers (2023-09-05T08:48:25Z)
- Understanding Robust Learning through the Lens of Representation Similarities [37.66877172364004]
Robustness to adversarial examples has emerged as a desirable property for deep neural networks (DNNs).
In this paper, we aim to understand how the properties of representations learned by robust training differ from those obtained from standard, non-robust training.
arXiv Detail & Related papers (2022-06-20T16:06:20Z)
- Explaining, Evaluating and Enhancing Neural Networks' Learned Representations [2.1485350418225244]
We show how explainability can be an aid, rather than an obstacle, towards better and more efficient representations.
We employ such attributions to define two novel scores for evaluating the informativeness and the disentanglement of latent embeddings.
We show that adopting our proposed scores as constraints during the training of a representation learning task improves the downstream performance of the model.
arXiv Detail & Related papers (2022-02-18T19:00:01Z)
- How Well Do Sparse Imagenet Models Transfer? [75.98123173154605]
Transfer learning is a classic paradigm by which models pretrained on large "upstream" datasets are adapted to yield good results on "downstream" datasets.
In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset.
We show that sparse models can match or even outperform the transfer performance of dense models, even at high sparsities.
arXiv Detail & Related papers (2021-11-26T11:58:51Z)
- Anomaly Detection on Attributed Networks via Contrastive Self-Supervised Learning [50.24174211654775]
We present a novel contrastive self-supervised learning framework for anomaly detection on attributed networks.
Our framework fully exploits the local information from network data by sampling a novel type of contrastive instance pair.
A graph neural network-based contrastive learning model is proposed to learn informative embedding from high-dimensional attributes and local structure.
arXiv Detail & Related papers (2021-02-27T03:17:20Z)
- Adversarial Training Reduces Information and Improves Transferability [81.59364510580738]
Recent results show that features of adversarially trained networks for classification, in addition to being robust, enable desirable properties such as invertibility.
We show that adversarial training can improve linear transferability to new tasks, which gives rise to a new trade-off between transferability of representations and accuracy on the source task.
arXiv Detail & Related papers (2020-07-22T08:30:16Z)
- On Robustness and Transferability of Convolutional Neural Networks [147.71743081671508]
Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts.
We study the interplay between out-of-distribution and transfer performance of modern image classification CNNs for the first time.
We find that increasing both the training set and model sizes significantly improves the distributional shift robustness.
arXiv Detail & Related papers (2020-07-16T18:39:04Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.