Discriminative Attribution from Counterfactuals
- URL: http://arxiv.org/abs/2109.13412v1
- Date: Tue, 28 Sep 2021 00:53:34 GMT
- Title: Discriminative Attribution from Counterfactuals
- Authors: Nils Eckstein, Alexander S. Bates, Gregory S.X.E. Jefferis, Jan Funke
- Abstract summary: We present a method for neural network interpretability by combining feature attribution with counterfactual explanations.
We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner.
- Score: 64.94009515033984
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a method for neural network interpretability by combining feature
attribution with counterfactual explanations to generate attribution maps that
highlight the most discriminative features between pairs of classes. We show
that this method can be used to quantitatively evaluate the performance of
feature attribution methods in an objective manner, thus preventing potential
observer bias. We evaluate the proposed method on three diverse datasets,
including a challenging artificial dataset and real-world biological data. We
show quantitatively and qualitatively that the highlighted features are
substantially more discriminative than those extracted using conventional
attribution methods and argue that this type of explanation is better suited
for understanding fine-grained class differences as learned by a deep neural
network.
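The core recipe can be sketched compactly. The snippet below is a minimal illustration, not the authors' exact pipeline: it assumes a trained PyTorch classifier `f` and a counterfactual image `x_cf` of the target class (e.g., produced by a class-translation network); the function name and defaults are ours.

```python
# Illustrative sketch: integrated-gradients-style attribution along the
# path from the counterfactual to the real input. Names are assumptions.
import torch

def discriminative_attribution(f, x, x_cf, target_class, steps=50):
    """x, x_cf: (C, H, W) tensors. Returns an attribution map of the same
    shape, highlighting features that differ between the two classes."""
    f.eval()
    total_grad = torch.zeros_like(x)
    for a in torch.linspace(0.0, 1.0, steps):
        # Interpolate between counterfactual (a=0) and real input (a=1).
        point = (x_cf + a * (x - x_cf)).detach().requires_grad_(True)
        score = f(point.unsqueeze(0))[0, target_class]
        grad, = torch.autograd.grad(score, point)
        total_grad += grad / steps
    return (x - x_cf) * total_grad
```

Because the path starts at the counterfactual rather than at a black or blurred baseline, regions where the two images agree receive zero attribution, leaving only the class-discriminative features.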
Related papers
- The Susceptibility of Example-Based Explainability Methods to Class Outliers [3.748789746936121]
This study explores the impact of class outliers on the effectiveness of example-based explainability methods for black-box machine learning models.
We reformulate existing explainability evaluation metrics, such as correctness and relevance, specifically for example-based methods, and introduce a new metric, distinguishability.
Using these metrics, we highlight the shortcomings of current example-based explainability methods, including those that attempt to suppress class outliers.
arXiv Detail & Related papers (2024-07-30T09:20:15Z) - Simple and Interpretable Probabilistic Classifiers for Knowledge Graphs [0.0]
We describe an inductive approach based on learning simple belief networks.
We show how such models can be converted into (probabilistic) axioms (or rules).
arXiv Detail & Related papers (2024-07-09T17:05:52Z) - Toward Understanding the Disagreement Problem in Neural Network Feature Attribution [0.8057006406834466]
Neural networks have demonstrated a remarkable ability to discern intricate patterns and relationships in raw data.
Understanding the inner workings of these black-box models remains challenging, yet it is crucial for high-stakes decisions.
Our work addresses this disagreement by investigating the fundamental and distributional behavior of the explanations.
arXiv Detail & Related papers (2024-04-17T12:45:59Z) - Neural-based classification rule learning for sequential data [0.0]
We propose a novel differentiable and fully interpretable method to discover both local and global patterns for rule-based binary classification.
It consists of a convolutional binary neural network with an interpretable neural filter and a training strategy based on dynamically-enforced sparsity.
We demonstrate the validity and usefulness of the approach on synthetic datasets and on an open-source peptides dataset.
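A loose sketch of those ingredients, with all architectural details assumed rather than taken from the paper: a single soft-binarized 1-D convolutional filter acts as a readable sequence-pattern detector, and a sparsity penalty whose strength grows during training stands in for the dynamically-enforced sparsity.

```python
# Hypothetical sketch of an interpretable neural filter for sequences.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryRuleFilter(nn.Module):
    """One pattern detector: a 1-D convolution whose soft-binarized
    weights can be read off as a sequence motif (a local rule)."""
    def __init__(self, vocab_size, kernel_len):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(1, vocab_size, kernel_len))

    def forward(self, x):  # x: (B, vocab_size, seq_len), one-hot encoded
        w = torch.sigmoid(self.weight)     # weights pushed toward {0, 1}
        return F.conv1d(x, w).amax(dim=2)  # max response anywhere in the sequence

def sparsity_penalty(filt, step, ramp_steps=10_000):
    # "Dynamically-enforced sparsity": penalty strength grows over training.
    return min(1.0, step / ramp_steps) * torch.sigmoid(filt.weight).sum()
```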
arXiv Detail & Related papers (2023-02-22T11:05:05Z) - Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
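One plausible reading of the class-elimination step, sketched below with assumed names and shapes (the authors' exact scheme may differ): each iteration, a random subset of classes is excluded from the segmentation loss, so the model cannot lean on co-occurring classes to predict one another.

```python
# Illustrative sketch: randomly drop classes from the loss each iteration.
import torch
import torch.nn.functional as F

def class_dropped_loss(logits, target, num_classes, drop_prob=0.3):
    # logits: (B, C, H, W); target: (B, H, W) with class indices in [0, C)
    dropped = torch.rand(num_classes) < drop_prob  # classes removed this step
    masked = target.clone()
    masked[dropped[target]] = -100                 # default ignore_index
    return F.cross_entropy(logits, masked)         # loss over kept classes only
```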
arXiv Detail & Related papers (2021-10-31T16:15:09Z) - Concurrent Discrimination and Alignment for Self-Supervised Feature Learning [52.213140525321165]
Existing self-supervised learning methods rely on pretext tasks that are either (1) discriminative, explicitly specifying which features should be separated, or (2) aligning, precisely indicating which features should be drawn together.
In this work, we combine the positive aspects of the discriminating and aligning methods in a single hybrid design.
Our method implements the repulsion mechanism through a discriminative predictive task and the attraction mechanism by concurrently maximizing mutual information between paired views.
Our experiments on nine established benchmarks show that the proposed model consistently outperforms existing state-of-the-art self-supervised and transfer learning results.
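As a concrete stand-in for such a hybrid objective, the standard InfoNCE loss already couples both mechanisms: diagonal (paired-view) entries are attracted while all other entries are repelled. This is an illustrative sketch, not the paper's loss.

```python
# Minimal InfoNCE sketch: attraction on the diagonal, repulsion elsewhere.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    # z1, z2: (N, D) embeddings of two augmented views of the same N samples
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature      # (N, N) cosine similarities
    labels = torch.arange(z1.size(0))       # positives sit on the diagonal
    return F.cross_entropy(logits, labels)  # attract pairs, repel the rest
```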
arXiv Detail & Related papers (2021-08-19T09:07:41Z) - MCDAL: Maximum Classifier Discrepancy for Active Learning [74.73133545019877]
Recent state-of-the-art active learning methods have mostly leveraged Generative Adversarial Networks (GAN) for sample acquisition.
We propose in this paper a novel active learning framework that we call Maximum Classifier Discrepancy for Active Learning (MCDAL).
In particular, we utilize two auxiliary classification layers that learn tighter decision boundaries by maximizing the discrepancies among them.
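The acquisition step can be illustrated as follows; the helper names are assumptions, and only the scoring (not the discrepancy-maximizing training) is shown: unlabeled samples are ranked by how much the two auxiliary heads disagree.

```python
# Illustrative sketch of discrepancy-based sample acquisition.
import torch
import torch.nn.functional as F

def acquisition_scores(backbone, head1, head2, unlabeled_x):
    # Rank unlabeled samples by the disagreement between the two heads.
    with torch.no_grad():
        feats = backbone(unlabeled_x)
        p1 = F.softmax(head1(feats), dim=1)
        p2 = F.softmax(head2(feats), dim=1)
    return (p1 - p2).abs().sum(dim=1)  # L1 discrepancy; higher = query first
```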
arXiv Detail & Related papers (2021-07-23T06:57:08Z) - Deep Clustering by Semantic Contrastive Learning [67.28140787010447]
We introduce a novel variant called Semantic Contrastive Learning (SCL).
It explores the characteristics of both conventional contrastive learning and deep clustering, amplifying the strengths of each in a unified approach.
arXiv Detail & Related papers (2021-03-03T20:20:48Z) - Visualization of Supervised and Self-Supervised Neural Networks via Attribution Guided Factorization [87.96102461221415]
We develop an algorithm that provides per-class explainability.
In an extensive battery of experiments, we demonstrate the ability of our method to produce class-specific visualizations.
arXiv Detail & Related papers (2020-12-03T18:48:39Z) - Deep Inverse Feature Learning: A Representation Learning of Error [6.5358895450258325]
This paper introduces a novel perspective about error in machine learning and proposes inverse feature learning (IFL) as a representation learning approach.
The inverse feature learning method uses a deep clustering approach to obtain a qualitative representation of the error as features.
The experimental results show that the proposed method leads to promising results in classification and especially in clustering.
arXiv Detail & Related papers (2020-03-09T17:45:44Z)