Explainer Divergence Scores (EDS): Some Post-Hoc Explanations May be
Effective for Detecting Unknown Spurious Correlations
- URL: http://arxiv.org/abs/2211.07650v1
- Date: Mon, 14 Nov 2022 15:52:21 GMT
- Title: Explainer Divergence Scores (EDS): Some Post-Hoc Explanations May be
Effective for Detecting Unknown Spurious Correlations
- Authors: Shea Cardozo, Gabriel Islas Montero, Dmitry Kazhdan, Botty Dimanov,
Maleakhi Wijaya, Mateja Jamnik and Pietro Lio
- Abstract summary: Recent work has suggested post-hoc explainers might be ineffective for detecting spurious correlations in Deep Neural Networks (DNNs).
We show there are serious weaknesses with the existing evaluation frameworks for this setting.
We propose a new evaluation methodology, Explainer Divergence Scores (EDS), grounded in an information theory approach to evaluate explainers.
- Score: 4.223964614888875
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work has suggested post-hoc explainers might be ineffective for
detecting spurious correlations in Deep Neural Networks (DNNs). However, we
show there are serious weaknesses with the existing evaluation frameworks for
this setting. Previously proposed metrics are extremely difficult to interpret
and are not directly comparable between explainer methods. To alleviate these
constraints, we propose a new evaluation methodology, Explainer Divergence
Scores (EDS), grounded in an information theory approach to evaluate
explainers. EDS is easy to interpret and naturally comparable across
explainers. We use our methodology to compare the detection performance of
three different explainers - feature attribution methods, influential examples,
and concept extraction - on two different image datasets. We discover post-hoc
explainers often contain substantial information about a DNN's dependence on
spurious artifacts, but in ways often imperceptible to human users. This
suggests the need for new techniques that can use this information to better
detect a DNN's reliance on spurious correlations.
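The abstract does not detail how EDS is computed. As a loose illustration only (not the paper's actual formulation), one way to operationalize an information-style detection score is to train a simple probe to distinguish explanations produced by a model that relies on a spurious artifact from explanations produced by a clean model on the same inputs; the probe's held-out accuracy then serves as a score that is directly comparable across explainer types. All names below (explanation_detection_score, expl_spurious, expl_clean) are hypothetical.

```python
# Illustrative sketch only (hypothetical setup): score how much information an
# explainer's outputs carry about a model's reliance on a spurious artifact by
# training a simple probe to tell apart explanations from a "spurious" model
# and a "clean" model on the same inputs. Higher held-out probe accuracy means
# the explanations are more informative about the spurious dependence.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def explanation_detection_score(expl_spurious: np.ndarray,
                                expl_clean: np.ndarray,
                                seed: int = 0) -> float:
    """expl_spurious / expl_clean: (n_samples, n_features) flattened explanations
    (e.g. saliency maps) for the same inputs, from the two models."""
    X = np.vstack([expl_spurious, expl_clean])
    y = np.concatenate([np.ones(len(expl_spurious)), np.zeros(len(expl_clean))])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    # ~0.5 means no usable signal in the explanations; ~1.0 means fully detectable.
    return probe.score(X_te, y_te)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins for flattened saliency maps from the two models.
    spurious_maps = rng.normal(0.3, 1.0, size=(200, 64))
    clean_maps = rng.normal(0.0, 1.0, size=(200, 64))
    print(f"detection score: {explanation_detection_score(spurious_maps, clean_maps):.2f}")
```

Because any explainer's output (attribution maps, influence scores, concept activations) can be flattened to a fixed-length vector before probing, the same score can be reported for each explainer, mirroring the abstract's claim that EDS is naturally comparable across explainers.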
Related papers
- Rethinking Distance Metrics for Counterfactual Explainability [53.436414009687]
We investigate a framing for counterfactual generation methods that considers counterfactuals not as independent draws from a region around the reference, but as jointly sampled with the reference from the underlying data distribution.
We derive a distance metric, tailored for counterfactual similarity that can be applied to a broad range of settings.
arXiv Detail & Related papers (2024-10-18T15:06:50Z)
- Sparse Explanations of Neural Networks Using Pruned Layer-Wise Relevance Propagation [1.593690982728631]
We present a modification of the widely used explanation method layer-wise relevance propagation.
Our approach enforces sparsity directly by pruning the relevance propagation for the different layers.
We show that our modification indeed leads to noise reduction and concentrates relevance on the most important features compared to the baseline.
arXiv Detail & Related papers (2024-04-22T15:16:59Z)
- Diversified Outlier Exposure for Out-of-Distribution Detection via Informative Extrapolation [110.34982764201689]
Out-of-distribution (OOD) detection is important for deploying reliable machine learning models on real-world applications.
Recent advances in outlier exposure have shown promising results on OOD detection via fine-tuning model with informatively sampled auxiliary outliers.
We propose a novel framework, namely, Diversified Outlier Exposure (DivOE), for effective OOD detection via informative extrapolation based on the given auxiliary outliers.
arXiv Detail & Related papers (2023-10-21T07:16:09Z)
- Right for the Wrong Reason: Can Interpretable ML Techniques Detect Spurious Correlations? [2.7558542803110244]
We propose a rigorous evaluation strategy to assess an explanation technique's ability to correctly identify spurious correlations; a minimal illustrative check along these lines is sketched after this list.
We find that the post-hoc technique SHAP, as well as the inherently interpretable Attri-Net, provide the best performance.
arXiv Detail & Related papers (2023-07-23T14:43:17Z)
- DCID: Deep Canonical Information Decomposition [84.59396326810085]
We consider the problem of identifying the signal shared between two one-dimensional target variables.
We propose ICM, an evaluation metric which can be used in the presence of ground-truth labels.
We also propose Deep Canonical Information Decomposition (DCID) - a simple, yet effective approach for learning the shared variables.
arXiv Detail & Related papers (2023-06-27T16:59:06Z)
- Effective Explanations for Entity Resolution Models [21.518135952436975]
We study the fundamental problem of explainability of deep learning solutions for ER.
We propose the CERTA approach that is aware of the semantics of the ER problem.
We experimentally evaluate CERTA's explanations of state-of-the-art ER solutions based on DL models using publicly available datasets.
arXiv Detail & Related papers (2022-03-24T10:50:05Z)
- Benchmarking Deep Models for Salient Object Detection [67.07247772280212]
We construct a general SALient Object Detection (SALOD) benchmark to conduct a comprehensive comparison among several representative SOD methods.
In our experiments, we find that existing loss functions are usually specialized for some metrics but report inferior results on the others.
We propose a novel Edge-Aware (EA) loss that promotes deep networks to learn more discriminative features by integrating both pixel- and image-level supervision signals.
arXiv Detail & Related papers (2022-02-07T03:43:16Z)
- Coalitional Bayesian Autoencoders -- Towards explainable unsupervised deep learning [78.60415450507706]
We show that explanations of BAE's predictions suffer from high correlation resulting in misleading explanations.
To alleviate this, a "Coalitional BAE" is proposed, which is inspired by agent-based system theory.
Our experiments on publicly available condition monitoring datasets demonstrate the improved quality of explanations using the Coalitional BAE.
arXiv Detail & Related papers (2021-10-19T15:07:09Z)
- Discriminative Attribution from Counterfactuals [64.94009515033984]
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations.
We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner.
arXiv Detail & Related papers (2021-09-28T00:53:34Z)
- Unsupervised Detection of Adversarial Examples with Model Explanations [0.6091702876917279]
We propose a simple yet effective method to detect adversarial examples using methods developed to explain the model's behavior.
Our evaluations on the MNIST handwritten digits dataset show that our method is capable of detecting adversarial examples with high confidence.
arXiv Detail & Related papers (2021-07-22T06:54:18Z)
- Explainable Recommendation via Interpretable Feature Mapping and Evaluation of Explainability [22.58823484394866]
We present a novel feature mapping approach that maps the uninterpretable general features onto the interpretable aspect features.
Experimental results demonstrate strong performance in both recommendation and explanation, eliminating the need for metadata.
arXiv Detail & Related papers (2020-07-12T23:49:12Z)
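As referenced in the "Right for the Wrong Reason" entry above, one building block of evaluating whether an explanation technique identifies a spurious correlation is checking whether the explanation actually points at a known artifact. The sketch below is a minimal, assumed operationalization (not the cited paper's protocol): it measures the fraction of absolute attribution mass that falls inside a known artifact region, for any attribution-style explainer (SHAP, saliency, etc.). The function name and toy data are hypothetical.

```python
# Illustrative sketch (assumed operationalization, not the cited paper's exact
# protocol): given a post-hoc attribution map for an image and a binary mask of
# a known spurious artifact (e.g. a tag in the corner), measure what fraction
# of the total attribution mass falls inside the artifact region. A model that
# relies on the artifact should concentrate attribution there.
import numpy as np


def spurious_attribution_fraction(attribution: np.ndarray,
                                  artifact_mask: np.ndarray) -> float:
    """attribution: (H, W) attribution map from any explainer (SHAP, saliency, ...).
    artifact_mask: (H, W) boolean mask marking the known spurious region."""
    mass = np.abs(attribution)
    total = mass.sum()
    if total == 0:
        return 0.0
    return float(mass[artifact_mask].sum() / total)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    attr = rng.normal(size=(32, 32))
    mask = np.zeros((32, 32), dtype=bool)
    mask[:4, :4] = True          # hypothetical artifact patch in the corner
    attr[:4, :4] += 5.0          # simulate attribution drawn to the patch
    print(f"fraction on artifact: {spurious_attribution_fraction(attr, mask):.2f}")
```

A high fraction for a model, relative to a control model trained without the artifact, would flag reliance on the spurious correlation.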