Post hoc Explanations may be Ineffective for Detecting Unknown Spurious Correlation
- URL: http://arxiv.org/abs/2212.04629v1
- Date: Fri, 9 Dec 2022 02:05:39 GMT
- Title: Post hoc Explanations may be Ineffective for Detecting Unknown Spurious Correlation
- Authors: Julius Adebayo, Michael Muelly, Hal Abelson, Been Kim
- Abstract summary: We investigate whether three types of post hoc model explanations are effective for detecting a model's reliance on spurious signals in the training data.
We design an empirical methodology that uses semi-synthetic datasets along with pre-specified spurious artifacts.
We find that the post hoc explanation methods tested are ineffective when the spurious artifact is unknown at test-time.
- Score: 12.185584875925906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate whether three types of post hoc model explanations (feature attribution, concept activation, and training point ranking) are effective for detecting a model's reliance on spurious signals in the training data. Specifically, we consider the scenario where the spurious signal to be detected is unknown, at test time, to the user of the explanation method. We design an empirical methodology that uses semi-synthetic datasets along with pre-specified spurious artifacts to obtain models that verifiably rely on these spurious training signals. We then provide a suite of metrics that assess an explanation method's reliability for spurious-signal detection under various conditions. We find that the post hoc explanation methods tested are ineffective when the spurious artifact is unknown at test time, especially for non-visible artifacts like a background blur. Further, we find that feature attribution methods are susceptible to erroneously indicating dependence on spurious signals even when the model being explained does not rely on spurious artifacts. This finding casts doubt on the utility of these approaches, in the hands of a practitioner, for detecting a model's reliance on spurious signals.
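A minimal sketch may make the methodology concrete. The snippet below is an illustration, not the authors' code: the toy dataset, model, training loop, and the choice of input gradients as the feature attribution are all assumptions. It injects a pre-specified spurious tag into one class of a semi-synthetic dataset, trains a model that can exploit the tag, and then measures how much attribution mass lands on the known artifact region:

```python
# Illustrative sketch: train on data with a known spurious tag, then check
# whether a feature attribution concentrates on the tag. All sizes and the
# input-gradient attribution are assumptions, not the paper's exact setup.
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_data(n=512, size=16):
    """Random images; class 1 gets a bright 3x3 corner tag (the artifact)."""
    x = torch.rand(n, 1, size, size)
    y = (torch.rand(n) < 0.5).long()
    x[y == 1, :, :3, :3] = 1.0  # pre-specified spurious artifact
    return x, y

x, y = make_data()
model = nn.Sequential(nn.Flatten(), nn.Linear(16 * 16, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):  # the tag makes the task trivially separable
    opt.zero_grad()
    nn.functional.cross_entropy(model(x), y).backward()
    opt.step()

# Input-gradient attribution for one class-1 example.
xi = x[y == 1][:1].clone().requires_grad_(True)
model(xi)[0, 1].backward()
sal = xi.grad.abs().squeeze()

# A reliability metric in the paper's spirit: the fraction of attribution
# mass inside the known artifact region (high = the reliance is visible).
mask = torch.zeros(16, 16, dtype=torch.bool)
mask[:3, :3] = True
print(f"attribution mass on artifact: {(sal[mask].sum() / sal.sum()).item():.2f}")
```

The overlap can be computed here because the evaluator chose the artifact; the hard case the paper studies is the user who does not know what artifact to look for.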
Related papers
- Demystifying amortized causal discovery with transformers [21.058343547918053]
Supervised learning approaches for causal discovery from observational data often achieve competitive performance.
In this work, we investigate CSIvA, a transformer-based model that promises to train on synthetic data and transfer to real data.
We bridge the gap with existing identifiability theory and show that constraints on the training data distribution implicitly define a prior on the test observations.
arXiv Detail & Related papers (2024-05-27T08:17:49Z)
- Mitigating Feature Gap for Adversarial Robustness by Feature Disentanglement [61.048842737581865]
Adversarial fine-tuning methods aim to enhance adversarial robustness by fine-tuning a naturally pre-trained model with adversarial training.
We propose a disentanglement-based approach to explicitly model and remove the latent features that cause the feature gap.
Empirical evaluations on three benchmark datasets demonstrate that our approach surpasses existing adversarial fine-tuning methods and adversarial training baselines.
arXiv Detail & Related papers (2024-01-26T08:38:57Z)
- Detecting Spurious Correlations via Robust Visual Concepts in Real and AI-Generated Image Classification [12.992095539058022]
We introduce a general-purpose method that efficiently detects potential spurious correlations.
The proposed method provides intuitive explanations while eliminating the need for pixel-level annotations.
Our method also detects spurious correlations originating from generative models that may propagate to downstream applications.
arXiv Detail & Related papers (2023-11-03T01:12:35Z)
- Right for the Wrong Reason: Can Interpretable ML Techniques Detect Spurious Correlations? [2.7558542803110244]
We propose a rigorous evaluation strategy to assess an explanation technique's ability to correctly identify spurious correlations.
We find that the post-hoc technique SHAP, as well as the inherently interpretable Attri-Net, provides the best performance.
arXiv Detail & Related papers (2023-07-23T14:43:17Z)
- On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection [55.73320979733527]
We propose a data-agnostic adversarial detection framework that exploits the different responses of normal and adversarial samples to UAPs (sketched after this entry).
Experimental results show that our method achieves competitive detection performance on various text classification tasks.
arXiv Detail & Related papers (2023-06-27T02:54:07Z)
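The response-based test above can be sketched in a few lines. Everything below is a toy assumption: a random linear model stands in for a text classifier, a random vector stands in for a precomputed UAP, and the threshold is arbitrary.

```python
# Toy sketch of the UAP-response idea: adversarial inputs tend to react
# differently to a universal adversarial perturbation (UAP) than clean
# inputs do. The model, `uap`, and threshold are stand-in assumptions,
# not the paper's actual data-free framework.
import torch
import torch.nn.functional as F

def uap_response_score(model, x, uap):
    """KL divergence between predictions on x and on x + uap, per sample."""
    with torch.no_grad():
        p = F.log_softmax(model(x), dim=-1)
        q = F.log_softmax(model(x + uap), dim=-1)
    return (p.exp() * (p - q)).sum(dim=-1)  # KL(P || Q)

model = torch.nn.Linear(32, 4)    # stand-in classifier
uap = 0.1 * torch.randn(32)       # stand-in for a precomputed UAP
x = torch.randn(8, 32)
scores = uap_response_score(model, x, uap)
print(scores > scores.median())   # threshold would be tuned in practice
```

The appeal of this design is that detection needs no extra data at test time: only the model, the fixed perturbation, and the incoming sample.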
- ReDFeat: Recoupling Detection and Description for Multimodal Feature Learning [51.07496081296863]
We recouple the independent detection and description constraints of multimodal feature learning with a mutual weighting strategy.
We propose a detector that possesses a large receptive field and is equipped with learnable non-maximum suppression layers.
We build a benchmark containing cross-modal visible, infrared, near-infrared, and synthetic aperture radar image pairs for evaluating feature performance on feature matching and image registration tasks.
arXiv Detail & Related papers (2022-05-16T04:24:22Z)
- The Familiarity Hypothesis: Explaining the Behavior of Deep Open Set Methods [86.39044549664189]
Anomaly detection algorithms for feature-vector data identify anomalies as outliers, but outlier detection has not worked well in deep learning.
This paper proposes the Familiarity Hypothesis that these methods succeed because they are detecting the absence of familiar learned features rather than the presence of novelty.
The paper concludes with a discussion of whether familiarity detection is an inevitable consequence of representation learning.
arXiv Detail & Related papers (2022-03-04T18:32:58Z)
- On Predictive Explanation of Data Anomalies [3.1798318618973362]
PROTEUS is an AutoML pipeline designed for feature selection on imbalanced datasets.
It produces predictive explanations by approximating the decision surface of an unsupervised detector.
It reliably estimates the explanations' predictive performance on unseen data.
arXiv Detail & Related papers (2021-10-18T16:59:28Z)
- Tracking the risk of a deployed model and detecting harmful distribution shifts [105.27463615756733]
In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially.
We argue that a sensible method for firing off a warning has to both (a) detect harmful shifts while ignoring benign ones, and (b) allow continuous monitoring of model performance without increasing the false alarm rate; a minimal monitor in this spirit is sketched after this entry.
arXiv Detail & Related papers (2021-10-12T17:21:41Z)
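One simple monitor matching requirements (a) and (b) in spirit compares deployed error against a validation baseline and alarms only when the gap exceeds a benign-shift tolerance plus a confidence margin. This is a generic Hoeffding-bound sketch under assumed parameters, not the paper's procedure:

```python
# Generic monitor: alarm only when the deployed error rate exceeds the
# validation baseline by more than a benign-shift tolerance `eps` plus a
# Hoeffding confidence margin at level `alpha`. Parameter values are
# illustrative assumptions.
import math

def harmful_shift_alarm(errors, baseline_error, eps=0.05, alpha=0.01):
    """errors: 0/1 mistakes on recent deployment traffic."""
    n = len(errors)
    if n == 0:
        return False
    gap = sum(errors) / n - baseline_error
    margin = math.sqrt(math.log(1 / alpha) / (2 * n))  # Hoeffding bound
    # Note: checking repeatedly over time needs a correction (e.g. a union
    # bound or a time-uniform bound) to keep the false alarm rate controlled.
    return gap > eps + margin

# An 8% deployed error vs. a 5% baseline stays within tolerance...
print(harmful_shift_alarm([1] * 80 + [0] * 920, baseline_error=0.05))   # False
# ...while 20% is flagged as a harmful shift.
print(harmful_shift_alarm([1] * 200 + [0] * 800, baseline_error=0.05))  # True
```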
- The Hidden Uncertainty in a Neural Network's Activations [105.4223982696279]
The distribution of a neural network's latent representations has been successfully used to detect out-of-distribution (OOD) data.
This work investigates whether this distribution correlates with a model's epistemic uncertainty, thus indicating its ability to generalise to novel inputs; a minimal sketch of the latent-distribution idea follows this entry.
arXiv Detail & Related papers (2020-12-05T17:30:35Z)
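A standard concrete version of this idea, sketched below with synthetic stand-ins for real network activations: fit a Gaussian to in-distribution features and score new inputs by Mahalanobis distance, with larger distances suggesting OOD inputs.

```python
# Sketch of the latent-distribution idea: fit a Gaussian to in-distribution
# features (stand-ins here for real penultimate-layer activations) and score
# new inputs by Mahalanobis distance.
import numpy as np

rng = np.random.default_rng(0)

def fit_gaussian(feats):
    mu = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis(feats, mu, prec):
    d = feats - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", d, prec, d))

train_feats = rng.normal(0.0, 1.0, size=(500, 16))  # in-distribution features
mu, prec = fit_gaussian(train_feats)

in_dist = rng.normal(0.0, 1.0, size=(5, 16))
ood = rng.normal(4.0, 1.0, size=(5, 16))            # shifted distribution
print("in-dist:", mahalanobis(in_dist, mu, prec).round(1))
print("ood:    ", mahalanobis(ood, mu, prec).round(1))
```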
- Debugging Tests for Model Explanations [18.073554618753395]
Methods tested are able to diagnose a spurious background bug, but cannot conclusively identify mislabeled training examples.
We complement our analysis with a human subject study, and find that subjects fail to identify defective models using attributions, instead relying primarily on model predictions.
arXiv Detail & Related papers (2020-11-10T22:23:25Z)