Do Feature Attribution Methods Correctly Attribute Features?
- URL: http://arxiv.org/abs/2104.14403v1
- Date: Tue, 27 Apr 2021 20:35:30 GMT
- Title: Do Feature Attribution Methods Correctly Attribute Features?
- Authors: Yilun Zhou, Serena Booth, Marco Tulio Ribeiro, Julie Shah
- Abstract summary: Feature attribution methods are exceedingly popular in interpretable machine learning.
There is no consensus on the definition of "attribution".
We evaluate three methods: saliency maps, rationales, and attention.
- Score: 5.58592454173439
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Feature attribution methods are exceedingly popular in interpretable machine
learning. They aim to compute the attribution of each input feature to
represent its importance, but there is no consensus on the definition of
"attribution", leading to many competing methods with little systematic
evaluation. The lack of attribution ground truth further complicates
evaluation, which has to rely on proxy metrics. To address this, we propose a
dataset modification procedure such that models trained on the new dataset have
ground truth attribution available. We evaluate three methods: saliency maps,
rationales, and attention. We identify their deficiencies and add a new
perspective to the growing body of evidence questioning their correctness and
reliability in the wild. Our evaluation approach is model-agnostic and can be
used to assess future feature attribution method proposals as well. Code is
available at https://github.com/YilunZhou/feature-attribution-evaluation.
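To make the idea concrete, here is a minimal sketch of such a dataset modification, assuming a simple watermark scheme: a small patch is pasted into images of one class, so any model that learns the shortcut has a known ground-truth attribution region. The patch size, location, and the `attribution_recall` helper are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def add_watermark(images, labels, target_class=0, patch=4, value=1.0):
    """Paste a small watermark into the top-left corner of every image of
    `target_class`, creating a shortcut whose ground-truth attribution
    region is known exactly (illustrative scheme)."""
    modified = images.copy()
    mask = np.zeros(images.shape[1:3], dtype=bool)
    mask[:patch, :patch] = True  # ground-truth attribution region
    for i in np.where(labels == target_class)[0]:
        modified[i, :patch, :patch] = value
    return modified, mask

def attribution_recall(saliency, mask, k=None):
    """Fraction of the top-k saliency pixels that fall inside the
    ground-truth region (k defaults to the region size)."""
    k = int(k or mask.sum())
    top = np.argsort(saliency.flatten())[-k:]
    return mask.flatten()[top].mean()
```

A model trained on the modified data should rely on the patch, so a correct attribution method should concentrate its top pixels inside `mask`.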
Related papers
- On the Evaluation Consistency of Attribution-based Explanations [42.1421504321572]
We introduce Meta-Rank, an open platform for benchmarking attribution methods in the image domain.
Our benchmark reveals three insights in attribution evaluation endeavors:
1) evaluating attribution methods under disparate settings can yield divergent performance rankings (a sketch of this consistency check follows the list);
2) although inconsistent across numerous cases, the performance rankings exhibit remarkable consistency across distinct checkpoints along the same training trajectory; and
3) prior attempts at consistent evaluation fare no better than baselines when extended to more heterogeneous models and datasets.
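A hedged sketch of the consistency check this implies: compare the rankings that two evaluation settings assign to the same attribution methods using Kendall's tau. The method names and scores below are placeholders, not Meta-Rank's actual results.

```python
from scipy.stats import kendalltau

methods = ["saliency", "intgrad", "gradcam", "lime"]
# Hypothetical benchmark scores (higher = better) under two settings.
scores_a = [0.61, 0.72, 0.55, 0.48]
scores_b = [0.40, 0.44, 0.58, 0.31]

tau, _ = kendalltau(scores_a, scores_b)
print(f"rank correlation across settings: tau={tau:.2f}")
# A low tau indicates the two settings rank the methods divergently.
```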
arXiv Detail & Related papers (2024-07-28T11:49:06Z)
- Backdoor-based Explainable AI Benchmark for High Fidelity Evaluation of Attribution Methods [49.62131719441252]
Attribution methods compute importance scores for input features to explain the output predictions of deep models.
In this work, we first identify a set of fidelity criteria that reliable benchmarks for attribution methods are expected to fulfill.
We then introduce a Backdoor-based eXplainable AI benchmark (BackX) that adheres to the desired fidelity criteria.
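The backdoor idea can be sketched as follows, under the assumption that the trigger region is the ground-truth explanation for a backdoored prediction: an attribution map is scored by how much of its mass falls inside the trigger.

```python
import numpy as np

def trigger_mass(attribution, trigger_mask):
    """Fraction of non-negative attribution mass inside the backdoor
    trigger region; values near 1.0 indicate the map points at the
    known cause of the prediction (illustrative fidelity check)."""
    a = np.clip(attribution, 0, None)
    return a[trigger_mask].sum() / max(a.sum(), 1e-12)
```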
arXiv Detail & Related papers (2024-05-02T13:48:37Z)
- A Comprehensive and Reliable Feature Attribution Method: Double-sided Remove and Reconstruct (DoRaR) [3.43406114216767]
We introduce the Double-sided Remove and Reconstruct (DoRaR) feature attribution method based on several improvement methods.
We demonstrate that the DoRaR feature attribution method can effectively bypass the above issues and can aid in training a feature selector that outperforms other state-of-the-art feature attribution methods.
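DoRaR itself involves more machinery, but the underlying remove-and-evaluate intuition can be sketched as a simple deletion check, assuming a model with a `predict_proba`-style interface; this is a simplification, not DoRaR's actual procedure.

```python
import numpy as np

def deletion_drop(model, x, attribution, k, baseline=0.0):
    """Confidence drop after replacing the k most-attributed features
    with `baseline` (a simplified deletion check, not DoRaR itself)."""
    probs = model.predict_proba(x[None])[0]
    y = probs.argmax()
    top = np.argsort(attribution.flatten())[::-1][:k]
    x_removed = x.flatten().copy()
    x_removed[top] = baseline
    return probs[y] - model.predict_proba(x_removed.reshape(x.shape)[None])[0][y]
```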
arXiv Detail & Related papers (2023-10-27T07:40:45Z)
- A Dual-Perspective Approach to Evaluating Feature Attribution Methods [40.73602126894125]
We propose two new perspectives within the faithfulness paradigm that reveal intuitive properties: soundness and completeness.
Soundness assesses the degree to which attributed features are truly predictive features, while completeness examines how well the resulting attribution reveals all the predictive features.
We apply these metrics to mainstream attribution methods, offering a novel lens through which to analyze and compare feature attribution methods.
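When the set of truly predictive features is known (as in ground-truth setups like the one above), the two perspectives reduce to a precision/recall-style pair. The set-based simplification below is my own reading, not the paper's exact formulation.

```python
def soundness(attributed, predictive):
    """Fraction of attributed features that are truly predictive."""
    return len(attributed & predictive) / max(len(attributed), 1)

def completeness(attributed, predictive):
    """Fraction of truly predictive features that were attributed."""
    return len(attributed & predictive) / max(len(predictive), 1)

# Usage with feature indices as sets:
print(soundness({1, 2, 3}, {2, 3, 5}))     # 0.67: one attributed feature is spurious
print(completeness({1, 2, 3}, {2, 3, 5}))  # 0.67: one predictive feature was missed
```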
arXiv Detail & Related papers (2023-08-17T12:41:04Z)
- Revisiting Long-tailed Image Classification: Survey and Benchmarks with New Evaluation Metrics [88.39382177059747]
A corpus of metrics is designed to measure the accuracy, robustness, and bounds of algorithms for learning with long-tailed distributions.
Based on our benchmarks, we re-evaluate the performance of existing methods on CIFAR10 and CIFAR100 datasets.
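One common ingredient of such metric suites is accuracy broken out by class frequency. A minimal sketch follows; the head/tail thresholds are arbitrary assumptions.

```python
import numpy as np

def grouped_accuracy(y_true, y_pred, train_counts, head_min=100, tail_max=20):
    """Accuracy over head/medium/tail groups, where each class is grouped
    by how often it appeared in the long-tailed training set."""
    groups = {"head": [], "medium": [], "tail": []}
    for c, n in train_counts.items():
        key = "head" if n >= head_min else "tail" if n <= tail_max else "medium"
        groups[key].append(c)
    out = {}
    for name, classes in groups.items():
        sel = np.isin(y_true, classes)
        out[name] = float((y_pred[sel] == y_true[sel]).mean()) if sel.any() else None
    return out
```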
arXiv Detail & Related papers (2023-02-03T02:40:54Z)
- Evaluating Feature Attribution Methods in the Image Domain [7.852862161478641]
We investigate existing metrics and propose new variants of metrics for the evaluation of attribution maps.
We find that different attribution metrics seem to measure different underlying concepts of attribution maps.
We propose a general benchmarking approach to identify the ideal feature attribution method for a given use case.
arXiv Detail & Related papers (2022-02-22T15:14:33Z)
- Fine-Grained Neural Network Explanation by Identifying Input Features with Predictive Information [53.28701922632817]
We propose a method to identify features with predictive information in the input domain.
The core idea of our method is leveraging a bottleneck on the input that only lets input features associated with predictive latent features pass through.
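The bottleneck idea can be sketched with a sigmoid mask on the input, optimized to preserve the prediction while letting as little of the input through as possible. This toy PyTorch version is my simplification, not the authors' architecture.

```python
import torch

def bottleneck_mask(model, x, target, steps=200, beta=1.0, lr=0.05):
    """Learn a per-feature mask in [0, 1] on input `x` (batch size 1) that
    keeps the target logit high while shrinking the mask; high mask values
    mark features carrying predictive information."""
    mask_logits = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([mask_logits], lr=lr)
    for _ in range(steps):
        m = torch.sigmoid(mask_logits)
        logits = model(x * m)                        # only masked input passes
        loss = -logits[0, target] + beta * m.mean()  # keep prediction, shrink mask
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(mask_logits).detach()
```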
arXiv Detail & Related papers (2021-10-04T14:13:42Z)
- Discriminative Attribution from Counterfactuals [64.94009515033984]
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations.
We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner.
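A hedged sketch of the quantitative evaluation: given an input and a counterfactual that changes the prediction, the features the counterfactual actually modified serve as a reference region, and an attribution method is scored by how much mass it places there. The change threshold is my assumption.

```python
import numpy as np

def counterfactual_agreement(attribution, x, x_cf, change_thresh=0.05):
    """Fraction of absolute attribution mass on features that the
    counterfactual `x_cf` changed relative to `x`."""
    changed = np.abs(x - x_cf) > change_thresh  # reference region
    a = np.abs(attribution)
    return a[changed].sum() / max(a.sum(), 1e-12)
```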
arXiv Detail & Related papers (2021-09-28T00:53:34Z)
- Rethinking of Pedestrian Attribute Recognition: A Reliable Evaluation under Zero-Shot Pedestrian Identity Setting [48.347987541336146]
We argue that it is time to step back and analyze the status quo of pedestrian attribute recognition.
We formally define and distinguish pedestrian attribute recognition from other similar tasks.
Experiments are conducted on four existing datasets and two proposed datasets to measure progress on pedestrian attribute recognition.
arXiv Detail & Related papers (2021-07-08T03:12:24Z)
- Search Methods for Sufficient, Socially-Aligned Feature Importance Explanations with In-Distribution Counterfactuals [72.00815192668193]
Feature importance (FI) estimates are a popular form of explanation, and they are commonly created and evaluated by computing the change in model confidence caused by removing certain input features at test time.
We study several under-explored dimensions of FI-based explanations, providing conceptual and empirical improvements for this form of explanation.
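The removal-based recipe described here can be sketched directly: occlude one feature at a time and record the confidence drop. The baseline replacement value is an explicit assumption; part of the paper's point is that such removals can push inputs out of distribution.

```python
import numpy as np

def occlusion_importance(predict, x, baseline=0.0):
    """Importance of each feature = confidence drop when it is replaced by
    `baseline`; `predict` maps a batch of inputs to class probabilities.
    (O(n) model calls; a sketch, not an efficient implementation.)"""
    probs = predict(x[None])[0]
    y = probs.argmax()
    scores = np.zeros(x.size)
    flat = x.flatten()
    for i in range(x.size):
        x_occ = flat.copy()
        x_occ[i] = baseline
        scores[i] = probs[y] - predict(x_occ.reshape(x.shape)[None])[0][y]
    return scores.reshape(x.shape)
```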
arXiv Detail & Related papers (2021-06-01T20:36:48Z)
- Towards Unifying Feature Attribution and Counterfactual Explanations: Different Means to the Same End [17.226134854746267]
We present a method to generate feature attribution explanations from a set of counterfactual examples.
We show how counterfactual examples can be used to evaluate the goodness of an attribution-based explanation in terms of its necessity and sufficiency.
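The necessity/sufficiency reading can be sketched with binary masks, assuming `predict` returns class probabilities and `x_cf` is a counterfactual for `x`; the interfaces are illustrative.

```python
import numpy as np

def necessary(predict, x, x_cf, attributed):
    """Necessity: replacing only the attributed features with their
    counterfactual values should flip the original prediction."""
    y = predict(x[None])[0].argmax()
    x_mod = np.where(attributed, x_cf, x)
    return predict(x_mod[None])[0].argmax() != y

def sufficient(predict, x, x_cf, attributed):
    """Sufficiency: keeping the attributed features and replacing the rest
    with counterfactual values should preserve the prediction."""
    y = predict(x[None])[0].argmax()
    x_mod = np.where(attributed, x, x_cf)
    return predict(x_mod[None])[0].argmax() == y
```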
arXiv Detail & Related papers (2020-11-10T05:41:43Z)