Evaluating Feature Attribution Methods in the Image Domain
- URL: http://arxiv.org/abs/2202.12270v2
- Date: Fri, 9 Aug 2024 12:48:25 GMT
- Title: Evaluating Feature Attribution Methods in the Image Domain
- Authors: Arne Gevaert, Axel-Jan Rousseau, Thijs Becker, Dirk Valkenborg, Tijl De Bie, Yvan Saeys
- Abstract summary: We investigate existing metrics and propose new variants of metrics for the evaluation of attribution maps.
We find that different attribution metrics seem to measure different underlying concepts of attribution maps.
We propose a general benchmarking approach to identify the ideal feature attribution method for a given use case.
- Score: 7.852862161478641
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Feature attribution maps are a popular approach to highlight the most important pixels in an image for a given prediction of a model. Despite a recent growth in popularity and available methods, little attention is given to the objective evaluation of such attribution maps. Building on previous work in this domain, we investigate existing metrics and propose new variants of metrics for the evaluation of attribution maps. We confirm a recent finding that different attribution metrics seem to measure different underlying concepts of attribution maps, and extend this finding to a larger selection of attribution metrics. We also find that metric results on one dataset do not necessarily generalize to other datasets, and methods with desirable theoretical properties such as DeepSHAP do not necessarily outperform computationally cheaper alternatives. Based on these findings, we propose a general benchmarking approach to identify the ideal feature attribution method for a given use case. Implementations of attribution metrics and our experiments are available online.
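For readers unfamiliar with perturbation-based attribution metrics, the following minimal sketch illustrates one common family: a deletion-style curve. The `model` callable, the input layout, and the masking baseline are assumptions made for illustration; this is not the exact protocol or implementation used in the paper.

```python
import numpy as np

def deletion_curve(model, image, attribution, target_class, steps=20, baseline=0.0):
    """Remove the most-attributed pixels first and record the model's
    confidence in `target_class` after each removal step.

    model:       callable mapping an (H, W, C) array to class probabilities (assumed)
    image:       (H, W, C) float array
    attribution: (H, W) importance map for `target_class`
    """
    h, w, _ = image.shape
    order = np.argsort(attribution.ravel())[::-1]          # most important pixels first
    scores = [model(image)[target_class]]
    perturbed = image.copy()
    per_step = max(1, len(order) // steps)
    for start in range(0, len(order), per_step):
        rows, cols = np.unravel_index(order[start:start + per_step], (h, w))
        perturbed[rows, cols, :] = baseline                 # "delete" these pixels
        scores.append(model(perturbed)[target_class])
    return np.array(scores)
```

A faster confidence drop (smaller area under this curve) suggests the map ranked genuinely important pixels first; an insertion variant reverses the procedure.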
Related papers
- On the Evaluation Consistency of Attribution-based Explanations [42.1421504321572]
We introduce Meta-Rank, an open platform for benchmarking attribution methods in the image domain.
Our benchmark reveals three insights in attribution evaluation endeavors: 1) evaluating attribution methods under disparate settings can yield divergent performance rankings; 2) although inconsistent across numerous cases, the performance rankings exhibit remarkable consistency across distinct checkpoints along the same training trajectory; and 3) prior attempts at consistent evaluation fare no better than baselines when extended to more heterogeneous models and datasets.
arXiv Detail & Related papers (2024-07-28T11:49:06Z)
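To make the notion of ranking consistency concrete, a rank-correlation statistic such as Kendall's tau can be computed between the orderings an evaluation produces under two settings. The sketch below uses made-up method names and scores; it illustrates the consistency check, not Meta-Rank's actual pipeline.

```python
from scipy.stats import kendalltau

# Hypothetical quality scores for the same attribution methods under two
# evaluation settings (e.g. different datasets or perturbation baselines).
methods = ["Gradient", "IntegratedGradients", "DeepSHAP", "GradCAM", "LIME"]
scores_setting_a = [0.41, 0.58, 0.61, 0.55, 0.37]
scores_setting_b = [0.44, 0.52, 0.49, 0.63, 0.35]

tau, p_value = kendalltau(scores_setting_a, scores_setting_b)
print(f"Kendall's tau between rankings: {tau:.2f} (p={p_value:.3f})")
# tau close to 1 means the two settings rank the methods consistently;
# values near 0 indicate the divergent rankings the benchmark warns about.
```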
- Piecewise-Linear Manifolds for Deep Metric Learning [8.670873561640903]
Unsupervised deep metric learning focuses on learning a semantic representation space using only unlabeled data.
We propose to model the high-dimensional data manifold using a piecewise-linear approximation, with each low-dimensional linear piece approximating the data manifold in a small neighborhood of a point.
We empirically show that this similarity estimate correlates better with the ground truth than the similarity estimates of current state-of-the-art techniques.
arXiv Detail & Related papers (2024-03-22T06:22:20Z)
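The local linear-piece idea can be roughly approximated with off-the-shelf tools: take the principal subspace of each point's k nearest neighbours as its linear piece, and score a query by its reconstruction error in that subspace. This is a sketch under those assumptions, not the paper's training objective or similarity estimator.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

def local_linear_similarity(X, query, k=10, dim=3):
    """Similarity of `query` to each point's local linear piece of the manifold.

    X:     (N, D) unlabeled embeddings
    query: (D,) embedding to compare against the pieces
    dim:   dimensionality of each linear piece (must be <= min(k, D))
    """
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    sims = []
    for i in range(len(X)):
        _, idx = nn.kneighbors(X[i : i + 1])
        patch = X[idx[0]]                        # small neighbourhood around point i
        pca = PCA(n_components=dim).fit(patch)   # local linear approximation of the manifold
        recon = pca.inverse_transform(pca.transform(query[None, :]))[0]
        sims.append(-np.linalg.norm(query - recon))  # low reconstruction error = similar
    return np.array(sims)
```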
- Anchor Points: Benchmarking Models with Much Fewer Examples [88.02417913161356]
In six popular language classification benchmarks, model confidence in the correct class on many pairs of points is strongly correlated across models.
We propose Anchor Point Selection, a technique to select small subsets of datasets that capture model behavior across the entire dataset.
Just a few anchor points can be used to estimate model per-class predictions on all other points in a dataset with low mean absolute error.
arXiv Detail & Related papers (2023-09-14T17:45:51Z)
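A simplified stand-in for anchor point selection is to cluster examples by their per-class probability vectors, keep the example closest to each centroid as an anchor, and copy anchor predictions to the rest of each cluster. The k-means choice and the `probs` input below are assumptions; the paper's selection technique may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def anchor_point_estimates(probs, n_anchors=10, seed=0):
    """Pick `n_anchors` representative examples from per-class probability
    vectors `probs` (N, C) and estimate every example's probabilities from
    its cluster's anchor.
    """
    km = KMeans(n_clusters=n_anchors, n_init=10, random_state=seed).fit(probs)
    # The anchor of each cluster is the real example closest to its centroid.
    anchors = np.array([
        np.argmin(np.linalg.norm(probs - c, axis=1)) for c in km.cluster_centers_
    ])
    estimates = probs[anchors[km.labels_]]   # copy anchor predictions to cluster members
    mae = np.abs(estimates - probs).mean()   # how well a handful of points stand in for all
    return anchors, estimates, mae
```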
- RGB-D-Based Categorical Object Pose and Shape Estimation: Methods, Datasets, and Evaluation [5.71097144710995]
This work provides an overview of the field in terms of methods, datasets, and evaluation protocols.
We take a critical look at the predominant evaluation protocol, including metrics and datasets.
We propose a new set of metrics, contribute new annotations for the Redwood dataset, and evaluate state-of-the-art methods in a fair comparison.
arXiv Detail & Related papers (2023-01-19T15:59:10Z)
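Evaluation protocols in this area typically build on per-instance pose errors. The helper below sketches two generic ones, a geodesic rotation error and a Euclidean translation error; it is not the specific metric set contributed by the paper.

```python
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Rotation error (degrees) and translation error for a 6-DoF pose estimate.

    R_est, R_gt: (3, 3) rotation matrices
    t_est, t_gt: (3,) translation vectors
    """
    # Geodesic distance on SO(3): angle of the relative rotation R_gt^T @ R_est.
    cos_angle = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    rot_err_deg = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    trans_err = np.linalg.norm(t_est - t_gt)
    return rot_err_deg, trans_err
```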
- Exploring validation metrics for offline model-based optimisation with diffusion models [50.404829846182764]
In model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of reward with respect to a black box function called the (ground truth) oracle.
While an approximation to the ground truth oracle can be trained and used in its place during model validation to measure the mean reward over generated candidates, this evaluation is approximate and vulnerable to adversarial examples.
This is encapsulated in our proposed evaluation framework, which is also designed to measure extrapolation.
arXiv Detail & Related papers (2022-11-19T16:57:37Z)
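The gap between a trained approximation of the oracle and the ground truth can be illustrated with a toy setup: fit a proxy on offline data, select candidates the proxy scores highly, then compare proxy and true rewards. The synthetic reward function and the regressor below are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def true_oracle(x):                      # hidden ground-truth reward (synthetic)
    return -np.sum((x - 0.7) ** 2, axis=1)

# Offline dataset covering only part of the design space.
X_train = rng.uniform(0.0, 0.5, size=(500, 4))
y_train = true_oracle(X_train)
proxy = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# "Generated" candidates, some outside the training distribution.
candidates = rng.uniform(0.0, 1.0, size=(1000, 4))
top = candidates[np.argsort(proxy.predict(candidates))[-50:]]

print("mean proxy reward :", proxy.predict(top).mean())
print("mean true reward  :", true_oracle(top).mean())
# A mismatch between the two numbers is exactly the failure mode an
# extrapolation-aware validation metric tries to expose.
```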
- On the Evaluation of RGB-D-based Categorical Pose and Shape Estimation [5.71097144710995]
In this work we take a critical look at this predominant evaluation protocol including metrics and datasets.
We propose a new set of metrics, contribute new annotations for the Redwood dataset and evaluate state-of-the-art methods in a fair comparison.
arXiv Detail & Related papers (2022-02-21T16:31:18Z)
- Do Feature Attribution Methods Correctly Attribute Features? [5.58592454173439]
Feature attribution methods are exceedingly popular in interpretable machine learning.
There is no consensus on the definition of "attribution".
We evaluate three methods: saliency maps, rationales, and attention.
arXiv Detail & Related papers (2021-04-27T20:35:30Z)
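When a semi-synthetic ground truth for "which pixels actually matter" is available, a common sanity check is the overlap between the top-attributed pixels and that ground-truth mask. The function below is a generic sketch of such a check, not the evaluation protocol of the paper above.

```python
import numpy as np

def topk_mask_iou(attribution, gt_mask, k=None):
    """Intersection-over-union between the k most-attributed pixels and a
    binary ground-truth relevance mask.

    attribution: (H, W) attribution map
    gt_mask:     (H, W) boolean array marking truly relevant pixels
    k:           number of top pixels to keep; defaults to the mask size
    """
    if k is None:
        k = int(gt_mask.sum())
    flat = attribution.ravel()
    top = np.zeros_like(flat, dtype=bool)
    top[np.argsort(flat)[-k:]] = True        # keep the k highest-attributed pixels
    top = top.reshape(attribution.shape)
    inter = np.logical_and(top, gt_mask).sum()
    union = np.logical_or(top, gt_mask).sum()
    return inter / union if union else 0.0
```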
- Making Affine Correspondences Work in Camera Geometry Computation [62.7633180470428]
Local features provide region-to-region rather than point-to-point correspondences.
We propose guidelines for effective use of region-to-region matches in the course of a full model estimation pipeline.
Experiments show that affine solvers can achieve accuracy comparable to point-based solvers at faster run-times.
arXiv Detail & Related papers (2020-07-20T12:07:48Z)
- Evaluating the Disentanglement of Deep Generative Models through Manifold Topology [66.06153115971732]
We present a method for quantifying disentanglement that only uses the generative model.
We empirically evaluate several state-of-the-art models across multiple datasets.
arXiv Detail & Related papers (2020-06-05T20:54:11Z)
- Evaluation Metrics for Conditional Image Generation [100.69766435176557]
We present two new metrics for evaluating generative models in the class-conditional image generation setting.
A theoretical analysis shows the motivation behind each proposed metric and links the novel metrics to their unconditional counterparts.
We provide an extensive empirical evaluation, comparing the metrics to their unconditional variants and to other metrics, and utilize them to analyze existing generative models.
arXiv Detail & Related papers (2020-04-26T12:15:16Z)
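One straightforward way to link a conditional metric to its unconditional counterpart is to compute the unconditional metric within each class and average the results. The sketch below does this for a Fréchet-style distance on precomputed feature vectors; the feature extractor and this particular conditioning scheme are assumptions, not necessarily the metrics the paper proposes.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feat_a, feat_b):
    """Fréchet distance between two sets of feature vectors (N, D)."""
    mu_a, mu_b = feat_a.mean(0), feat_b.mean(0)
    cov_a = np.cov(feat_a, rowvar=False)
    cov_b = np.cov(feat_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real               # discard tiny imaginary parts from sqrtm
    return float(np.sum((mu_a - mu_b) ** 2) + np.trace(cov_a + cov_b - 2 * covmean))

def class_conditional_fd(real_feats, fake_feats, real_labels, fake_labels):
    """Average the per-class Fréchet distances (one simple conditional variant)."""
    classes = np.unique(real_labels)
    per_class = [
        frechet_distance(real_feats[real_labels == c], fake_feats[fake_labels == c])
        for c in classes
    ]
    return float(np.mean(per_class))
```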
- CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus [62.86856923633923]
We present a robust estimator for fitting multiple parametric models of the same form to noisy measurements.
In contrast to previous works, which resorted to hand-crafted search strategies for multiple model detection, we learn the search strategy from data.
The search strategy is learned in a self-supervised manner; we evaluate the proposed algorithm on multi-homography estimation and demonstrate accuracy superior to state-of-the-art methods.
arXiv Detail & Related papers (2020-01-08T17:37:01Z)
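For contrast with the learned search strategy, the hand-crafted baseline it replaces is often sequential RANSAC: fit one model, peel off its inliers, and repeat. The sketch below implements only that naive baseline for multi-homography fitting with OpenCV; the thresholds and point format are assumptions, and it is not CONSAC itself.

```python
import numpy as np
import cv2

def sequential_ransac_homographies(pts1, pts2, max_models=4, thresh=3.0, min_inliers=12):
    """Greedy multi-homography fitting: repeatedly run RANSAC and peel off inliers.

    pts1, pts2: (N, 2) float32 coordinates of matched points in the two images
    """
    remaining = np.arange(len(pts1))
    models = []
    while len(remaining) >= min_inliers and len(models) < max_models:
        H, mask = cv2.findHomography(pts1[remaining], pts2[remaining],
                                     cv2.RANSAC, ransacReprojThreshold=thresh)
        if H is None:
            break
        inliers = remaining[mask.ravel().astype(bool)]
        if len(inliers) < min_inliers:
            break
        models.append((H, inliers))
        remaining = np.setdiff1d(remaining, inliers)   # remove matches explained by this model
    return models
```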