On quantitative aspects of model interpretability
- URL: http://arxiv.org/abs/2007.07584v1
- Date: Wed, 15 Jul 2020 10:05:05 GMT
- Title: On quantitative aspects of model interpretability
- Authors: An-phi Nguyen, María Rodríguez Martínez
- Abstract summary: We argue that the performance of methods along these dimensions can be imputed to two conceptual parts, namely the feature extractor and the actual explainability method.
We experimentally validate our metrics on different benchmark tasks and show how they can be used to guide a practitioner in the selection of the most appropriate method for the task at hand.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the growing body of work in interpretable machine learning, it
remains unclear how to evaluate different explainability methods without
resorting to qualitative assessment and user-studies. While interpretability is
an inherently subjective matter, previous works in cognitive science and
epistemology have shown that good explanations do possess aspects that can be
objectively judged apart from fidelity, such as simplicity and broadness. In
this paper we propose a set of metrics to programmatically evaluate
interpretability methods along these dimensions. In particular, we argue that
the performance of methods along these dimensions can be orthogonally imputed
to two conceptual parts, namely the feature extractor and the actual
explainability method. We experimentally validate our metrics on different
benchmark tasks and show how they can be used to guide a practitioner in the
selection of the most appropriate method for the task at hand.
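The abstract does not reproduce the exact metric definitions, so the sketch below is only an illustrative assumption of how a "simplicity" dimension could be scored programmatically for an attribution-style explanation: it computes the effective number of features an explanation relies on, where a lower value means a simpler explanation. The function name and the entropy-based formulation are hypothetical, not the paper's metrics.

```python
import numpy as np

def effective_complexity(attribution, eps=1e-12):
    """Illustrative 'simplicity' score for a feature-attribution vector.

    A simpler explanation concentrates its mass on few features. Here we use
    the exponential of the Shannon entropy of the normalized absolute
    attributions (the 'effective number of features'): lower means simpler.
    This is a stand-in metric, not the definition proposed in the paper.
    """
    a = np.abs(np.asarray(attribution, dtype=float))
    p = a / (a.sum() + eps)                 # normalize to a distribution
    entropy = -(p * np.log(p + eps)).sum()  # Shannon entropy in nats
    return float(np.exp(entropy))           # in [1, n_features]

# Toy comparison of two hypothetical explanations for the same prediction:
sparse_expl = np.array([0.9, 0.05, 0.05, 0.0, 0.0])
diffuse_expl = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
print(effective_complexity(sparse_expl))   # ~1.5 -> simpler explanation
print(effective_complexity(diffuse_expl))  # ~5.0 -> more complex explanation
```

In the spirit of the paper's framing, a practitioner could compute such scores for the explanations produced by several attribution methods and weigh them alongside fidelity when choosing a method for the task at hand.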
Related papers
- Toward Understanding the Disagreement Problem in Neural Network Feature Attribution [0.8057006406834466]
Neural networks have demonstrated their remarkable ability to discern intricate patterns and relationships from raw data.
Understanding the inner workings of these black-box models remains challenging, yet crucial for high-stakes decisions.
Our work addresses this confusion by investigating the explanations' fundamental and distributional behavior.
arXiv Detail & Related papers (2024-04-17T12:45:59Z)
- Ensemble Interpretation: A Unified Method for Interpretable Machine Learning [1.276129213205911]
A novel interpretable methodology, ensemble interpretation, is presented in this paper.
Experiment results show that the ensemble interpretation is more stable and more consistent with human experience and cognition.
As an application, we use the ensemble interpretation for feature selection, and then the generalization performance of the corresponding learning model is significantly improved.
arXiv Detail & Related papers (2023-12-11T09:51:24Z)
- Better Understanding Differences in Attribution Methods via Systematic Evaluations [57.35035463793008]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We use these evaluation schemes to study strengths and shortcomings of some widely used attribution methods over a wide range of models.
arXiv Detail & Related papers (2023-03-21T14:24:58Z)
- Visualizing and Understanding Contrastive Learning [22.553990823550784]
We design visual explanation methods that contribute towards understanding similarity learning tasks from pairs of images.
We also adapt existing metrics, used to evaluate visual explanations of image classification systems, to suit pairs of explanations.
arXiv Detail & Related papers (2022-06-20T13:01:46Z)
- Towards Better Understanding Attribution Methods [77.1487219861185]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We also propose a post-processing smoothing step that significantly improves the performance of some attribution methods.
arXiv Detail & Related papers (2022-05-20T20:50:17Z)
- What I Cannot Predict, I Do Not Understand: A Human-Centered Evaluation Framework for Explainability Methods [6.232071870655069]
We show that theoretical measures used to score explainability methods poorly reflect the practical usefulness of individual attribution methods in real-world scenarios.
Our results suggest a critical need to develop better explainability methods and to deploy human-centered evaluation approaches.
arXiv Detail & Related papers (2021-12-06T18:36:09Z)
- Discriminative Attribution from Counterfactuals [64.94009515033984]
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations.
We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner.
arXiv Detail & Related papers (2021-09-28T00:53:34Z)
- On Sample Based Explanation Methods for NLP: Efficiency, Faithfulness, and Semantic Evaluation [23.72825603188359]
We can improve the interpretability of explanations by allowing arbitrary text sequences as the explanation unit.
We propose a semantic-based evaluation metric that can better align with humans' judgment of explanations.
arXiv Detail & Related papers (2021-06-09T00:49:56Z)
- On the Faithfulness Measurements for Model Interpretations [100.2730234575114]
Post-hoc interpretations aim to uncover how natural language processing (NLP) models make predictions.
To tackle these issues, we start with three criteria: the removal-based criterion, the sensitivity of interpretations, and the stability of interpretations.
Motivated by the desideratum of these faithfulness notions, we introduce a new class of interpretation methods that adopt techniques from the adversarial domain.
arXiv Detail & Related papers (2021-04-18T09:19:44Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- A Diagnostic Study of Explainability Techniques for Text Classification [52.879658637466605]
We develop a list of diagnostic properties for evaluating existing explainability techniques.
We compare the saliency scores assigned by the explainability techniques with human annotations of salient input regions to find relations between a model's performance and the agreement of its rationales with human ones.
arXiv Detail & Related papers (2020-09-25T12:01:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.