A Comparative Study of Faithfulness Metrics for Model Interpretability Methods
- URL: http://arxiv.org/abs/2204.05514v1
- Date: Tue, 12 Apr 2022 04:02:17 GMT
- Title: A Comparative Study of Faithfulness Metrics for Model Interpretability Methods
- Authors: Chun Sik Chan, Huanqi Kong, Guanqing Liang
- Abstract summary: We introduce two assessment dimensions, namely diagnosticity and time complexity.
Experimental results show that the sufficiency and comprehensiveness metrics have higher diagnosticity and lower time complexity than the other faithfulness metrics.
- Score: 3.7200349581269996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interpretation methods that reveal the internal reasoning processes
behind machine learning models have attracted increasing attention in recent
years. To quantify the extent to which the identified interpretations truly
reflect the intrinsic decision-making mechanisms, various faithfulness
evaluation metrics have been proposed. However, we find that different
faithfulness metrics show conflicting preferences when comparing different
interpretations. Motivated by this observation, we conduct a comprehensive and
comparative study of the widely adopted faithfulness metrics. In particular, we
introduce two assessment dimensions, namely diagnosticity and time complexity.
Diagnosticity refers to the degree to which a faithfulness metric favours
relatively faithful interpretations over randomly generated ones, and time
complexity is measured by the average number of model forward passes. The
experimental results show that the sufficiency and comprehensiveness metrics
have higher diagnosticity and lower time complexity than the other faithfulness
metrics.
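
For concreteness, the sketch below shows how these quantities are commonly
computed in the literature. It is a minimal illustration, not the authors'
implementation: the `predict_proba` wrapper, the token-list inputs, and the
index-based rationales are assumptions made for the example.

```python
def comprehensiveness(predict_proba, tokens, rationale_idx, label):
    """Probability drop when the rationale tokens are removed.

    predict_proba: assumed callable mapping a token list to a vector of
                   class probabilities (one model forward pass per call).
    Higher values indicate a more faithful interpretation.
    """
    full = predict_proba(tokens)[label]
    without = [t for i, t in enumerate(tokens) if i not in rationale_idx]
    return full - predict_proba(without)[label]


def sufficiency(predict_proba, tokens, rationale_idx, label):
    """Probability drop when only the rationale tokens are kept.

    Lower values indicate a more faithful interpretation.
    """
    full = predict_proba(tokens)[label]
    only = [t for i, t in enumerate(tokens) if i in rationale_idx]
    return full - predict_proba(only)[label]


def diagnosticity(faithful_scores, random_scores, lower_is_better=False):
    """Fraction of paired examples on which the metric favours the
    relatively faithful interpretation over a randomly generated one.

    Set lower_is_better=True for metrics such as sufficiency, where a
    smaller score signals higher faithfulness.
    """
    if lower_is_better:
        wins = sum(f < r for f, r in zip(faithful_scores, random_scores))
    else:
        wins = sum(f > r for f, r in zip(faithful_scores, random_scores))
    return wins / len(faithful_scores)
```

Under this accounting, each evaluation of sufficiency or comprehensiveness
costs two forward passes per example (and the full-input pass can be cached),
which is consistent with the paper's finding that these two metrics have
comparatively low time complexity.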
Related papers
- Unified Uncertainty Estimation for Cognitive Diagnosis Models [70.46998436898205]
We propose a unified uncertainty estimation approach for a wide range of cognitive diagnosis models.
We decompose the uncertainty of the diagnostic parameters into a data aspect and a model aspect.
Our method is effective and can provide useful insights into the uncertainty of cognitive diagnosis.
arXiv Detail & Related papers (2024-03-09T13:48:20Z)
- Trade-off Between Dependence and Complexity for Nonparametric Learning -- an Empirical Process Approach [10.27974860479791]
In many applications where the data exhibit temporal dependencies, the corresponding empirical processes are much less understood.
We present a general bound on the expected supremum of empirical processes under standard $\beta/\rho$-mixing assumptions.
We show that even under long-range dependence, it is possible to attain the same rates as in the i.i.d. setting.
arXiv Detail & Related papers (2024-01-17T05:08:37Z)
- Valid causal inference with unobserved confounding in high-dimensional settings [0.0]
We show how valid semiparametric inference can be obtained in the presence of unobserved confounders and high-dimensional nuisance models.
We propose uncertainty intervals which allow for unobserved confounding, and show that the resulting inference is valid when the amount of unobserved confounding is small.
arXiv Detail & Related papers (2024-01-12T13:21:20Z)
- Enriching Disentanglement: From Logical Definitions to Quantitative Metrics [59.12308034729482]
Disentangling the explanatory factors in complex data is a promising approach for data-efficient representation learning.
We establish relationships between logical definitions and quantitative metrics to derive theoretically grounded disentanglement metrics.
We empirically demonstrate the effectiveness of the proposed metrics by isolating different aspects of disentangled representations.
arXiv Detail & Related papers (2023-05-19T08:22:23Z)
- Evaluating generative audio systems and their metrics [80.97828572629093]
This paper investigates state-of-the-art approaches side-by-side with (i) a set of previously proposed objective metrics for audio reconstruction, and (ii) a listening study.
Results indicate that currently used objective metrics are insufficient to describe the perceptual quality of current systems.
arXiv Detail & Related papers (2022-08-31T21:48:34Z)
- Hierarchical Decision Ensembles - An inferential framework for uncertain Human-AI collaboration in forensic examinations [0.8122270502556371]
We present an inferential framework for assessing the model and its output.
The framework is designed to calibrate trust in forensic experts by bridging the gap between domain specific knowledge and predictive model results.
arXiv Detail & Related papers (2021-10-31T08:07:43Z)
- On the Faithfulness Measurements for Model Interpretations [100.2730234575114]
Post-hoc interpretations aim to uncover how natural language processing (NLP) models make predictions.
To evaluate their faithfulness, we start from three criteria: the removal-based criterion, the sensitivity of interpretations, and the stability of interpretations.
Motivated by these faithfulness desiderata, we introduce a new class of interpretation methods that adopt techniques from the adversarial domain.
arXiv Detail & Related papers (2021-04-18T09:19:44Z)
- A Statistical Analysis of Summarization Evaluation Metrics using Resampling Methods [60.04142561088524]
We find that the confidence intervals are rather wide, demonstrating high uncertainty in how reliable automatic metrics truly are (a generic bootstrap sketch follows this list).
Although many metrics fail to show statistical improvements over ROUGE, two recent works, QAEval and BERTScore, do in some evaluation settings.
arXiv Detail & Related papers (2021-03-31T18:28:14Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- Exploiting Uncertainties from Ensemble Learners to Improve Decision-Making in Healthcare AI [13.890527275215284]
Ensemble learning is widely applied in Machine Learning (ML) to improve model performance and to mitigate decision risks.
We show that the ensemble mean is preferable to the ensemble variance as an uncertainty metric for decision making.
arXiv Detail & Related papers (2020-07-12T18:33:09Z)
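
The resampling entry above reports wide confidence intervals for summarization
metrics. As a generic illustration of the underlying technique (a percentile
bootstrap, which may differ from that paper's exact protocol), such an interval
for a metric's correlation with human judgments can be estimated as follows;
the function name and the pairing of per-example scores are assumptions made
for this sketch.

```python
import random
import statistics  # statistics.correlation requires Python 3.10+


def bootstrap_ci(metric_scores, human_scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the correlation between
    an automatic metric and human judgments.

    metric_scores, human_scores: paired per-example (or per-system) scores;
    assumes each resample is not constant, so the correlation is defined.
    Returns the (lower, upper) bounds of the (1 - alpha) interval.
    """
    rng = random.Random(seed)
    n = len(metric_scores)
    stats = []
    for _ in range(n_boot):
        # Resample indices with replacement, keeping score pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(statistics.correlation(
            [metric_scores[i] for i in idx],
            [human_scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

A wide interval from this procedure means the observed metric-human
correlation could vary substantially under resampling, which is the sense in
which that entry calls the reliability of automatic metrics uncertain.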