Order in the Court: Explainable AI Methods Prone to Disagreement
- URL: http://arxiv.org/abs/2105.03287v1
- Date: Fri, 7 May 2021 14:27:37 GMT
- Title: Order in the Court: Explainable AI Methods Prone to Disagreement
- Authors: Michael Neely, Stefan F. Schouten, Maurits J. R. Bleeker, and Ana
Lucic
- Abstract summary: In Natural Language Processing, feature-additive explanation methods quantify the independent contribution of each input token towards a model's decision.
Previous analyses have sought to either invalidate or support the role of attention-based explanations as a faithful and plausible measure of salience.
We show that rank correlation is largely uninformative and does not measure the quality of feature-additive methods.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Natural Language Processing, feature-additive explanation methods quantify
the independent contribution of each input token towards a model's decision. By
computing the rank correlation between attention weights and the scores
produced by a small sample of these methods, previous analyses have sought to
either invalidate or support the role of attention-based explanations as a
faithful and plausible measure of salience. To investigate what conclusions rank
correlation can reliably support, we comprehensively compare
feature-additive methods, including attention-based explanations, across
several neural architectures and tasks. In most cases, we find that none of our
chosen methods agree. Therefore, we argue that rank correlation is largely
uninformative and does not measure the quality of feature-additive methods.
Additionally, the range of conclusions a practitioner may draw from a single
explainability algorithm is limited.
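A minimal sketch (not the authors' released code) of the comparison the abstract describes: per-token attention weights and per-token scores from a feature-additive method (Integrated Gradients here, purely as an example) are compared by rank correlation. The arrays and the choice of method are hypothetical.

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr

# Hypothetical per-token salience scores for a six-token input.
attention_weights = np.array([0.05, 0.40, 0.10, 0.25, 0.15, 0.05])
integrated_gradients = np.array([0.02, 0.30, 0.20, 0.28, 0.10, 0.10])

# Rank correlation between the two explanations of the same prediction.
tau, tau_p = kendalltau(attention_weights, integrated_gradients)
rho, rho_p = spearmanr(attention_weights, integrated_gradients)

print(f"Kendall tau:  {tau:.3f} (p={tau_p:.3f})")
print(f"Spearman rho: {rho:.3f} (p={rho_p:.3f})")
```

The paper's point is that a low (or high) correlation here, on its own, does not establish which method, if any, is a faithful explanation of the model.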
Related papers
- Rethinking Distance Metrics for Counterfactual Explainability [53.436414009687]
We investigate a framing for counterfactual generation methods that considers counterfactuals not as independent draws from a region around the reference, but as jointly sampled with the reference from the underlying data distribution.
We derive a distance metric tailored for counterfactual similarity that can be applied to a broad range of settings.
arXiv Detail & Related papers (2024-10-18T15:06:50Z)
- Evaluating Human Alignment and Model Faithfulness of LLM Rationale [66.75309523854476]
We study how well large language models (LLMs) explain their generations through rationales.
We show that prompting-based methods are less "faithful" than attribution-based explanations.
arXiv Detail & Related papers (2024-06-28T20:06:30Z)
- An AI Architecture with the Capability to Explain Recognition Results [0.0]
This research focuses on the importance of metrics to explainability and contributes two methods yielding performance gains.
The first method introduces a combination of explainable and unexplainable flows, proposing a metric to characterize the explainability of a decision.
The second method compares classic metrics for estimating the effectiveness of neural networks in the system and posits a new metric as the leading performer.
arXiv Detail & Related papers (2024-06-13T02:00:13Z)
- Predictive Coding beyond Correlations [59.47245250412873]
We show how one such algorithm, called predictive coding, is able to perform causal inference tasks.
First, we show how a simple change in the inference process of predictive coding makes it possible to compute interventions without the need to mutilate or redefine a causal graph.
arXiv Detail & Related papers (2023-06-27T13:57:16Z)
- Comparing Explanation Methods for Traditional Machine Learning Models Part 2: Quantifying Model Explainability Faithfulness and Improvements with Dimensionality Reduction [0.0]
"faithfulness" or "fidelity" refer to the correspondence between the assigned feature importance and the contribution of the feature to model performance.
This study is one of the first to quantify the improvement in explainability from limiting correlated features and knowing the relative fidelity of different explainability methods.
arXiv Detail & Related papers (2022-11-18T17:15:59Z)
- Quantifying Feature Contributions to Overall Disparity Using Information Theory [24.61791450920249]
When a machine-learning algorithm makes biased decisions, it can be helpful to understand the sources of disparity to explain why the bias exists.
We ask the question: what is the "potential" contribution of each individual feature to the observed disparity in the decisions when the exact decision-making mechanism is not accessible?
When unable to intervene on the inputs, we quantify the "redundant" statistical dependency about the protected attribute that is present in both the final decision and an individual feature.
arXiv Detail & Related papers (2022-06-16T21:27:22Z)
- A Song of (Dis)agreement: Evaluating the Evaluation of Explainable Artificial Intelligence in Natural Language Processing [7.527234046228323]
We argue that the community should stop using rank correlation as an evaluation metric for attention-based explanations.
We find that attention-based explanations do not correlate strongly with any recent feature attribution methods.
arXiv Detail & Related papers (2022-05-09T21:07:39Z)
- Discriminative Attribution from Counterfactuals [64.94009515033984]
We present a method for neural network interpretability by combining feature attribution with counterfactual explanations.
We show that this method can be used to quantitatively evaluate the performance of feature attribution methods in an objective manner.
arXiv Detail & Related papers (2021-09-28T00:53:34Z)
- Explaining Algorithmic Fairness Through Fairness-Aware Causal Path Decomposition [37.823248189626014]
We propose to study the problem of identifying the source of model disparities.
Unlike existing interpretation methods which typically learn feature importance, we consider the causal relationships among feature variables.
Our framework is also model agnostic and applicable to a variety of quantitative disparity measures.
arXiv Detail & Related papers (2021-08-11T17:23:47Z)
- A Low Rank Promoting Prior for Unsupervised Contrastive Learning [108.91406719395417]
We construct a novel probabilistic graphical model that effectively incorporates the low rank promoting prior into the framework of contrastive learning.
Our hypothesis explicitly requires that all samples belonging to the same instance class lie in the same low-dimensional subspace.
Empirical evidence shows that the proposed algorithm clearly surpasses the state-of-the-art approaches on multiple benchmarks.
arXiv Detail & Related papers (2021-08-05T15:58:25Z)
- Learning explanations that are hard to vary [75.30552491694066]
We show that averaging across examples can favor memorization and 'patchwork' solutions that sew together different strategies.
We then propose and experimentally validate a simple alternative algorithm based on a logical AND (a sketch of this gradient-agreement idea appears after this list).
arXiv Detail & Related papers (2020-09-01T10:17:48Z)
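As a companion to the faithfulness definition quoted above, here is a minimal sketch (my own, not taken from any of the listed papers) of a deletion-style check: permute one feature at a time, record the performance drop, and correlate those drops with an explainer's importance scores. `model`, `X_val`, `y_val`, and `importances` are hypothetical placeholders for a scikit-learn-style classifier and its data.

```python
import numpy as np
from scipy.stats import spearmanr

def single_feature_deletion_drops(model, X, y, seed=0):
    """Accuracy drop when each feature column is permuted in turn."""
    rng = np.random.default_rng(seed)
    baseline = model.score(X, y)
    drops = []
    for j in range(X.shape[1]):
        X_perturbed = X.copy()
        # Permute column j to break its association with the labels.
        X_perturbed[:, j] = rng.permutation(X_perturbed[:, j])
        drops.append(baseline - model.score(X_perturbed, y))
    return np.array(drops)

# Faithfulness proxy: rank agreement between an explainer's importance scores
# and the measured performance drops (higher correlation = more faithful ranking).
# drops = single_feature_deletion_drops(model, X_val, y_val)
# rho, _ = spearmanr(importances, drops)
```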
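And, for the last entry, a minimal sketch of the "logical AND" gradient idea under my own assumptions: a parameter update is kept only where per-environment gradients agree in sign, rather than being averaged regardless of disagreement. This is illustrative, not the authors' implementation.

```python
import numpy as np

def and_masked_gradient(per_env_grads):
    """per_env_grads: array of shape (n_envs, n_params)."""
    grads = np.asarray(per_env_grads)
    signs = np.sign(grads)
    # Component-wise "logical AND": keep a coordinate only if every
    # environment pushes it in the same direction.
    agree = np.all(signs == signs[0], axis=0)
    return grads.mean(axis=0) * agree

# Two hypothetical environments that disagree on the middle coordinate:
g = np.array([[0.5, -0.2, 0.1],
              [0.4,  0.3, 0.2]])
print(and_masked_gradient(g))  # -> [0.45, 0., 0.15]; the disagreeing coordinate is zeroed
```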