Do We Need Another Explainable AI Method? Toward Unifying Post-hoc XAI
Evaluation Methods into an Interactive and Multi-dimensional Benchmark
- URL: http://arxiv.org/abs/2207.14160v2
- Date: Tue, 4 Oct 2022 10:45:23 GMT
- Title: Do We Need Another Explainable AI Method? Toward Unifying Post-hoc XAI
Evaluation Methods into an Interactive and Multi-dimensional Benchmark
- Authors: Mohamed Karim Belaid, Eyke Hüllermeier, Maximilian Rabus, Ralf Krestel
- Abstract summary: We propose Compare-xAI, a benchmark that unifies all exclusive functional testing methods applied to xAI algorithms.
The benchmark encapsulates the complexity of evaluating xAI methods into a hierarchical scoring of three levels.
The interactive user interface helps mitigate errors in interpreting xAI results.
- Score: 6.511859672210113
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In recent years, Explainable AI (xAI) has attracted considerable
attention as various countries turned explanations into a legal right. xAI
allows for improving models beyond the accuracy metric by, e.g., debugging the
learned patterns and demystifying the AI's behavior. The widespread use of xAI
brought new challenges. On the one hand, the number of published xAI
algorithms boomed, making it difficult for practitioners to select the right
tool. On the other hand, experiments have highlighted how easily data
scientists can misuse xAI algorithms and misinterpret their results. To tackle
the issue of comparing and correctly using feature importance xAI algorithms,
we propose Compare-xAI, a benchmark that unifies all exclusive functional
testing methods applied to xAI algorithms. We propose a selection protocol to
shortlist non-redundant functional tests from the literature, i.e., each
targeting a specific end-user requirement in explaining a model. The benchmark
encapsulates the complexity of evaluating xAI methods into a three-level
hierarchical scoring, targeting three end-user groups: researchers,
practitioners, and laymen in xAI. The most detailed level provides one score
per test. The second level regroups the tests into five categories (fidelity,
fragility, stability, simplicity, and stress tests). The last level is the
aggregated comprehensibility score, which encapsulates the ease of correctly
interpreting the algorithm's output in one easy-to-compare value.
Compare-xAI's interactive user interface helps mitigate errors in interpreting
xAI results by quickly listing the recommended xAI solutions for each ML task
and their current limitations. The benchmark is available at
https://karim-53.github.io/cxai/
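To make the three-level scoring concrete, here is a minimal Python sketch of how per-test scores could roll up into the five category scores and a single comprehensibility value. The test names, score values, and the unweighted averaging are illustrative assumptions; the actual tests and aggregation rules are defined by the Compare-xAI benchmark itself.

```python
from statistics import mean

# Level 1: one score per test for a single xAI method.
# Test names and values are hypothetical, not taken from Compare-xAI.
test_scores = {
    "fidelity":   {"cough_and_fever": 1.0, "linear_additivity": 0.8},
    "fragility":  {"adversarial_perturbation": 0.9},
    "stability":  {"feature_reordering": 0.7},
    "simplicity": {"duplicate_features": 0.6},
    "stress":     {"high_dim_inputs": 0.5},
}

# Level 2: aggregate the per-test scores into the five category scores.
category_scores = {cat: mean(scores.values()) for cat, scores in test_scores.items()}

# Level 3: collapse the categories into one comprehensibility score.
# A plain average is assumed here; the benchmark may weight categories differently.
comprehensibility = mean(category_scores.values())

print(category_scores)
print(f"comprehensibility: {comprehensibility:.2f}")
```

Each level trades detail for ease of comparison: researchers can inspect the per-test scores, practitioners the category scores, and laymen the single aggregated value.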
Related papers
- Precise Benchmarking of Explainable AI Attribution Methods [0.0]
We propose a novel evaluation approach for benchmarking state-of-the-art XAI attribution methods.
Our proposal consists of a synthetic classification model accompanied by its derived ground truth explanations.
Our experimental results provide novel insights into the performance of the Guided-Backprop and SmoothGrad XAI methods.
arXiv Detail & Related papers (2023-08-06T17:03:32Z) - Strategies to exploit XAI to improve classification systems [0.0]
XAI aims to provide insights into the decision-making process of AI models, allowing users to understand the results beyond the decisions themselves.
Most XAI literature focuses on how to explain an AI system, while less attention has been given to how XAI methods can be exploited to improve an AI system.
arXiv Detail & Related papers (2023-06-09T10:38:26Z) - An Experimental Investigation into the Evaluation of Explainability
Methods [60.54170260771932]
This work compares 14 different metrics when applied to nine state-of-the-art XAI methods and three dummy methods (e.g., random saliency maps) used as references.
Experimental results show which of these metrics produces highly correlated results, indicating potential redundancy.
arXiv Detail & Related papers (2023-05-25T08:07:07Z) - A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper.
Our dataset consists of 477 self-reported expertise scores provided by 58 researchers.
For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
arXiv Detail & Related papers (2023-03-23T16:15:03Z) - Understanding User Preferences in Explainable Artificial Intelligence: A Survey and a Mapping Function Proposal [0.0]
This study conducts a thorough review of extant research in Explainable Machine Learning (XML).
Our main objective is to offer a classification of XAI methods within the realm of XML.
We propose a mapping function that takes into account users and their desired properties and suggests an XAI method to them.
arXiv Detail & Related papers (2023-02-07T01:06:38Z) - Optimizing Explanations by Network Canonization and Hyperparameter
Search [74.76732413972005]
Rule-based and modified backpropagation XAI approaches often face challenges when being applied to modern model architectures.
Model canonization is the process of re-structuring the model to disregard problematic components without changing the underlying function.
In this work, we propose canonizations for currently relevant model blocks applicable to popular deep neural network architectures.
arXiv Detail & Related papers (2022-11-30T17:17:55Z) - Towards Better Out-of-Distribution Generalization of Neural Algorithmic
Reasoning Tasks [51.8723187709964]
We study the OOD generalization of neural algorithmic reasoning tasks.
The goal is to learn an algorithm from input-output pairs using deep neural networks.
arXiv Detail & Related papers (2022-11-01T18:33:20Z) - Responsibility: An Example-based Explainable AI approach via Training
Process Inspection [1.4610038284393165]
We present a novel XAI approach that identifies the most responsible training example for a particular decision.
This example can then be shown as an explanation: "this is what I (the AI) learned that led me to do that".
Our results demonstrate that responsibility can help improve accuracy for both human end users and secondary ML models.
arXiv Detail & Related papers (2022-09-07T19:30:01Z) - Explaining Any ML Model? -- On Goals and Capabilities of XAI [2.236663830879273]
We argue that the goals and capabilities of XAI algorithms are far from being well understood.
We show that users can ask diverse questions, but that only one of them can be answered by current XAI algorithms.
arXiv Detail & Related papers (2022-06-28T11:09:33Z) - Connecting Algorithmic Research and Usage Contexts: A Perspective of
Contextualized Evaluation for Explainable AI [65.44737844681256]
A lack of consensus on how to evaluate explainable AI (XAI) hinders the advancement of the field.
We argue that one way to close the gap is to develop evaluation methods that account for different user requirements.
arXiv Detail & Related papers (2022-06-22T05:17:33Z) - A User-Centred Framework for Explainable Artificial Intelligence in
Human-Robot Interaction [70.11080854486953]
We propose a user-centred framework for XAI that focuses on its social-interactive aspect.
The framework aims to provide a structure for interactive XAI solutions thought for non-expert users.
arXiv Detail & Related papers (2021-09-27T09:56:23Z)