Developing Guidelines for Functionally-Grounded Evaluation of Explainable Artificial Intelligence using Tabular Data
- URL: http://arxiv.org/abs/2410.12803v1
- Date: Mon, 30 Sep 2024 11:42:54 GMT
- Title: Developing Guidelines for Functionally-Grounded Evaluation of Explainable Artificial Intelligence using Tabular Data
- Authors: Mythreyi Velmurugan, Chun Ouyang, Yue Xu, Renuka Sindhgatta, Bemali Wickramanayake, Catarina Moreira
- Abstract summary: We identify 20 evaluation criteria and associated evaluation methods, and derive guidelines on when and how each criterion should be evaluated.
Our study contributes to the body of knowledge on XAI evaluation through in-depth examination of functionally-grounded XAI evaluation protocols.
- Score: 5.864471607396997
- Abstract: Explainable Artificial Intelligence (XAI) techniques are used to provide transparency to complex, opaque predictive models. However, these techniques are often designed for image and text data, and it is unclear how fit-for-purpose they are when applied to tabular data. As XAI techniques are rarely evaluated in settings with tabular data, the applicability of existing evaluation criteria and methods is also unclear and needs (re-)examination. For example, some works suggest that evaluation methods may unduly influence the evaluation results when using tabular data. This lack of clarity on evaluation procedures can lead to reduced transparency and ineffective use of XAI techniques in real-world settings. In this study, we examine the literature on XAI evaluation to derive guidelines on functionally-grounded assessment of local, post hoc XAI techniques. We identify 20 evaluation criteria and associated evaluation methods, and derive guidelines on when and how each criterion should be evaluated. We also identify key research gaps to be addressed by future work. Our study contributes to the body of knowledge on XAI evaluation through in-depth examination of functionally-grounded XAI evaluation protocols, and lays the groundwork for future research on XAI evaluation.
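The abstract does not name specific criteria here, but a minimal sketch of one widely used functionally-grounded criterion, perturbation-based fidelity of a local, post hoc explanation on tabular data, may help situate the guidelines. The dataset, model, and stand-in attribution vector below are illustrative assumptions, not the paper's protocol.

```python
# Minimal sketch (illustrative only, not the paper's protocol): a perturbation-based
# fidelity check, one common functionally-grounded criterion for local, post hoc
# explanations of a predictive model on tabular data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                      # toy tabular dataset (assumed)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)      # label depends mainly on features 0 and 1
model = RandomForestClassifier(random_state=0).fit(X, y)

def fidelity(instance, attributions, model, background, k=2):
    """Drop in predicted probability after replacing the top-k attributed features
    with background (mean) values; a larger drop suggests a more faithful explanation."""
    base = model.predict_proba(instance.reshape(1, -1))[0, 1]
    top_k = np.argsort(np.abs(attributions))[::-1][:k]
    perturbed = instance.copy()
    perturbed[top_k] = background[top_k]
    return base - model.predict_proba(perturbed.reshape(1, -1))[0, 1]

instance = X[0]
attributions = rng.normal(size=5)                  # stand-in for a local explainer's output
print(fidelity(instance, attributions, model, background=X.mean(axis=0)))
```

How the perturbation is carried out (mean replacement here) is exactly the kind of methodological choice that, per the abstract, may unduly influence evaluation results on tabular data.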
Related papers
- Navigating the Maze of Explainable AI: A Systematic Approach to Evaluating Methods and Metrics [10.045644410833402]
We introduce LATEC, a large-scale benchmark that critically evaluates 17 prominent XAI methods using 20 distinct metrics.
We showcase the high risk of conflicting metrics leading to unreliable rankings and consequently propose a more robust evaluation scheme.
LATEC reinforces its role in future XAI research by publicly releasing all 326k saliency maps and 378k metric scores as a (meta-evaluation) dataset.
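As an illustration of the unreliable-ranking risk described above, the snippet below measures how strongly two metrics disagree when ranking the same XAI methods. The method names and scores are synthetic, not drawn from the LATEC dataset.

```python
# Hypothetical illustration of the "conflicting metrics" problem: rank the same
# XAI methods under two metrics and measure agreement with Kendall's tau.
import numpy as np
from scipy.stats import kendalltau

methods = ["GradCAM", "IG", "LIME", "SHAP", "Occlusion"]
scores_metric_a = np.array([0.72, 0.65, 0.58, 0.81, 0.44])   # e.g., a faithfulness metric
scores_metric_b = np.array([0.31, 0.62, 0.77, 0.40, 0.69])   # e.g., a robustness metric

tau, p_value = kendalltau(scores_metric_a, scores_metric_b)
print(f"Kendall's tau between metric rankings: {tau:.2f} (p={p_value:.2f})")
# A low or negative tau means the two metrics rank the same methods very differently,
# which is the unreliable-ranking risk highlighted above.
```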
arXiv Detail & Related papers (2024-09-25T09:07:46Z)
- SIDU-TXT: An XAI Algorithm for NLP with a Holistic Assessment Approach [14.928572140620245]
The 'Similarity Difference and Uniqueness' (SIDU) XAI method, recognized for its superior capability in localizing entire salient regions in image-based classification, is extended to textual data.
The extended method, SIDU-TXT, utilizes feature activation maps from 'black-box' models to generate heatmaps at a granular, word-based level.
We find that, in a sentiment analysis task on a movie review dataset, SIDU-TXT excels in both functionally-grounded and human-grounded evaluations.
arXiv Detail & Related papers (2024-02-05T14:29:54Z)
- How much informative is your XAI? A decision-making assessment task to objectively measure the goodness of explanations [53.01494092422942]
The number and complexity of personalised and user-centred approaches to XAI have rapidly grown in recent years.
It emerged that user-centred approaches to XAI positively affect the interaction between users and systems.
We propose an assessment task to objectively and quantitatively measure the goodness of XAI systems.
arXiv Detail & Related papers (2023-12-07T15:49:39Z)
- From Static Benchmarks to Adaptive Testing: Psychometrics in AI Evaluation [60.14902811624433]
We discuss a paradigm shift from static evaluation methods to adaptive testing.
This involves estimating the characteristics and value of each test item in the benchmark and dynamically adjusting items in real-time.
We analyze the current approaches, advantages, and underlying reasons for adopting psychometrics in AI evaluation.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
- An Experimental Investigation into the Evaluation of Explainability Methods [60.54170260771932]
This work compares 14 different metrics when applied to nine state-of-the-art XAI methods and three dummy methods (e.g., random saliency maps) used as references.
Experimental results show which of these metrics produces highly correlated results, indicating potential redundancy.
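A minimal sketch of how such redundancy can be detected: given a methods-by-metrics score matrix, strongly rank-correlated columns indicate metrics that carry overlapping information. The scores, counts, and threshold below are synthetic assumptions, not the paper's results.

```python
# Illustrative sketch (synthetic data, not the paper's results): flag metric pairs
# whose scores across XAI methods are highly rank-correlated, i.e. potentially redundant.
import numpy as np
from scipy.stats import spearmanr

n_methods, n_metrics = 12, 5                       # counts are illustrative
scores = np.random.default_rng(1).random((n_methods, n_metrics))

corr, _ = spearmanr(scores)                        # metrics x metrics rank-correlation matrix
redundant = [(i, j) for i in range(n_metrics) for j in range(i + 1, n_metrics)
             if abs(corr[i, j]) > 0.9]             # threshold is an arbitrary choice
print("Potentially redundant metric pairs:", redundant)
```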
arXiv Detail & Related papers (2023-05-25T08:07:07Z)
- Connecting Algorithmic Research and Usage Contexts: A Perspective of Contextualized Evaluation for Explainable AI [65.44737844681256]
A lack of consensus on how to evaluate explainable AI (XAI) hinders the advancement of the field.
We argue that one way to close the gap is to develop evaluation methods that account for different user requirements.
arXiv Detail & Related papers (2022-06-22T05:17:33Z)
- Human-in-the-Loop Disinformation Detection: Stance, Sentiment, or Something Else? [93.91375268580806]
Both politics and pandemics have recently provided ample motivation for the development of machine learning-enabled disinformation (a.k.a. fake news) detection algorithms.
Existing literature has focused primarily on the fully-automated case, but the resulting techniques cannot reliably detect disinformation on the varied topics, sources, and time scales required for military applications.
By leveraging an already-available analyst as a human-in-the-loop, canonical machine learning techniques of sentiment analysis, aspect-based sentiment analysis, and stance detection become plausible methods to use for a partially-automated disinformation detection system.
arXiv Detail & Related papers (2021-11-09T13:30:34Z)
- Data Representing Ground-Truth Explanations to Evaluate XAI Methods [0.0]
Explainable artificial intelligence (XAI) methods are currently evaluated with approaches that mostly originated in interpretable machine learning (IML) research.
We propose to represent explanations with canonical equations that can be used to evaluate the accuracy of XAI methods.
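A hedged sketch of the general idea (variable names and the scoring choice are assumptions, not the paper's construction): when data are generated from a known canonical equation, that equation yields a ground-truth explanation against which an XAI method's attributions can be scored.

```python
# Illustrative sketch: data generated from a known linear equation gives a
# ground-truth local explanation; an explainer's attributions can be scored against it.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
# Canonical generating equation (assumed): y = 3*x0 - 2*x1 + 0*x2
true_weights = np.array([3.0, -2.0, 0.0])
y = X @ true_weights

# For a linear equation, a natural ground-truth local importance is |w_i * x_i|.
instance = X[0]
ground_truth = np.abs(true_weights * instance)

xai_attributions = rng.normal(size=3)    # stand-in for an explainer's output
# Score the explanation, e.g. by cosine similarity to the ground truth.
cos = np.dot(ground_truth, np.abs(xai_attributions)) / (
    np.linalg.norm(ground_truth) * np.linalg.norm(np.abs(xai_attributions)))
print(f"Agreement with ground-truth explanation: {cos:.2f}")
```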
arXiv Detail & Related papers (2020-11-18T16:54:53Z)
- Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions [48.91284724066349]
Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education.
Traditional measures such as confidence intervals may be insufficient due to noise, limited data and confounding.
We develop a method that could serve as a hybrid human-AI system, enabling human experts to analyze the validity of policy evaluation estimates.
arXiv Detail & Related papers (2020-02-10T00:26:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.