Finding the right XAI method -- A Guide for the Evaluation and Ranking of Explainable AI Methods in Climate Science
- URL: http://arxiv.org/abs/2303.00652v2
- Date: Fri, 22 Mar 2024 17:56:05 GMT
- Title: Finding the right XAI method -- A Guide for the Evaluation and Ranking of Explainable AI Methods in Climate Science
- Authors: Philine Bommer, Marlene Kretschmer, Anna Hedström, Dilyara Bareeva, Marina M.-C. Höhne
- Abstract summary: We introduce XAI evaluation in the climate context and discuss different desired explanation properties.
We find that the XAI methods Integrated Gradients, Layer-wise Relevance Propagation, and Input times Gradients exhibit considerable robustness, faithfulness, and complexity.
We find architecture-dependent performance differences in the robustness, complexity, and localization skills of the XAI methods.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Explainable artificial intelligence (XAI) methods shed light on the predictions of machine learning algorithms. Several approaches exist and have already been applied in climate science. However, the usual absence of ground-truth explanations complicates their evaluation and comparison, which in turn impedes the choice of an XAI method. In this work, we therefore introduce XAI evaluation in the climate context and discuss desired explanation properties, namely robustness, faithfulness, randomization, complexity, and localization. As a case study, we build on previous work in which the decade to which annual-mean temperature maps belong is predicted. After training both a multi-layer perceptron (MLP) and a convolutional neural network (CNN), we apply multiple XAI methods and calculate, for each property, skill scores relative to a random uniform explanation. Independent of the network, we find that the XAI methods Integrated Gradients, Layer-wise Relevance Propagation, and Input times Gradients exhibit considerable robustness, faithfulness, and complexity while sacrificing randomization performance. The sensitivity methods Gradient, SmoothGrad, NoiseGrad, and FusionGrad match the robustness skill but sacrifice faithfulness and complexity for randomization skill. We also find architecture-dependent performance differences in the robustness, complexity, and localization skills of the XAI methods, highlighting the need for research-task-specific evaluation. Overall, our work provides an overview of different evaluation properties in the climate science context and shows how to compare and benchmark explanation methods, assessing their suitability for the specific research problem at hand based on their strengths and weaknesses. In this way, we aim to support climate researchers in selecting a suitable XAI method.
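The skill scores above are computed relative to a random uniform explanation. As a minimal sketch of that idea, the snippet below applies the generic forecast-skill normalization s = (score_method - score_random) / (score_ideal - score_random); the paper's exact normalization is not given here, so this formula and all score values are illustrative assumptions, not results from the study.

```python
import numpy as np

def skill_score(method_score, random_score, ideal_score=1.0):
    """Skill relative to a random-uniform baseline.

    Generic forecast-skill form: s = 1 means the method reaches the
    ideal score, s <= 0 means it is no better than random. The paper's
    exact normalization may differ; this is an illustrative sketch.
    """
    return (method_score - random_score) / (ideal_score - random_score)

# Placeholder per-property scores (higher = better) for one XAI method,
# e.g. as produced by an evaluation library; NOT values from the paper.
properties    = ["robustness", "faithfulness", "randomization", "complexity", "localization"]
method_scores = np.array([0.82, 0.74, 0.31, 0.68, 0.55])
random_scores = np.array([0.50, 0.48, 0.52, 0.49, 0.50])

for name, m, r in zip(properties, method_scores, random_scores):
    print(f"{name:13s} skill = {skill_score(m, r):+.2f}")
```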
Related papers
- Robustness of Explainable Artificial Intelligence in Industrial Process Modelling [43.388607981317016]
We evaluate current XAI methods by scoring them based on ground truth simulations and sensitivity analysis.
We show the differences between XAI methods in their ability to correctly predict the true sensitivity of the modeled industrial process.
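As a hedged sketch of such ground-truth scoring (with synthetic names and data, not the paper's actual setup), one can correlate an XAI method's attributions with the simulator's known per-feature sensitivities:

```python
import numpy as np

# Hypothetical setup: a simulated process with known per-feature
# sensitivities, compared against the attributions an XAI method
# assigns for the same input. All values are synthetic placeholders.
rng = np.random.default_rng(0)
true_sensitivity = rng.normal(size=20)                            # ground truth
attribution = true_sensitivity + rng.normal(scale=0.5, size=20)   # XAI estimate

# Pearson correlation of magnitudes as one possible agreement score.
rho = np.corrcoef(np.abs(attribution), np.abs(true_sensitivity))[0, 1]
print(f"agreement with true sensitivity: r = {rho:.2f}")
```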
arXiv Detail & Related papers (2024-07-12T09:46:26Z)
- SIDU-TXT: An XAI Algorithm for NLP with a Holistic Assessment Approach [14.928572140620245]
The 'Similarity Difference and Uniqueness' (SIDU) XAI method, recognized for its superior capability in localizing entire salient regions in image-based classification, is extended to textual data.
The extended method, SIDU-TXT, utilizes feature activation maps from 'black-box' models to generate heatmaps at a granular, word-based level.
We find that, in the sentiment analysis task on a movie review dataset, SIDU-TXT excels in both functional and human-grounded evaluations.
arXiv Detail & Related papers (2024-02-05T14:29:54Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potentially suboptimal human expert.
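A minimal sketch of the intervention-as-reward idea (illustrative names only, not the paper's code): each step where the expert takes over is labeled with a negative reward, and the labeled transitions then feed a standard off-policy RL algorithm.

```python
# Sketch: treat a human intervention as a negative reward, so the
# policy learns to avoid states where the expert felt compelled to
# take over. Field names here are illustrative assumptions.

def label_transitions(transitions):
    """Assign reward -1 to steps where the expert intervened, else 0."""
    for t in transitions:
        t["reward"] = -1.0 if t["intervened"] else 0.0
    return transitions

# Toy rollout: the expert intervenes on the third step.
rollout = [
    {"state": 0, "action": 1, "intervened": False},
    {"state": 1, "action": 0, "intervened": False},
    {"state": 2, "action": 1, "intervened": True},
]
for t in label_transitions(rollout):
    print(t["state"], t["reward"])
```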
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- A Discrepancy Aware Framework for Robust Anomaly Detection [51.710249807397695]
We present a Discrepancy Aware Framework (DAF), which demonstrates robust performance consistently with simple and cheap strategies.
Our method leverages an appearance-agnostic cue to guide the decoder in identifying defects, thereby alleviating its reliance on synthetic appearance.
Under simple synthesis strategies, it outperforms existing methods by a large margin and also achieves state-of-the-art localization performance.
arXiv Detail & Related papers (2023-10-11T15:21:40Z)
- An Experimental Investigation into the Evaluation of Explainability Methods [60.54170260771932]
This work compares 14 different metrics when applied to nine state-of-the-art XAI methods and three dummy methods (e.g., random saliency maps) used as references.
Experimental results show which of these metrics produce highly correlated results, indicating potential redundancy.
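As a sketch of how such redundancy can be detected (with synthetic placeholder scores, not the paper's data), one can correlate the metrics' scores across the evaluated methods:

```python
import numpy as np

# Rows = 14 metrics, columns = 12 evaluated methods (9 XAI + 3 dummy).
# Scores are synthetic placeholders; metric 1 is made a noisy copy of
# metric 0 so that at least one redundant pair appears.
rng = np.random.default_rng(1)
scores = rng.random((14, 12))
scores[1] = scores[0] + rng.normal(scale=0.05, size=12)

corr = np.corrcoef(scores)  # metric-vs-metric correlation matrix
redundant = [(i, j) for i in range(len(corr)) for j in range(i + 1, len(corr))
             if corr[i, j] > 0.9]
print("highly correlated metric pairs:", redundant)
```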
arXiv Detail & Related papers (2023-05-25T08:07:07Z)
- IQ-Learn: Inverse soft-Q Learning for Imitation [95.06031307730245]
Imitation learning from a small amount of expert data can be challenging in high-dimensional environments with complex dynamics.
Behavioral cloning is widely used due to its simplicity of implementation and stable convergence.
We introduce a method for dynamics-aware IL which avoids adversarial training by learning a single Q-function.
arXiv Detail & Related papers (2021-06-23T03:43:10Z)
- Evaluating Explainable Artificial Intelligence Methods for Multi-label Deep Learning Classification Tasks in Remote Sensing [0.0]
We develop deep learning models with state-of-the-art performance in benchmark datasets.
Ten XAI methods were employed to understand and interpret the models' predictions.
Occlusion, Grad-CAM, and LIME were the most interpretable and reliable XAI methods.
arXiv Detail & Related papers (2021-04-03T11:13:14Z)
- Neural Network Attribution Methods for Problems in Geoscience: A Novel Synthetic Benchmark Dataset [0.05156484100374058]
We provide a framework to generate attribution benchmark datasets for regression problems in the geosciences.
We train a fully-connected network to learn the underlying function that was used for simulation.
We compare estimated attribution heatmaps from different XAI methods to the ground truth in order to identify examples where specific XAI methods perform well or poorly.
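A minimal sketch of such a benchmark, here with a simpler linear generating function than the framework described in the paper: because the function is known, the additive contribution of each input to the output is also known and can serve as a ground-truth attribution.

```python
import numpy as np

# Synthetic regression benchmark with known ground-truth attributions.
# A plain linear function is used for clarity; the paper's framework
# builds more general synthetic functions.
rng = np.random.default_rng(2)
n_samples, n_features = 1000, 8

X = rng.normal(size=(n_samples, n_features))
w = rng.normal(size=n_features)   # known weights of the generating function
y = X @ w                         # synthetic regression target

# Ground-truth attribution for sample i, feature j: its additive
# contribution w_j * x_ij to the prediction.
true_attribution = X * w

# A network trained on (X, y) can then be explained with any XAI method,
# and the estimated heatmaps compared against true_attribution,
# e.g. via per-sample correlation.
print(true_attribution[0].round(2))
```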
arXiv Detail & Related papers (2021-03-18T03:39:17Z)
- A User's Guide to Calibrating Robotics Simulators [54.85241102329546]
This paper proposes a set of benchmarks and a framework for the study of various algorithms aimed to transfer models and policies learnt in simulation to the real world.
We conduct experiments on a wide range of well known simulated environments to characterize and offer insights into the performance of different algorithms.
Our analysis can be useful for practitioners working in this area and can help make informed choices about the behavior and main properties of sim-to-real algorithms.
arXiv Detail & Related papers (2020-11-17T22:24:26Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- Ground Truth Evaluation of Neural Network Explanations with CLEVR-XAI [12.680653816836541]
We propose a ground truth based evaluation framework for XAI methods based on the CLEVR visual question answering task.
Our framework provides a (1) selective, (2) controlled and (3) realistic testbed for the evaluation of neural network explanations.
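As a hedged sketch of one such ground-truth check (placeholder data, not the paper's pipeline): given a mask of the question-relevant object, an explanation can be scored by the fraction of its positive relevance that falls inside the mask.

```python
import numpy as np

def relevance_mass(heatmap, gt_mask):
    """Fraction of positive relevance inside the ground-truth mask."""
    pos = np.clip(heatmap, 0, None)
    return pos[gt_mask].sum() / max(pos.sum(), 1e-12)

rng = np.random.default_rng(3)
heatmap = rng.random((32, 32))        # placeholder explanation heatmap
gt_mask = np.zeros((32, 32), dtype=bool)
gt_mask[8:16, 8:16] = True            # placeholder object mask
print(f"relevance mass inside mask: {relevance_mass(heatmap, gt_mask):.2f}")
```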
arXiv Detail & Related papers (2020-03-16T14:43:33Z)