From Confusion to Clarity: ProtoScore - A Framework for Evaluating Prototype-Based XAI

URL: http://arxiv.org/abs/2511.08361v1
Date: Wed, 12 Nov 2025 01:55:24 GMT
Authors: Helena Monke, Benjamin Sae-Chew, Benjamin Fresz, Marco F. Huber
Abstract summary: We establish a robust framework, ProtoScore, for assessing prototype-based XAI methods across different data types. By integrating the Co-12 properties of Nauta et al., this framework allows for effectively comparing prototype methods against each other and against other XAI methods.
Score: 2.9583904454291168
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The complexity and opacity of neural networks (NNs) pose significant challenges, particularly in high-stakes fields such as healthcare, finance, and law, where understanding decision-making processes is crucial. To address these issues, the field of explainable artificial intelligence (XAI) has developed various methods aimed at clarifying AI decision-making, thereby facilitating appropriate trust and validating the fairness of outcomes. Among these methods, prototype-based explanations offer a promising approach that uses representative examples to elucidate model behavior. However, a critical gap exists regarding standardized benchmarks to objectively compare prototype-based XAI methods, especially in the context of time series data. This lack of reliable benchmarks results in subjective evaluations, hindering progress in the field. We aim to establish a robust framework, ProtoScore, for assessing prototype-based XAI methods across different data types with a focus on time series data, facilitating fair and comprehensive evaluations. By integrating the Co-12 properties of Nauta et al., this framework allows for effectively comparing prototype methods against each other and against other XAI methods, ultimately assisting practitioners in selecting appropriate explanation methods while minimizing the costs associated with user studies. All code is publicly available at https://github.com/HelenaM23/ProtoScore .

Related papers

EVA-MILP: Towards Standardized Evaluation of MILP Instance Generation [13.49043811341421]
Mixed-Integer Linear Programming (MILP) is fundamental to solving complex decision-making problems. The proliferation of MILP instance generation methods, driven by machine learning's demand for diverse datasets, has significantly outpaced standardized evaluation techniques. This paper introduces a comprehensive benchmark framework designed for the systematic and objective evaluation of MILP instance generation methods.
arXiv (2025-05-30T16:42:15Z)
ODExAI: A Comprehensive Object Detection Explainable AI Evaluation [1.338174941551702]
We introduce the Object Detection Explainable AI Evaluation (ODExAI) to assess XAI methods in object detection. We benchmark a set of XAI methods across two widely used object detectors and standard datasets.
arXiv (2025-04-27T14:16:14Z)
SEOE: A Scalable and Reliable Semantic Evaluation Framework for Open Domain Event Detection [70.23196257213829]
We propose a scalable and reliable Semantic-level Evaluation framework for Open domain Event detection. Our proposed framework first constructs a scalable evaluation benchmark that currently includes 564 event types covering 7 major domains. We then leverage large language models (LLMs) as automatic evaluation agents to compute a semantic F1-score, incorporating fine-grained definitions of semantically similar labels.
arXiv (2025-03-05T09:37:05Z)
POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding [76.67608003501479]
We introduce POGEMA, a comprehensive set of tools that includes a fast environment for learning, a problem instance generator, and a visualization toolkit. We also introduce and define an evaluation protocol that specifies a range of domain-related metrics, computed based on primary evaluation indicators. The results of this comparison, which involves a variety of state-of-the-art MARL, search-based, and hybrid methods, are presented.
arXiv (2024-07-20T16:37:21Z)
EXACT: Towards a platform for empirically benchmarking Machine Learning model explanation methods [1.6383837447674294]
This paper brings together various benchmark datasets and novel performance metrics in an initial benchmarking platform. Our datasets incorporate ground truth explanations for class-conditional features. This platform assesses the performance of post-hoc XAI methods in the quality of the explanations they produce.
arXiv (2024-05-20T14:16:06Z)
Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations [13.60538902487872]
We present a novel post-hoc concept-based XAI framework that conveys not only instance-wise (local) but also class-wise (global) decision-making strategies via prototypes. We demonstrate the effectiveness of our approach in identifying out-of-distribution samples, spurious model behavior and data quality issues across three datasets.
arXiv (2023-11-28T10:53:26Z)
Learning Transferable Conceptual Prototypes for Interpretable Unsupervised Domain Adaptation [79.22678026708134]
In this paper, we propose an inherently interpretable method named Transferable Conceptual Prototype Learning (TCPL). Specifically, we design a hierarchically prototypical module that transfers categorical basic concepts from the source domain to the target domain and learns domain-shared prototypes for explaining the underlying reasoning process. Comprehensive experiments show that the proposed method not only provides effective and intuitive explanations but also outperforms previous state-of-the-art methods.
arXiv (2023-10-12T06:36:41Z)
Precise Benchmarking of Explainable AI Attribution Methods [0.0]
We propose a novel evaluation approach for benchmarking state-of-the-art XAI attribution methods. Our proposal consists of a synthetic classification model accompanied by its derived ground truth explanations. Our experimental results provide novel insights into the performance of Guided-Backprop and Smoothgrad XAI methods.
arXiv (2023-08-06T17:03:32Z)
The Meta-Evaluation Problem in Explainable AI: Identifying Reliable Estimators with MetaQuantus [10.135749005469686]
One of the unsolved challenges in the field of Explainable AI (XAI) is determining how to most reliably estimate the quality of an explanation method. We address this issue through a meta-evaluation of different quality estimators in XAI. Our novel framework, MetaQuantus, analyses two complementary performance characteristics of a quality estimator.
arXiv (2023-02-14T18:59:02Z)
HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models. We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv (2022-06-15T19:10:35Z)
A Simple Information-Based Approach to Unsupervised Domain-Adaptive Aspect-Based Sentiment Analysis [58.124424775536326]
We propose a simple but effective technique based on mutual information to extract their term. Experiment results show that our proposed method outperforms the state-of-the-art methods for cross-domain ABSA by 4.32% Micro-F1.
arXiv (2022-01-29T10:18:07Z)
Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task. The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them. By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv (2020-11-13T10:53:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.