Related papers: Backdoor-based Explainable AI Benchmark for High Fidelity Evaluation of Attribution Methods

URL: http://arxiv.org/abs/2405.02344v1
Date: Thu, 2 May 2024 13:48:37 GMT
Title: Backdoor-based Explainable AI Benchmark for High Fidelity Evaluation of Attribution Methods
Authors: Peiyu Yang, Naveed Akhtar, Jiantong Jiang, Ajmal Mian
Abstract summary: Attribution methods compute importance scores for input features to explain the output predictions of deep models. In this work, we first identify a set of fidelity criteria that reliable benchmarks for attribution methods are expected to fulfill. We then introduce a Backdoor-based eXplainable AI benchmark (BackX) that adheres to the desired fidelity criteria.
Score: 49.62131719441252
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Attribution methods compute importance scores for input features to explain the output predictions of deep models. However, accurate assessment of attribution methods is challenged by the lack of benchmark fidelity for attributing model predictions. Moreover, other confounding factors in attribution estimation, including the setup choices of post-processing techniques and explained model predictions, further compromise the reliability of the evaluation. In this work, we first identify a set of fidelity criteria that reliable benchmarks for attribution methods are expected to fulfill, thereby facilitating a systematic assessment of attribution benchmarks. Next, we introduce a Backdoor-based eXplainable AI benchmark (BackX) that adheres to the desired fidelity criteria. We theoretically establish the superiority of our approach over the existing benchmarks for well-founded attribution evaluation. With extensive analysis, we also identify a setup for a consistent and fair benchmarking of attribution methods across different underlying methodologies. This setup is ultimately employed for a comprehensive comparison of existing methods using our BackX benchmark. Finally, our analysis also provides guidance for defending against backdoor attacks with the help of attribution methods.

Related papers

Are Bias Evaluation Methods Biased ? [3.9748528039819977]
The creation of benchmarks to evaluate the safety of Large Language Models is one of the key activities within the trusted AI community. We investigate how robust such benchmarks are by using different approaches to rank a set of representative models for bias and compare how similar the overall rankings are.
arXiv Detail & Related papers (2025-06-20T16:11:25Z)
From Rankings to Insights: Evaluation Should Shift Focus from Leaderboard to Feedback [36.68929551237421]
We introduce Feedbacker, an evaluation framework that provides comprehensive and fine-grained results. Our project homepage and dataset are available at https://liudan193.io/Feedbacker.
arXiv Detail & Related papers (2025-05-10T16:52:40Z)
Confidence in Large Language Model Evaluation: A Bayesian Approach to Limited-Sample Challenges [13.526258635654882]
This study introduces a Bayesian approach for large language model (LLM) capability assessment. We treat model capabilities as latent variables and leverage a curated query set to induce discriminative responses. Experimental evaluations with GPT-series models demonstrate that the proposed method achieves superior discrimination compared to conventional evaluation methods.
arXiv Detail & Related papers (2025-04-30T04:24:50Z)
Where is this coming from? Making groundedness count in the evaluation of Document VQA models [12.951716701565019]
We argue that common evaluation metrics do not account for the semantic and multimodal groundedness of a model's outputs. We propose a new evaluation methodology that accounts for the groundedness of predictions. Our proposed methodology is parameterized in such a way that users can configure the score according to their preferences.
arXiv Detail & Related papers (2025-03-24T20:14:46Z)
Rethinking Robustness in Machine Learning: A Posterior Agreement Approach [45.284633306624634]
Posterior Agreement (PA) theory of model validation provides a principled framework for robustness evaluation. We show that the PA metric provides a sensible and consistent analysis of the vulnerabilities in learning algorithms, even in the presence of few observations.
arXiv Detail & Related papers (2025-03-20T16:03:39Z)
BEExAI: Benchmark to Evaluate Explainable AI [0.9176056742068812]
We propose BEExAI, a benchmark tool that allows large-scale comparison of different post-hoc XAI methods. We argue that the need for a reliable way of measuring the quality and correctness of explanations is becoming critical.
arXiv Detail & Related papers (2024-07-29T11:21:17Z)
On the Evaluation Consistency of Attribution-based Explanations [42.1421504321572]
We introduce Meta-Rank, an open platform for benchmarking attribution methods in the image domain. Our benchmark reveals three insights in attribution evaluation endeavors: 1) evaluating attribution methods under disparate settings can yield divergent performance rankings; 2) although inconsistent across numerous cases, the performance rankings exhibit remarkable consistency across distinct checkpoints along the same training trajectory; and 3) prior attempts at consistent evaluation fare no better than baselines when extended to more heterogeneous models and datasets.
arXiv Detail & Related papers (2024-07-28T11:49:06Z)
Trustworthy Classification through Rank-Based Conformal Prediction Sets [9.559062601251464]
We propose a novel conformal prediction method that employs a rank-based score function suitable for classification models. Our approach constructs prediction sets that achieve the desired coverage rate while managing their size. Our contributions include a novel conformal prediction method, theoretical analysis, and empirical evaluation.
arXiv Detail & Related papers (2024-07-05T10:43:41Z)
Conformal Approach To Gaussian Process Surrogate Evaluation With Coverage Guarantees [47.22930583160043]
We propose a method for building adaptive cross-conformal prediction intervals. The resulting conformal prediction intervals exhibit a level of adaptivity akin to Bayesian credibility sets. The potential applicability of the method is demonstrated in the context of surrogate modeling of an expensive-to-evaluate simulator of the clogging phenomenon in steam generators of nuclear reactors.
arXiv Detail & Related papers (2024-01-15T14:45:18Z)
A Bayesian Approach to Robust Inverse Reinforcement Learning [54.24816623644148]
We consider a Bayesian approach to offline model-based inverse reinforcement learning (IRL). The proposed framework differs from existing offline model-based IRL approaches by performing simultaneous estimation of the expert's reward function and subjective model of environment dynamics. Our analysis reveals a novel insight that the estimated policy exhibits robust performance when the expert is believed to have a highly accurate model of the environment.
arXiv Detail & Related papers (2023-09-15T17:37:09Z)
TopP&R: Robust Support Estimation Approach for Evaluating Fidelity and Diversity in Generative Models [9.048102020202817]
Topological Precision and Recall (TopP&R) provides a systematic approach to estimating supports. We show that TopP&R is robust to outliers and non-independent and identically distributed (Non-IID) perturbations. This is the first evaluation metric focused on the robust estimation of the support and provides its statistical consistency under noise.
arXiv Detail & Related papers (2023-06-13T11:46:00Z)
Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation [53.83642844626703]
We provide a unifying framework for estimating higher-order derivatives of value functions, based on off-policy evaluation. Our framework interprets a number of prior approaches as special cases and elucidates the bias and variance trade-off of Hessian estimates.
arXiv Detail & Related papers (2021-06-24T15:58:01Z)
REAM$\sharp$: An Enhancement Approach to Reference-based Evaluation Metrics for Open-domain Dialog Generation [63.46331073232526]
We present an enhancement approach to Reference-based EvAluation Metrics for open-domain dialogue systems. A prediction model is designed to estimate the reliability of the given reference set. We show how its predicted results can be helpful to augment the reference set, and thus improve the reliability of the metric.
arXiv Detail & Related papers (2021-05-30T10:04:13Z)
A Unified Taylor Framework for Revisiting Attribution Methods [49.03783992773811]
We propose a Taylor attribution framework and reformulate seven mainstream attribution methods into the framework. We establish three principles for a good attribution in the Taylor attribution framework.
arXiv Detail & Related papers (2020-08-21T22:07:06Z)
Evaluations and Methods for Explanation through Robustness Analysis [117.7235152610957]
We establish a novel set of evaluation criteria for such feature-based explanations by robustness analysis. We obtain new explanations that are loosely necessary and sufficient for a prediction. We extend the explanation to extract the set of features that would move the current prediction to a target class.
arXiv Detail & Related papers (2020-05-31T05:52:05Z)