When Can You Trust Your Explanations? A Robustness Analysis on Feature Importances
- URL: http://arxiv.org/abs/2406.14349v2
- Date: Thu, 03 Apr 2025 14:59:16 GMT
- Title: When Can You Trust Your Explanations? A Robustness Analysis on Feature Importances
- Authors: Ilaria Vascotto, Alex Rodriguez, Alessandro Bonaita, Luca Bortolussi
- Abstract summary: The robustness of explanations plays a central role in ensuring trust in both the system and the provided explanation. We propose a novel approach to analyse the robustness of neural network explanations to non-adversarial perturbations. We additionally present an ensemble method to aggregate various explanations, showing how merging explanations can be beneficial for both understanding the model's decision and evaluating the robustness.
- Score: 42.36530107262305
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent legislative regulations have underlined the need for accountable and transparent artificial intelligence systems and have contributed to a growing interest in the Explainable Artificial Intelligence (XAI) field. Nonetheless, the lack of standardized criteria to validate explanation methodologies remains a major obstacle to developing trustworthy systems. We address a crucial yet often overlooked aspect of XAI, the robustness of explanations, which plays a central role in ensuring trust in both the system and the provided explanation. To this end, we propose a novel approach to analyse the robustness of neural network explanations to non-adversarial perturbations, leveraging the manifold hypothesis to produce new perturbed datapoints that resemble the observed data distribution. We additionally present an ensemble method to aggregate various explanations, showing how merging explanations can be beneficial for both understanding the model's decision and evaluating the robustness. The aim of our work is to provide practitioners with a framework for evaluating the trustworthiness of model explanations. Experimental results on feature importances derived from neural networks applied to tabular datasets highlight the importance of robust explanations in practical applications.
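To make the described procedure concrete, the following is a minimal, hypothetical sketch of the idea, assuming PCA as a crude stand-in for the learned data manifold and an occlusion-style importance plus a local linear surrogate as the explainers; none of these specific choices, names, or datasets come from the paper itself.

```python
# Illustrative sketch only: PCA stands in for a learned data manifold, and simple
# occlusion/surrogate importances stand in for the explainers studied in the paper.
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)
model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0).fit(X, y)
pca = PCA(n_components=5).fit(X)  # crude approximation of the data manifold
rng = np.random.default_rng(0)

def occlusion_importance(x):
    """Local importance: drop in P(class 1) when a feature is reset to its mean."""
    base = model.predict_proba(x.reshape(1, -1))[0, 1]
    imp = np.zeros_like(x)
    for j in range(x.size):
        x_j = x.copy()
        x_j[j] = X[:, j].mean()
        imp[j] = base - model.predict_proba(x_j.reshape(1, -1))[0, 1]
    return imp

def on_manifold_neighbours(x, n=20, scale=0.1):
    """Perturb in the low-dimensional PCA space and map back to input space."""
    z = pca.transform(x.reshape(1, -1))
    noise = rng.normal(0.0, scale, size=(n, z.shape[1])) * np.sqrt(pca.explained_variance_)
    return pca.inverse_transform(z + noise)

def surrogate_importance(x, n=200):
    """Second explainer for the ensemble: coefficients of a local linear surrogate."""
    Xp = on_manifold_neighbours(x, n=n)
    return Ridge(alpha=1.0).fit(Xp - x, model.predict_proba(Xp)[:, 1]).coef_

x0 = X[0]
imp0 = occlusion_importance(x0)

# Robustness: rank agreement of importances across on-manifold perturbations.
stability = np.mean([spearmanr(imp0, occlusion_importance(xp))[0]
                     for xp in on_manifold_neighbours(x0)])
print(f"mean Spearman correlation under on-manifold perturbations: {stability:.3f}")

# Ensemble: average the rank-normalised magnitudes of the two explanations.
ranks = [np.argsort(np.argsort(np.abs(e))) for e in (imp0, surrogate_importance(x0))]
aggregated = np.mean(ranks, axis=0)
print("top-5 features by aggregated importance:", np.argsort(aggregated)[::-1][:5])
```

The stability score and the aggregated ranking together give a rough sense of how much a practitioner can trust a single explanation for a single point, which is the question the paper's framework addresses in a principled way.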
Related papers
- Exploring Energy Landscapes for Minimal Counterfactual Explanations: Applications in Cybersecurity and Beyond [3.6963146054309597]
Counterfactual explanations have emerged as a prominent method in Explainable Artificial Intelligence (XAI).
We present a novel framework that integrates perturbation theory and statistical mechanics to generate minimal counterfactual explanations.
Our approach systematically identifies the smallest modifications required to change a model's prediction while maintaining plausibility.
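The framework itself builds on perturbation theory and statistical mechanics; for orientation only, the hypothetical sketch below shows a much simpler baseline that returns the nearest training point with a different prediction as a plausible, small-change counterfactual.

```python
# Generic nearest-unlike-neighbour baseline, not the energy-landscape method itself:
# pick the closest observed point whose prediction differs from the query's, so the
# counterfactual is both small (in L2) and plausible (it lies on the data).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

def nearest_counterfactual(x, X_ref):
    """Closest reference point with a different predicted class than x."""
    preds = X_ref[model.predict(X_ref) != model.predict(x.reshape(1, -1))[0]]
    return preds[np.argmin(np.linalg.norm(preds - x, axis=1))]

x0 = X[0]
cf = nearest_counterfactual(x0, X)
print(f"prediction flips from {model.predict(x0.reshape(1, -1))[0]} to "
      f"{model.predict(cf.reshape(1, -1))[0]} at L2 distance {np.linalg.norm(cf - x0):.2f}")
```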
arXiv Detail & Related papers (2025-03-23T19:48:37Z) - A Comprehensive Survey on Self-Interpretable Neural Networks [36.0575431131253]
Self-interpretable neural networks inherently reveal the prediction rationale through the model structures.
We first collect and review existing works on self-interpretable neural networks and provide a structured summary of their methodologies.
We also present concrete, visualized examples of model explanations and discuss their applicability across diverse scenarios.
arXiv Detail & Related papers (2025-01-26T18:50:16Z) - Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations [80.86128012438834]
We show for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete.
We propose a novel probabilistic approach which is able to provide tight estimates of robustness with strong guarantees.
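The paper's guarantees are not reproduced here; the hedged sketch below merely illustrates the underlying question by approximating "plausible model shifts" with bootstrap retraining and estimating how often a fixed counterfactual keeps its target class, with a crude Hoeffding confidence radius.

```python
# Illustration only: bootstrap retraining approximates plausible model shifts, and the
# fraction of shifted models that still accept the counterfactual estimates its robustness.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample

X, y = load_breast_cancer(return_X_y=True)
cf = X[0]            # pretend this point is a counterfactual produced elsewhere
target_class = 1     # the class the counterfactual is supposed to obtain
m = 200              # number of sampled model shifts

hits = 0
for seed in range(m):
    Xb, yb = resample(X, y, random_state=seed)   # one plausible model shift
    shifted = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(Xb, yb)
    hits += int(shifted.predict(cf.reshape(1, -1))[0] == target_class)

p_hat = hits / m
delta = 0.05
radius = np.sqrt(np.log(2 / delta) / (2 * m))    # Hoeffding confidence radius
print(f"estimated validity under model shifts: {p_hat:.2f} +/- {radius:.2f} (95% conf.)")
```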
arXiv Detail & Related papers (2024-07-10T09:13:11Z) - Self-Distilled Disentangled Learning for Counterfactual Prediction [49.84163147971955]
We propose the Self-Distilled Disentanglement framework, known as $SD^2$.
Grounded in information theory, it ensures theoretically sound independent disentangled representations without intricate mutual information estimator designs.
Our experiments, conducted on both synthetic and real-world datasets, confirm the effectiveness of our approach.
arXiv Detail & Related papers (2024-06-09T16:58:19Z) - LaPLACE: Probabilistic Local Model-Agnostic Causal Explanations [1.0370398945228227]
We introduce LaPLACE-explainer, designed to provide probabilistic cause-and-effect explanations for machine learning models.
The LaPLACE-Explainer component leverages the concept of a Markov blanket to establish statistical boundaries between relevant and non-relevant features.
Our approach offers causal explanations and outperforms LIME and SHAP in terms of local accuracy and consistency of explained features.
arXiv Detail & Related papers (2023-10-01T04:09:59Z) - Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z) - Trustworthy Artificial Intelligence Framework for Proactive Detection and Risk Explanation of Cyber Attacks in Smart Grid [11.122588110362706]
The rapid growth of distributed energy resources (DERs) poses significant cybersecurity and trust challenges to the grid controller.
To enable a trustworthy smart grid controller, this work investigates a trustworthy artificial intelligence (AI) mechanism for proactive identification and explanation of the cyber risk caused by the control/status message of DERs.
arXiv Detail & Related papers (2023-06-12T02:28:17Z) - On the Robustness of Aspect-based Sentiment Analysis: Rethinking Model, Data, and Training [109.9218185711916]
Aspect-based sentiment analysis (ABSA) aims at automatically inferring the specific sentiment polarities toward certain aspects of products or services behind social media texts or reviews.
We propose to enhance the ABSA robustness by systematically rethinking the bottlenecks from all possible angles, including model, data, and training.
arXiv Detail & Related papers (2023-04-19T11:07:43Z) - Evaluating Explainability in Machine Learning Predictions through Explainer-Agnostic Metrics [0.0]
We develop six distinct model-agnostic metrics designed to quantify the extent to which model predictions can be explained.
These metrics measure different aspects of model explainability, covering local importance, global importance, and surrogate predictions.
We demonstrate the practical utility of these metrics on classification and regression tasks, and integrate these metrics into an existing Python package for public use.
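The six metrics themselves are not reproduced here; the hypothetical snippet below sketches one explainer-agnostic measure in the same spirit: how concentrated a normalised importance vector is, regardless of which explainer produced it.

```python
# One hypothetical explainer-agnostic metric in this spirit: 1 minus the normalised
# entropy of the absolute importance vector. It can be computed for any explainer's
# output; the paper's actual six metrics are not reproduced here.
import numpy as np

def importance_concentration(importances):
    """Return 1 - normalised entropy of |importances|; higher means more concentrated."""
    p = np.abs(np.asarray(importances, dtype=float))
    p = p / p.sum()
    entropy = -np.sum(p * np.log(p, where=p > 0, out=np.zeros_like(p)))
    return 1.0 - entropy / np.log(p.size)

print(importance_concentration([0.9, 0.05, 0.03, 0.02]))   # high: importance is concentrated
print(importance_concentration([0.25, 0.25, 0.25, 0.25]))  # 0.0: importance is fully diffuse
```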
arXiv Detail & Related papers (2023-02-23T15:28:36Z) - Causality-Aware Local Interpretable Model-Agnostic Explanations [7.412445894287709]
We propose a novel extension to a widely used local and model-agnostic explainer, which encodes explicit causal relationships within the data surrounding the instance being explained.
Our approach outperforms the original method in terms of faithfully replicating the black-box model's mechanism and the consistency and reliability of the generated explanations.
arXiv Detail & Related papers (2022-12-10T10:12:27Z) - SAFARI: Versatile and Efficient Evaluations for Robustness of Interpretability [11.230696151134367]
The lack of interpretability in Deep Learning (DL) models is a barrier to trustworthy AI.
Given an XAI method, it is vital to assess how robust its interpretations of DL models are.
arXiv Detail & Related papers (2022-08-19T16:07:22Z) - Exploring the Trade-off between Plausibility, Change Intensity and Adversarial Power in Counterfactual Explanations using Multi-objective Optimization [73.89239820192894]
We argue that automated counterfactual generation should take several aspects of the produced adversarial instances into account.
We present a novel framework for the generation of counterfactual examples.
arXiv Detail & Related papers (2022-05-20T15:02:53Z) - Counterfactual Explanations as Interventions in Latent Space [62.997667081978825]
Counterfactual explanations aim to provide end users with a set of features that need to be changed in order to achieve a desired outcome.
Current approaches rarely take into account the feasibility of actions needed to achieve the proposed explanations.
We present Counterfactual Explanations as Interventions in Latent Space (CEILS), a methodology to generate counterfactual explanations.
arXiv Detail & Related papers (2021-06-14T20:48:48Z) - Exploring Robustness of Unsupervised Domain Adaptation in Semantic Segmentation [74.05906222376608]
We propose adversarial self-supervision UDA (or ASSUDA) that maximizes the agreement between clean images and their adversarial examples by a contrastive loss in the output space.
This paper is rooted in two observations: (i) the robustness of UDA methods in semantic segmentation remains unexplored, which poses a security concern in this field; and (ii) although commonly used self-supervision tasks (e.g., rotation and jigsaw) benefit image tasks such as classification and recognition, they fail to provide the critical supervision signals needed to learn discriminative representations for segmentation.
arXiv Detail & Related papers (2021-05-23T01:50:44Z) - Uncertainty as a Form of Transparency: Measuring, Communicating, and Using Uncertainty [66.17147341354577]
We argue for considering a complementary form of transparency by estimating and communicating the uncertainty associated with model predictions.
We describe how uncertainty can be used to mitigate model unfairness, augment decision-making, and build trustworthy systems.
This work constitutes an interdisciplinary review drawn from literature spanning machine learning, visualization/HCI, design, decision-making, and fairness.
arXiv Detail & Related papers (2020-11-15T17:26:14Z) - Recent Advances in Understanding Adversarial Robustness of Deep Neural Networks [15.217367754000913]
It is increasingly important to obtain models that are robust to adversarial examples.
We give preliminary definitions of adversarial attacks and robustness.
We study frequently-used benchmarks and mention theoretically-proved bounds for adversarial robustness.
arXiv Detail & Related papers (2020-11-03T07:42:53Z) - How Much Can I Trust You? -- Quantifying Uncertainties in Explaining Neural Networks [19.648814035399013]
Explainable AI (XAI) aims to provide interpretations for predictions made by learning machines, such as deep neural networks.
We propose a new framework that makes it possible to convert any explanation method for neural networks into an explanation method for Bayesian neural networks.
We demonstrate the effectiveness and usefulness of our approach extensively in various experiments.
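As a rough illustration of that recipe (not the authors' framework), the sketch below runs a base explainer over several bootstrap-trained networks, standing in for posterior samples, and reports the mean attribution together with its spread.

```python
# Sketch of the general recipe: apply a base explainer to several samples of model
# parameters and summarise the attributions by their mean and standard deviation.
# A bootstrap ensemble is used here as a crude stand-in for Bayesian posterior samples.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.neural_network import MLPClassifier
from sklearn.utils import resample

X, y = load_breast_cancer(return_X_y=True)

def occlusion_importance(model, x, X_ref):
    """Base (non-Bayesian) explainer: probability drop when a feature is mean-imputed."""
    base = model.predict_proba(x.reshape(1, -1))[0, 1]
    imp = np.zeros_like(x)
    for j in range(x.size):
        x_j = x.copy()
        x_j[j] = X_ref[:, j].mean()
        imp[j] = base - model.predict_proba(x_j.reshape(1, -1))[0, 1]
    return imp

x0, attributions = X[0], []
for seed in range(10):                          # 10 "posterior samples" of the network
    Xb, yb = resample(X, y, random_state=seed)
    net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=seed).fit(Xb, yb)
    attributions.append(occlusion_importance(net, x0, X))

mean_attr, std_attr = np.mean(attributions, axis=0), np.std(attributions, axis=0)
print("features with the most uncertain attributions:", np.argsort(std_attr)[::-1][:5])
```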
arXiv Detail & Related papers (2020-06-16T08:54:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.