Provably Robust Bayesian Counterfactual Explanations under Model Changes
- URL: http://arxiv.org/abs/2601.16659v1
- Date: Fri, 23 Jan 2026 11:24:57 GMT
- Title: Provably Robust Bayesian Counterfactual Explanations under Model Changes
- Authors: Jamie Duell, Xiuyi Fan
- Abstract summary: We introduce Probabilistically Safe CEs (PSCE), a method for generating counterfactual explanations that are $δ$-safe. Based on Bayesian principles, PSCE provides formal probabilistic guarantees for CEs under model changes. We compare our approach against state-of-the-art Bayesian CE methods, where PSCE produces counterfactual explanations that are more plausible and discriminative.
- Score: 1.4330077657731444
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Counterfactual explanations (CEs) offer interpretable insights into machine learning predictions by answering "what if?" questions. However, in real-world settings where models are frequently updated, existing counterfactual explanations can quickly become invalid or unreliable. In this paper, we introduce Probabilistically Safe CEs (PSCE), a method for generating counterfactual explanations that are $δ$-safe, to ensure high predictive confidence, and $ε$-robust, to ensure low predictive variance. Based on Bayesian principles, PSCE provides formal probabilistic guarantees for CEs under model changes, which are adhered to in what we refer to as the $\langle δ, ε\rangle$-set. Uncertainty-aware constraints are integrated into our optimization framework and we validate our method empirically across diverse datasets. We compare our approach against state-of-the-art Bayesian CE methods, where PSCE produces counterfactual explanations that are not only more plausible and discriminative, but also provably robust under model change.
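As a rough illustration of the $\langle δ, ε\rangle$-set idea in the abstract, membership for a candidate counterfactual can be sketched as a check on the posterior predictive mean and variance. The function name, the posterior approximation via weight samples, and the thresholds below are hypothetical, not the paper's actual method or API:

```python
import numpy as np

def in_delta_eps_set(x_cf, posterior_samples, delta=0.9, eps=0.01):
    """Check <delta, eps>-set membership for a candidate counterfactual.

    `posterior_samples` is a list of weight vectors drawn from an
    (approximate) Bayesian posterior over a linear classifier --
    an illustrative stand-in for whatever posterior PSCE uses.
    """
    # Predicted probability of the target class under each posterior sample.
    probs = np.array([1.0 / (1.0 + np.exp(-(w @ x_cf))) for w in posterior_samples])
    # delta-safe: high expected predictive confidence.
    safe = probs.mean() >= delta
    # eps-robust: low predictive variance across model samples.
    robust = probs.var() <= eps
    return bool(safe and robust)

rng = np.random.default_rng(0)
# A tight posterior around a confidently positive decision boundary.
samples = [np.array([2.0, 2.0]) + 0.05 * rng.standard_normal(2) for _ in range(200)]
print(in_delta_eps_set(np.array([1.0, 1.0]), samples))  # → True
```

A counterfactual can fail either test independently: a point near the boundary of a diffuse posterior may have acceptable mean confidence but high variance, so it is $δ$-safe without being $ε$-robust.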
Related papers
- Reliable Explanations or Random Noise? A Reliability Metric for XAI [6.948460965107209]
We introduce the Explanation Reliability Index (ERI), a family of metrics that quantifies explanation stability under four reliability axioms. ERI enables principled assessment of explanation reliability and supports more trustworthy explainable AI (XAI) systems.
arXiv Detail & Related papers (2026-02-04T22:04:07Z) - CONFEX: Uncertainty-Aware Counterfactual Explanations with Conformal Guarantees [4.014351341279428]
We propose CONFEX, a novel method for generating uncertainty-aware counterfactual explanations. CONFEX explanations are designed to provide local coverage guarantees, addressing the issue that CFX generation violates exchangeability. We evaluate CONFEX against state-of-the-art methods across diverse benchmarks and metrics.
arXiv Detail & Related papers (2025-10-22T16:43:36Z) - Counterfactual Explanations with Probabilistic Guarantees on their Robustness to Model Change [4.239829789304117]
Counterfactual explanations (CFEs) guide users on how to adjust inputs to machine learning models to achieve desired outputs. Current methods addressing this issue often support only specific models or change types. This paper proposes a novel approach for generating CFEs that provides probabilistic guarantees for any model and change type.
arXiv Detail & Related papers (2024-08-09T03:35:53Z) - Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations [80.86128012438834]
We show for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete.
We propose a novel probabilistic approach which is able to provide tight estimates of robustness with strong guarantees.
arXiv Detail & Related papers (2024-07-10T09:13:11Z) - Verified Training for Counterfactual Explanation Robustness under Data Shift [18.156341188646348]
Counterfactual explanations (CEs) enhance the interpretability of machine learning models by describing what changes to an input are necessary to change its prediction to a desired class.
Existing approaches generate CEs by focusing on a single, fixed model, and do not provide any formal guarantees on the CEs' future validity.
This paper introduces VeriTraCER, an approach that jointly trains a classifier and an explainer to explicitly consider the robustness of the generated CEs to small model shifts.
arXiv Detail & Related papers (2024-03-06T15:06:16Z) - Efficient Conformal Prediction under Data Heterogeneity [79.35418041861327]
Conformal Prediction (CP) stands out as a robust framework for uncertainty quantification.
Existing approaches for tackling non-exchangeability lead to methods that are not computable beyond the simplest examples.
This work introduces a new efficient approach to CP that produces provably valid confidence sets for fairly general non-exchangeable data distributions.
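The exchangeable baseline that this work generalizes can be sketched in a few lines: split conformal prediction computes a finite-sample-corrected quantile of calibration residuals, which yields $1-\alpha$ coverage for a new exchangeable point. The function name and data below are illustrative, not from the paper:

```python
import numpy as np

def split_conformal_radius(cal_residuals, alpha=0.1):
    """Quantile of calibration residuals giving 1-alpha marginal coverage
    under exchangeability (the classical setting the paper extends)."""
    n = len(cal_residuals)
    # Finite-sample corrected quantile level: ceil((n+1)(1-alpha)) / n.
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(cal_residuals, q_level, method="higher")

rng = np.random.default_rng(1)
residuals = np.abs(rng.standard_normal(1000))  # |y - y_hat| on a calibration set
q = split_conformal_radius(residuals, alpha=0.1)
# Prediction interval for a new point with prediction y_hat: [y_hat - q, y_hat + q]
```

Under non-exchangeable data (distribution shift, dependence) this quantile no longer guarantees coverage, which is precisely the gap the paper's construction addresses.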
arXiv Detail & Related papers (2023-12-25T20:02:51Z) - Flexible and Robust Counterfactual Explanations with Minimal Satisfiable Perturbations [56.941276017696076]
We propose a conceptually simple yet effective solution named Counterfactual Explanations with Minimal Satisfiable Perturbations (CEMSP).
CEMSP constrains changing values of abnormal features with the help of their semantically meaningful normal ranges.
Compared to existing methods, we conduct comprehensive experiments on both synthetic and real-world datasets to demonstrate that our method provides more robust explanations while preserving flexibility.
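One illustrative reading of CEMSP's constraint, that changed abnormal features must land inside semantically meaningful normal ranges, is a projection step after proposing a counterfactual. The function and the blood-pressure example are hypothetical, not the authors' implementation:

```python
import numpy as np

def constrain_to_normal_ranges(x, x_cf, normal_low, normal_high):
    """Project a candidate counterfactual so every *changed* feature
    falls inside its semantically meaningful normal range.

    Unchanged features are left alone, so the constraint only acts on
    the perturbation itself (an illustrative sketch of the idea).
    """
    x = np.asarray(x, dtype=float)
    x_cf = np.asarray(x_cf, dtype=float).copy()
    changed = x_cf != x
    x_cf[changed] = np.clip(x_cf[changed], normal_low[changed], normal_high[changed])
    return x_cf

# A systolic blood-pressure feature pushed to 200 is clipped back to the
# upper bound of its normal range; the untouched feature stays as-is.
x = np.array([120.0, 80.0])
cf = constrain_to_normal_ranges(x, [200.0, 80.0],
                                np.array([90.0, 60.0]), np.array([140.0, 90.0]))
# cf → [140., 80.]
```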
arXiv Detail & Related papers (2023-09-09T04:05:56Z) - Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z) - CC-Cert: A Probabilistic Approach to Certify General Robustness of Neural Networks [58.29502185344086]
In safety-critical machine learning applications, it is crucial to defend models against adversarial attacks.
It is important to provide provable guarantees for deep learning models against semantically meaningful input transformations.
We propose a new universal probabilistic certification approach based on Chernoff-Cramer bounds.
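The flavor of such a probabilistic certificate can be sketched with a generic Monte-Carlo bound: sample random semantic transformations, count prediction failures, and use a Hoeffding (Chernoff-type) tail bound to upper-bound the true failure probability. This is a generic sketch under an i.i.d.-sampling assumption, not the CC-Cert procedure itself:

```python
import math

def failure_prob_upper_bound(failures, n, confidence=0.999):
    """Hoeffding/Chernoff-style upper confidence bound on the true failure
    probability after observing `failures` in `n` i.i.d. trials.

    With probability >= `confidence`, the true failure rate p satisfies
    p <= p_hat + sqrt(ln(1/(1-confidence)) / (2n)).
    """
    p_hat = failures / n
    # Hoeffding: P(p > p_hat + t) <= exp(-2 n t^2); solve for t.
    t = math.sqrt(math.log(1.0 / (1.0 - confidence)) / (2.0 * n))
    return min(1.0, p_hat + t)

# e.g. 3 misclassifications observed under 10,000 randomly sampled rotations:
bound = failure_prob_upper_bound(3, 10_000, confidence=0.999)
```

Certified robustness then amounts to checking that `bound` stays below a tolerated failure rate; tighter Chernoff-Cramér bounds shrink the additive term `t` by exploiting moment information.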
arXiv Detail & Related papers (2021-09-22T12:46:04Z) - Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z) - Reliable Post hoc Explanations: Modeling Uncertainty in Explainability [44.9824285459365]
Black box explanations are increasingly being employed to establish model credibility in high-stakes settings.
Prior work demonstrates that explanations generated by state-of-the-art techniques are inconsistent, unstable, and provide very little insight into their correctness and reliability.
We develop a novel Bayesian framework for generating local explanations along with their associated uncertainty.
arXiv Detail & Related papers (2020-08-11T22:52:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.