Related papers: The Impact of Machine Learning Uncertainty on the Robustness of Counterfactual Explanations

The Impact of Machine Learning Uncertainty on the Robustness of Counterfactual Explanations

URL: http://arxiv.org/abs/2602.00063v1
Date: Tue, 20 Jan 2026 09:45:25 GMT
Title: The Impact of Machine Learning Uncertainty on the Robustness of Counterfactual Explanations
Authors: Leonidas Christodoulou, Chang Sun,
Abstract summary: We show that counterfactual explanations are highly sensitive to model uncertainty.<n>Even small reductions in model accuracy can lead to large variations in the generated counterfactuals.<n>These findings underscore the need for uncertainty-aware explanation methods in domains such as finance and the social sciences.
Score: 3.9292613467235262
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Counterfactual explanations are widely used to interpret machine learning predictions by identifying minimal changes to input features that would alter a model's decision. However, most existing counterfactual methods have not been tested when model and data uncertainty change, resulting in explanations that may be unstable or invalid under real-world variability. In this work, we investigate the robustness of common combinations of machine learning models and counterfactual generation algorithms in the presence of both aleatoric and epistemic uncertainty. Through experiments on synthetic and real-world tabular datasets, we show that counterfactual explanations are highly sensitive to model uncertainty. In particular, we find that even small reductions in model accuracy - caused by increased noise or limited data - can lead to large variations in the generated counterfactuals on average and on individual instances. These findings underscore the need for uncertainty-aware explanation methods in domains such as finance and the social sciences.

Related papers

Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors [61.92704516732144]
We show that the most robust features for correctness prediction are those that play a distinctive causal role in the model's behavior.<n>We propose two methods that leverage causal mechanisms to predict the correctness of model outputs.
arXiv Detail & Related papers (2025-05-17T00:31:39Z)
Exploring Energy Landscapes for Minimal Counterfactual Explanations: Applications in Cybersecurity and Beyond [3.6963146054309597]
Counterfactual explanations have emerged as a prominent method in Explainable Artificial Intelligence (XAI)<n>We present a novel framework that integrates perturbation theory and statistical mechanics to generate minimal counterfactual explanations.<n>Our approach systematically identifies the smallest modifications required to change a model's prediction while maintaining plausibility.
arXiv Detail & Related papers (2025-03-23T19:48:37Z)
Counterfactual Fairness through Transforming Data Orthogonal to Bias [7.109458605736819]
We propose a novel data pre-processing algorithm, Orthogonal to Bias (OB)<n>OB is designed to eliminate the influence of a group of continuous sensitive variables, thus promoting counterfactual fairness in machine learning applications.<n>OB is model-agnostic, making it applicable to a wide range of machine learning models and tasks.
arXiv Detail & Related papers (2024-03-26T16:40:08Z)
Investigating the Impact of Model Instability on Explanations and Uncertainty [43.254616360807496]
We simulate uncertainty in text input by introducing noise at inference time. We find that high uncertainty doesn't necessarily imply low explanation plausibility. This suggests that noise-augmented models may be better at identifying salient tokens when uncertain.
arXiv Detail & Related papers (2024-02-20T13:41:21Z)
Identifying Drivers of Predictive Aleatoric Uncertainty [2.5311562666866494]
We propose a straightforward approach to explain predictive aleatoric uncertainties.<n>We estimate uncertainty in regression as predictive variance by adapting a neural network with a Gaussian output distribution.<n>This approach can explain uncertainty influences more reliably than complex published approaches.
arXiv Detail & Related papers (2023-12-12T13:28:53Z)
Towards stable real-world equation discovery with assessing differentiating quality influence [52.2980614912553]
We propose alternatives to the commonly used finite differences-based method. We evaluate these methods in terms of applicability to problems, similar to the real ones, and their ability to ensure the convergence of equation discovery algorithms.
arXiv Detail & Related papers (2023-11-09T23:32:06Z)
Deep Active Learning with Noise Stability [24.54974925491753]
Uncertainty estimation for unlabeled data is crucial to active learning. We propose a novel algorithm that leverages noise stability to estimate data uncertainty. Our method is generally applicable in various tasks, including computer vision, natural language processing, and structural data analysis.
arXiv Detail & Related papers (2022-05-26T13:21:01Z)
Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution. We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction. We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss. Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
Deducing neighborhoods of classes from a fitted model [68.8204255655161]
In this article a new kind of interpretable machine learning method is presented. It can help to understand the partitioning of the feature space into predicted classes in a classification model using quantile shifts. Basically, real data points (or specific points of interest) are used and the changes of the prediction after slightly raising or decreasing specific features are observed.
arXiv Detail & Related papers (2020-09-11T16:35:53Z)
Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers. We find that test errors tend to concentrate around a small typical value $varepsilon*$, which deviates substantially from the test error of worst-case interpolating model. Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.