Formalising the Robustness of Counterfactual Explanations for Neural
Networks
- URL: http://arxiv.org/abs/2208.14878v1
- Date: Wed, 31 Aug 2022 14:11:23 GMT
- Title: Formalising the Robustness of Counterfactual Explanations for Neural
Networks
- Authors: Junqi Jiang, Francesco Leofante, Antonio Rago, Francesca Toni
- Abstract summary: We introduce an abstraction framework based on interval neural networks to verify the robustness of CFXs.
We show how embedding Δ-robustness within existing methods can provide CFXs which are provably robust.
- Score: 16.39168719476438
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The use of counterfactual explanations (CFXs) is an increasingly popular
explanation strategy for machine learning models. However, recent studies have
shown that these explanations may not be robust to changes in the underlying
model (e.g., following retraining), which raises questions about their
reliability in real-world applications. Existing attempts towards solving this
problem are heuristic, and the robustness to model changes of the resulting
CFXs is evaluated with only a small number of retrained models, failing to
provide exhaustive guarantees. To remedy this, we propose the first notion to
formally and deterministically assess the robustness (to model changes) of CFXs
for neural networks, which we call Δ-robustness. We introduce an
abstraction framework based on interval neural networks to verify the
Δ-robustness of CFXs against a possibly infinite set of changes to the
model parameters, i.e., weights and biases. We then demonstrate the utility of
this approach in two distinct ways. First, we analyse the Δ-robustness
of a number of CFX generation methods from the literature and show that they
all exhibit significant deficiencies in this regard. Second, we
demonstrate how embedding Δ-robustness within existing methods can
provide CFXs which are provably robust.
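The core idea of the abstract can be sketched with standard interval bound propagation: if every weight and bias is allowed to drift by at most δ, propagating intervals through the network gives a worst-case bound on the logit at the counterfactual point. The sketch below is a minimal illustration of this idea, not the paper's actual framework; the function names, the uniform perturbation bound δ on all parameters, and the convention that a positive output logit means the desired class are all assumptions made here for illustration.

```python
import numpy as np

def interval_affine(x_lo, x_hi, W, b, delta):
    """Propagate an input box [x_lo, x_hi] through an affine layer whose
    weights and biases may each be perturbed by up to +/- delta."""
    W_lo, W_hi = W - delta, W + delta
    # Each term w_ij * x_j attains its extremes at a corner of the box,
    # so take min/max over the four corner products.
    prods = np.stack([W_lo * x_lo, W_lo * x_hi, W_hi * x_lo, W_hi * x_hi])
    y_lo = prods.min(axis=0).sum(axis=1) + (b - delta)
    y_hi = prods.max(axis=0).sum(axis=1) + (b + delta)
    return y_lo, y_hi

def is_delta_robust(cfx, layers, delta):
    """Certify that the counterfactual `cfx` keeps a positive output logit
    for EVERY network whose parameters deviate from the trained ones by at
    most delta (a sound but conservative interval check)."""
    lo = hi = np.asarray(cfx, dtype=float)
    for i, (W, b) in enumerate(layers):
        lo, hi = interval_affine(lo, hi, W, b, delta)
        if i < len(layers) - 1:  # ReLU on hidden layers only
            lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)
    # Robust iff even the worst-case logit stays on the desired side.
    return bool(lo[0] > 0)

# Tiny single-layer example: robust for a small delta, not for a large one.
W, b = np.array([[2.0, 0.0]]), np.array([0.0])
print(is_delta_robust([1.0, 0.0], [(W, b)], 0.5))  # True
print(is_delta_robust([1.0, 0.0], [(W, b)], 2.0))  # False
```

Because interval propagation over-approximates the reachable outputs, a `True` answer is a genuine guarantee over the infinite parameter set, while a `False` answer may be conservative.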
Related papers
- SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch tokens they extract.
We propose a novel fine-tuning smooth regularization that rectifies structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z) - Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations [80.86128012438834]
We show for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete.
We propose a novel probabilistic approach which is able to provide tight estimates of robustness with strong guarantees.
arXiv Detail & Related papers (2024-07-10T09:13:11Z) - FullCert: Deterministic End-to-End Certification for Training and Inference of Neural Networks [62.897993591443594]
FullCert is the first end-to-end certifier with sound, deterministic bounds, which proves robustness against both training-time and inference-time attacks.
We combine our theoretical work with a new open-source library BoundFlow, which enables model training on bounded datasets.
arXiv Detail & Related papers (2024-06-17T13:23:52Z) - The Edge-of-Reach Problem in Offline Model-Based Reinforcement Learning [37.387280102209274]
Offline reinforcement learning aims to train agents from pre-collected datasets; however, this comes with the added challenge of estimating the value of behavior not covered in the dataset.
Model-based methods offer a solution by allowing agents to collect additional synthetic data via rollouts in a learned dynamics model.
However, if the learned dynamics model is replaced by the true error-free dynamics, existing model-based methods completely fail.
We propose Reach-Aware Value Learning (RAVL), a simple and robust method that directly addresses the edge-of-reach problem.
arXiv Detail & Related papers (2024-02-19T20:38:00Z) - uSF: Learning Neural Semantic Field with Uncertainty [0.0]
We propose uSF, a new neural network model for forming extended vector representations.
We show that with a small number of images available for training, a model quantifying uncertainty performs better than a model without such functionality.
arXiv Detail & Related papers (2023-12-13T09:34:01Z) - Interpretations Steered Network Pruning via Amortized Inferred Saliency
Maps [85.49020931411825]
Compression of Convolutional Neural Networks (CNNs) is crucial for deploying these models on edge devices with limited resources.
We propose to address the channel pruning problem from a novel perspective by leveraging the interpretations of a model to steer the pruning process.
We tackle this challenge by introducing a selector model that predicts real-time smooth saliency masks for pruned models.
arXiv Detail & Related papers (2022-09-07T01:12:11Z) - Consistent Counterfactuals for Deep Models [25.1271020453651]
Counterfactual examples are used to explain predictions of machine learning models in key areas such as finance and medical diagnosis.
This paper studies the consistency of model prediction on counterfactual examples in deep networks under small changes to initial training conditions.
arXiv Detail & Related papers (2021-10-06T23:48:55Z) - Contributions to Large Scale Bayesian Inference and Adversarial Machine
Learning [0.0]
The rampant adoption of ML methodologies has revealed that models are usually adopted to make decisions without taking into account the uncertainties in their predictions.
We believe that developing ML systems that take predictive uncertainties into account and are robust against adversarial examples is a must for real-world tasks.
arXiv Detail & Related papers (2021-09-25T23:02:47Z) - Recurrence-Aware Long-Term Cognitive Network for Explainable Pattern
Classification [0.0]
We propose an LTCN-based model for interpretable pattern classification of structured data.
Our method brings its own mechanism for providing explanations by quantifying the relevance of each feature in the decision process.
Our interpretable model obtains competitive performance when compared to the state-of-the-art white and black boxes.
arXiv Detail & Related papers (2021-07-07T18:14:50Z) - Beyond Trivial Counterfactual Explanations with Diverse Valuable
Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z) - On the Reproducibility of Neural Network Predictions [52.47827424679645]
We study the problem of churn, identify factors that cause it, and propose two simple means of mitigating it.
We first demonstrate that churn is indeed an issue, even for standard image classification tasks.
We propose using minimum entropy regularizers to increase prediction confidences.
We present empirical results showing the effectiveness of both techniques in reducing churn while improving the accuracy of the underlying model.
arXiv Detail & Related papers (2021-02-05T18:51:01Z)
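The minimum entropy regularizer mentioned in the churn paper above can be sketched as an extra penalty on the entropy of the model's predictive distribution, pushing predictions toward higher confidence. This is a generic illustration of that idea, not the paper's exact formulation; the function names and the weighting parameter `lam` are assumptions made here.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def min_entropy_regularized_loss(logits, labels, lam=0.1):
    """Cross-entropy plus lam * mean prediction entropy. Penalising entropy
    encourages confident predictions, which is the mechanism the paper
    credits with reducing churn across retrained models."""
    p = softmax(logits)
    n = logits.shape[0]
    ce = -np.log(p[np.arange(n), labels] + 1e-12).mean()
    entropy = -(p * np.log(p + 1e-12)).sum(axis=-1).mean()
    return ce + lam * entropy

# Confident logits incur a lower total loss than uniform ones.
peaked = min_entropy_regularized_loss(np.array([[10.0, 0.0]]), np.array([0]))
uniform = min_entropy_regularized_loss(np.array([[0.0, 0.0]]), np.array([0]))
print(peaked < uniform)  # True
```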
This list is automatically generated from the titles and abstracts of the papers on this site.