Formalising the Robustness of Counterfactual Explanations for Neural
Networks
- URL: http://arxiv.org/abs/2208.14878v1
- Date: Wed, 31 Aug 2022 14:11:23 GMT
- Title: Formalising the Robustness of Counterfactual Explanations for Neural
Networks
- Authors: Junqi Jiang, Francesco Leofante, Antonio Rago, Francesca Toni
- Abstract summary: We introduce an abstraction framework based on interval neural networks to verify the robustness of CFXs.
We show how embedding Δ-robustness within existing methods can provide CFXs which are provably robust.
- Score: 16.39168719476438
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The use of counterfactual explanations (CFXs) is an increasingly popular
explanation strategy for machine learning models. However, recent studies have
shown that these explanations may not be robust to changes in the underlying
model (e.g., following retraining), which raises questions about their
reliability in real-world applications. Existing attempts towards solving this
problem are heuristic, and the robustness to model changes of the resulting
CFXs is evaluated with only a small number of retrained models, failing to
provide exhaustive guarantees. To remedy this, we propose the first notion to
formally and deterministically assess the robustness (to model changes) of CFXs
for neural networks, which we call Δ-robustness. We introduce an
abstraction framework based on interval neural networks to verify the
Δ-robustness of CFXs against a possibly infinite set of changes to the
model parameters, i.e., weights and biases. We then demonstrate the utility of
this approach in two distinct ways. First, we analyse the Δ-robustness
of a number of CFX generation methods from the literature and show that they
all exhibit significant deficiencies in this regard. Second, we
demonstrate how embedding Δ-robustness within existing methods can
provide CFXs which are provably robust.
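Below is a minimal, illustrative Python sketch of the interval-abstraction idea described in the abstract: the trained network's weights and biases are relaxed to intervals [w - δ, w + δ], the counterfactual input is propagated through the resulting interval neural network, and the CFX is accepted as Δ-robust only if its class is guaranteed for every concrete model inside those intervals. This is not the authors' implementation; the toy 3-4-2 ReLU architecture, the scalar delta applied uniformly to all parameters, and the two-logit binary classifier are assumptions made for the example.

```python
import numpy as np


def interval_affine(lo, hi, W_lo, W_hi, b_lo, b_hi):
    """Interval-arithmetic version of W @ x + b for an input box [lo, hi]."""
    # x_j * W[k, j] is bilinear, so its extremes lie at the four corners of
    # the (input, weight) interval box; take them elementwise and sum over j.
    corners = np.stack([
        lo[:, None] * W_lo.T, lo[:, None] * W_hi.T,
        hi[:, None] * W_lo.T, hi[:, None] * W_hi.T,
    ])
    out_lo = corners.min(axis=0).sum(axis=0) + b_lo
    out_hi = corners.max(axis=0).sum(axis=0) + b_hi
    return out_lo, out_hi


def is_delta_robust(x_cfx, weights, biases, delta, target_class):
    """True iff the CFX keeps `target_class` for ALL models within +/- delta."""
    lo = hi = np.asarray(x_cfx, dtype=float)
    for i, (W, b) in enumerate(zip(weights, biases)):
        lo, hi = interval_affine(lo, hi, W - delta, W + delta, b - delta, b + delta)
        if i < len(weights) - 1:  # ReLU on hidden layers (monotone, so apply to bounds)
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    # Two-logit binary classifier: robust if the target logit's lower bound
    # exceeds the other logit's upper bound for every model in the interval.
    other = 1 - target_class
    return bool(lo[target_class] > hi[other])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]  # toy 3-4-2 net
    biases = [rng.normal(size=4), rng.normal(size=2)]
    x_cfx = np.array([0.2, -1.0, 0.5])  # candidate counterfactual input
    print(is_delta_robust(x_cfx, weights, biases, delta=0.05, target_class=1))
```

Because interval propagation is conservative, a True answer here is a sound guarantee of robustness across all parameter changes within the intervals, whereas a False answer may simply reflect looseness of the abstraction.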
Related papers
- Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI [59.96044730204345]
We introduce Derivative-Free Diffusion Manifold-Constrained Gradients (FreeMCG).
FreeMCG serves as an improved basis for explainability of a given neural network.
We show that our method yields state-of-the-art results while preserving the essential properties expected of XAI tools.
arXiv Detail & Related papers (2024-11-22T11:15:14Z) - Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been regarded as a property that is challenging to encode in neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z) - SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch tokens they extract.
We propose a novel fine-tuning smooth regularization that rectifies structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z) - Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations [80.86128012438834]
We show for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete.
We propose a novel probabilistic approach which is able to provide tight estimates of robustness with strong guarantees (a simplified sampling-based illustration of this notion is sketched after this list).
arXiv Detail & Related papers (2024-07-10T09:13:11Z) - FullCert: Deterministic End-to-End Certification for Training and Inference of Neural Networks [62.897993591443594]
FullCert is the first end-to-end certifier with sound, deterministic bounds.
We experimentally demonstrate FullCert's feasibility on two datasets.
arXiv Detail & Related papers (2024-06-17T13:23:52Z) - Interval Abstractions for Robust Counterfactual Explanations [15.954944873701503]
Counterfactual Explanations (CEs) have emerged as a major paradigm in explainable AI research.
Existing methods often become invalid when slight changes occur in the parameters of the model they were generated for.
We propose a novel interval abstraction technique for machine learning models, which allows us to obtain provable robustness guarantees.
arXiv Detail & Related papers (2024-04-21T18:24:34Z) - The Edge-of-Reach Problem in Offline Model-Based Reinforcement Learning [37.387280102209274]
Offline reinforcement learning aims to enable agents to be trained from pre-collected datasets; however, this comes with the added challenge of estimating the value of behaviors not covered in the dataset.
Model-based methods offer a solution by allowing agents to collect additional synthetic data via rollouts in a learned dynamics model.
However, if the learned dynamics model is replaced by the true error-free dynamics, existing model-based methods completely fail.
We propose Reach-Aware Value Learning (RAVL), a simple and robust method that directly addresses the edge-of-reach problem.
arXiv Detail & Related papers (2024-02-19T20:38:00Z) - uSF: Learning Neural Semantic Field with Uncertainty [0.0]
We propose a new neural network model, called uSF, for forming extended vector representations.
We show that with a small number of images available for training, a model quantifying uncertainty performs better than a model without such functionality.
arXiv Detail & Related papers (2023-12-13T09:34:01Z) - Contributions to Large Scale Bayesian Inference and Adversarial Machine
Learning [0.0]
The rampant adoption of ML methodologies has revealed that models are usually deployed to make decisions without taking into account the uncertainties in their predictions.
We believe that developing ML systems that take predictive uncertainties into account and are robust against adversarial examples is a must for real-world tasks.
arXiv Detail & Related papers (2021-09-25T23:02:47Z) - Recurrence-Aware Long-Term Cognitive Network for Explainable Pattern
Classification [0.0]
We propose an LTCN-based model for interpretable pattern classification of structured data.
Our method brings its own mechanism for providing explanations by quantifying the relevance of each feature in the decision process.
Our interpretable model obtains competitive performance when compared to state-of-the-art white-box and black-box models.
arXiv Detail & Related papers (2021-07-07T18:14:50Z) - Beyond Trivial Counterfactual Explanations with Diverse Valuable
Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
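To make the notion of probabilistic robustness guarantees in the entry "Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations" above more concrete, here is a naive Monte Carlo sketch. It is not that paper's method (whose construction is not described here): plausible model shifts are sampled, the fraction of shifted models preserving the counterfactual's class is estimated, and a distribution-free Hoeffding bound gives a one-sided confidence guarantee. The toy logistic classifier, the Gaussian perturbation scale sigma, and the sample size are assumptions made for illustration.

```python
import math
import numpy as np


def predict(w, b, x):
    """Toy logistic classifier: returns the predicted class (0 or 1)."""
    return int(1.0 / (1.0 + math.exp(-(np.dot(w, x) + b))) >= 0.5)


def mc_robustness(x_cfx, w, b, target_class, sigma=0.05, n_samples=2000,
                  alpha=0.05, seed=0):
    """Estimate P[a random model shift preserves the CFX's class], plus a
    one-sided (1 - alpha) Hoeffding lower confidence bound."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_samples):
        # Sample a plausible model shift: Gaussian noise on weights and bias.
        w_shift = w + rng.normal(scale=sigma, size=w.shape)
        b_shift = b + rng.normal(scale=sigma)
        hits += predict(w_shift, b_shift, x_cfx) == target_class
    p_hat = hits / n_samples
    eps = math.sqrt(math.log(1.0 / alpha) / (2.0 * n_samples))  # Hoeffding radius
    return p_hat, max(0.0, p_hat - eps)


if __name__ == "__main__":
    w, b = np.array([1.5, -2.0, 0.7]), 0.1
    x_cfx = np.array([0.8, -0.3, 1.2])
    est, lower = mc_robustness(x_cfx, w, b, target_class=predict(w, b, x_cfx))
    print(f"estimated robustness {est:.3f}, one-sided 95% lower bound {lower:.3f}")
```

Unlike the Δ-robustness check sketched earlier, this yields only a high-confidence statistical bound over sampled model shifts rather than a deterministic guarantee over all shifts in an interval.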
This list is automatically generated from the titles and abstracts of the papers on this site.