Provably Robust and Plausible Counterfactual Explanations for Neural Networks via Robust Optimisation
- URL: http://arxiv.org/abs/2309.12545v2
- Date: Thu, 4 Apr 2024 15:29:25 GMT
- Title: Provably Robust and Plausible Counterfactual Explanations for Neural Networks via Robust Optimisation
- Authors: Junqi Jiang, Jianglin Lan, Francesco Leofante, Antonio Rago, Francesca Toni,
- Abstract summary: We propose Provably RObust and PLAusible Counterfactual Explanations (PROPLACE)
We formulate an iterative algorithm to compute provably robust CEs and prove its convergence, soundness and completeness.
We show that PROPLACE achieves state-of-the-art performances against metrics on three evaluation aspects.
- Score: 19.065904250532995
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Counterfactual Explanations (CEs) have received increasing interest as a major methodology for explaining neural network classifiers. Usually, CEs for an input-output pair are defined as data points with minimum distance to the input that are classified with a different label than the output. To tackle the established problem that CEs are easily invalidated when model parameters are updated (e.g. retrained), studies have proposed ways to certify the robustness of CEs under model parameter changes bounded by a norm ball. However, existing methods targeting this form of robustness are not sound or complete, and they may generate implausible CEs, i.e., outliers wrt the training dataset. In fact, no existing method simultaneously optimises for closeness and plausibility while preserving robustness guarantees. In this work, we propose Provably RObust and PLAusible Counterfactual Explanations (PROPLACE), a method leveraging on robust optimisation techniques to address the aforementioned limitations in the literature. We formulate an iterative algorithm to compute provably robust CEs and prove its convergence, soundness and completeness. Through a comparative experiment involving six baselines, five of which target robustness, we show that PROPLACE achieves state-of-the-art performances against metrics on three evaluation aspects.
Related papers
- Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adrial robustness has been conventionally believed as a challenging property to encode for neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z) - Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations [80.86128012438834]
We show for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete.
We propose a novel probabilistic approach which is able to provide tight estimates of robustness with strong guarantees.
arXiv Detail & Related papers (2024-07-10T09:13:11Z) - Interval Abstractions for Robust Counterfactual Explanations [15.954944873701503]
Counterfactual Explanations (CEs) have emerged as a major paradigm in explainable AI research.
Existing methods often become invalid when slight changes occur in the parameters of the model they were generated for.
We propose a novel interval abstraction technique for machine learning models, which allows us to obtain provable robustness guarantees.
arXiv Detail & Related papers (2024-04-21T18:24:34Z) - SURE: SUrvey REcipes for building reliable and robust deep networks [12.268921703825258]
In this paper, we revisit techniques for uncertainty estimation within deep neural networks and consolidate a suite of techniques to enhance their reliability.
We rigorously evaluate SURE against the benchmark of failure prediction, a critical testbed for uncertainty estimation efficacy.
When applied to real-world challenges, such as data corruption, label noise, and long-tailed class distribution, SURE exhibits remarkable robustness, delivering results that are superior or on par with current state-of-the-art specialized methods.
arXiv Detail & Related papers (2024-03-01T13:58:19Z) - Towards Continual Learning Desiderata via HSIC-Bottleneck
Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy performance, even with absolute superiority of zero exemplar buffer and 1.02x the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z) - Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly-robust instance reweighted adversarial framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z) - Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo
Matching [77.133400999703]
Correlation based stereo matching has achieved outstanding performance.
Current methods with a fixed model do not work uniformly well across various datasets.
This paper proposes a new perspective to dynamically calculate correlation for robust stereo matching.
arXiv Detail & Related papers (2023-07-26T09:47:37Z) - Counterfactual Explanation via Search in Gaussian Mixture Distributed
Latent Space [19.312306559210125]
Counterfactual Explanations (CEs) are an important tool in Algorithmic Recourse for addressing two questions.
guiding the user's interaction with AI systems by proposing easy-to-understand explanations is essential for the trustworthy adoption and long-term acceptance of AI systems.
We introduce a new method to generate CEs for a pre-trained binary classifier by first shaping the latent space of an autoencoder to be a mixture of Gaussian distributions.
arXiv Detail & Related papers (2023-07-25T10:21:26Z) - Finding Regions of Counterfactual Explanations via Robust Optimization [0.0]
A counterfactual explanation (CE) is a minimal perturbed data point for which the decision of the model changes.
Most of the existing methods can only provide one CE, which may not be achievable for the user.
We derive an iterative method to calculate robust CEs that remain valid even after the features are slightly perturbed.
arXiv Detail & Related papers (2023-01-26T14:06:26Z) - Towards a Theoretical Understanding of the Robustness of Variational
Autoencoders [82.68133908421792]
We make inroads into understanding the robustness of Variational Autoencoders (VAEs) to adversarial attacks and other input perturbations.
We develop a novel criterion for robustness in probabilistic models: $r$-robustness.
We show that VAEs trained using disentangling methods score well under our robustness metrics.
arXiv Detail & Related papers (2020-07-14T21:22:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.