Related papers: Provably Robust and Plausible Counterfactual Explanations for Neural Networks via Robust Optimisation

URL: http://arxiv.org/abs/2309.12545v2
Date: Thu, 4 Apr 2024 15:29:25 GMT
Title: Provably Robust and Plausible Counterfactual Explanations for Neural Networks via Robust Optimisation
Authors: Junqi Jiang, Jianglin Lan, Francesco Leofante, Antonio Rago, Francesca Toni
Abstract summary: We propose Provably RObust and PLAusible Counterfactual Explanations (PROPLACE). We formulate an iterative algorithm to compute provably robust CEs and prove its convergence, soundness and completeness. We show that PROPLACE achieves state-of-the-art performance on metrics covering three evaluation aspects.
Score: 19.065904250532995
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Counterfactual Explanations (CEs) have received increasing interest as a major methodology for explaining neural network classifiers. Usually, CEs for an input-output pair are defined as data points with minimum distance to the input that are classified with a different label than the output. To tackle the established problem that CEs are easily invalidated when model parameters are updated (e.g. retrained), studies have proposed ways to certify the robustness of CEs under model parameter changes bounded by a norm ball. However, existing methods targeting this form of robustness are not sound or complete, and they may generate implausible CEs, i.e., outliers with respect to the training dataset. In fact, no existing method simultaneously optimises for closeness and plausibility while preserving robustness guarantees. In this work, we propose Provably RObust and PLAusible Counterfactual Explanations (PROPLACE), a method leveraging robust optimisation techniques to address the aforementioned limitations in the literature. We formulate an iterative algorithm to compute provably robust CEs and prove its convergence, soundness and completeness. Through a comparative experiment involving six baselines, five of which target robustness, we show that PROPLACE achieves state-of-the-art performance on metrics covering three evaluation aspects.
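To make the notion of "robustness under model parameter changes bounded by a norm ball" concrete, here is a minimal sketch for the simplest possible case: a linear classifier whose weights and bias may each shift by at most `delta` in the max-norm. This is not the PROPLACE algorithm, which targets neural networks. The function names, the example numbers, and the choice of an infinity-norm ball are all assumptions made for illustration. For a linear model the worst-case logit has a closed form, because the adversarial shift moves each weight by `delta` against the sign of the matching feature and lowers the bias by `delta`.

```python
import numpy as np


def worst_case_logit(w, b, x, delta):
    """Smallest value of (w + dw) @ x + (b + db) over all shifts with
    max(|dw_i|, |db|) <= delta. Hypothetical helper, linear models only."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    # Each weight can lose delta * |x_i|; the bias can lose delta.
    return float(w @ x + b - delta * (np.abs(x).sum() + 1.0))


def is_robust_ce(w, b, x_ce, delta):
    """A counterfactual for the positive class is robust if it stays on the
    positive side (logit >= 0) under every admissible parameter shift."""
    return worst_case_logit(w, b, x_ce, delta) >= 0.0


# Illustrative (made-up) model and candidate counterfactual.
w = np.array([1.0, -2.0])
b = 0.5
x_ce = np.array([3.0, 0.5])

small_shift_ok = is_robust_ce(w, b, x_ce, delta=0.1)
large_shift_ok = is_robust_ce(w, b, x_ce, delta=1.0)
print(small_shift_ok, large_shift_ok)
```

The same candidate can be certified for a small `delta` and rejected for a large one, which is why a robust CE generally has to sit further from the decision boundary than a minimum-distance CE.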

Related papers

Theoretical Insights in Model Inversion Robustness and Conditional Entropy Maximization for Collaborative Inference Systems [89.35169042718739]
Collaborative inference enables end users to leverage powerful deep learning models without exposure of sensitive raw data to cloud servers. Recent studies have revealed that these intermediate features may not sufficiently preserve privacy, as information can be leaked and raw data can be reconstructed via model inversion attacks (MIAs). This work first theoretically proves that the conditional entropy of inputs given intermediate features provides a guaranteed lower bound on the reconstruction mean square error (MSE) under any MIA. Then, we derive a differentiable and solvable measure for bounding this conditional entropy based on the Gaussian mixture estimation and propose a conditional entropy maximization algorithm to enhance the inversion robustness.
arXiv Detail & Related papers (2025-03-01T07:15:21Z)
Distilling Calibration via Conformalized Credal Inference [36.01369881486141]
One way to enhance reliability is through uncertainty quantification via Bayesian inference. This paper introduces a low-complexity methodology to address this challenge by distilling calibration information from a more complex model. Experiments on visual and language tasks demonstrate that the proposed approach, termed Conformalized Distillation for Credal Inference (CD-CI), significantly improves calibration performance.
arXiv Detail & Related papers (2025-01-10T15:57:23Z)
BiCert: A Bilinear Mixed Integer Programming Formulation for Precise Certified Bounds Against Data Poisoning Attacks [62.897993591443594]
Data poisoning attacks pose one of the biggest threats to modern AI systems.
arXiv Detail & Related papers (2024-12-13T14:56:39Z)
Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been regarded as a challenging property to encode for neural networks. We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z)
Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations [80.86128012438834]
We show for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete. We propose a novel probabilistic approach which is able to provide tight estimates of robustness with strong guarantees.
arXiv Detail & Related papers (2024-07-10T09:13:11Z)
Interval Abstractions for Robust Counterfactual Explanations [15.954944873701503]
Counterfactual Explanations (CEs) have emerged as a major paradigm in explainable AI research. Existing methods often become invalid when slight changes occur in the parameters of the model they were generated for. We propose a novel interval abstraction technique for machine learning models, which allows us to obtain provable robustness guarantees.
arXiv Detail & Related papers (2024-04-21T18:24:34Z)
SURE: SUrvey REcipes for building reliable and robust deep networks [12.268921703825258]
In this paper, we revisit techniques for uncertainty estimation within deep neural networks and consolidate a suite of techniques to enhance their reliability. We rigorously evaluate SURE against the benchmark of failure prediction, a critical testbed for uncertainty estimation efficacy. When applied to real-world challenges, such as data corruption, label noise, and long-tailed class distribution, SURE exhibits remarkable robustness, delivering results that are superior or on par with current state-of-the-art specialized methods.
arXiv Detail & Related papers (2024-03-01T13:58:19Z)
Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion. Our method achieves competitive accuracy performance, even with absolute superiority of zero exemplar buffer and 1.02x the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly-robust instance reweighted adversarial framework. Our importance weights are obtained by optimizing the KL-divergence regularized loss function. Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z)
Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching [77.133400999703]
Correlation based stereo matching has achieved outstanding performance. Current methods with a fixed model do not work uniformly well across various datasets. This paper proposes a new perspective to dynamically calculate correlation for robust stereo matching.
arXiv Detail & Related papers (2023-07-26T09:47:37Z)
Counterfactual Explanation via Search in Gaussian Mixture Distributed Latent Space [19.312306559210125]
Counterfactual Explanations (CEs) are an important tool in Algorithmic Recourse for addressing two questions. Guiding the user's interaction with AI systems by proposing easy-to-understand explanations is essential for the trustworthy adoption and long-term acceptance of AI systems. We introduce a new method to generate CEs for a pre-trained binary classifier by first shaping the latent space of an autoencoder to be a mixture of Gaussian distributions.
arXiv Detail & Related papers (2023-07-25T10:21:26Z)
Finding Regions of Counterfactual Explanations via Robust Optimization [0.0]
A counterfactual explanation (CE) is a minimally perturbed data point for which the decision of the model changes. Most of the existing methods can only provide one CE, which may not be achievable for the user. We derive an iterative method to calculate robust CEs that remain valid even after the features are slightly perturbed.
arXiv Detail & Related papers (2023-01-26T14:06:26Z)
Towards a Theoretical Understanding of the Robustness of Variational Autoencoders [82.68133908421792]
We make inroads into understanding the robustness of Variational Autoencoders (VAEs) to adversarial attacks and other input perturbations. We develop a novel criterion for robustness in probabilistic models: $r$-robustness. We show that VAEs trained using disentangling methods score well under our robustness metrics.
arXiv Detail & Related papers (2020-07-14T21:22:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.