Interval Abstractions for Robust Counterfactual Explanations
- URL: http://arxiv.org/abs/2404.13736v2
- Date: Fri, 22 Nov 2024 15:01:12 GMT
- Title: Interval Abstractions for Robust Counterfactual Explanations
- Authors: Junqi Jiang, Francesco Leofante, Antonio Rago, Francesca Toni
- Abstract summary: Counterfactual Explanations (CEs) have emerged as a major paradigm in explainable AI research.
CEs found by existing methods often become invalid when slight changes occur in the parameters of the model they were generated for.
We propose a novel interval abstraction technique for machine learning models, which allows us to obtain provable robustness guarantees.
- Score: 15.954944873701503
- Abstract: Counterfactual Explanations (CEs) have emerged as a major paradigm in explainable AI research, providing recourse recommendations for users affected by the decisions of machine learning models. However, CEs found by existing methods often become invalid when slight changes occur in the parameters of the model they were generated for. The literature lacks a way to provide exhaustive robustness guarantees for CEs under model changes, in that existing methods to improve CEs' robustness are mostly heuristic, and the robustness performances are evaluated empirically using only a limited number of retrained models. To bridge this gap, we propose a novel interval abstraction technique for parametric machine learning models, which allows us to obtain provable robustness guarantees for CEs under a possibly infinite set of plausible model changes $\Delta$. Based on this idea, we formalise a robustness notion for CEs, which we call $\Delta$-robustness, in both binary and multi-class classification settings. We present procedures to verify $\Delta$-robustness based on Mixed Integer Linear Programming, using which we further propose algorithms to generate CEs that are $\Delta$-robust. In an extensive empirical study involving neural networks and logistic regression models, we demonstrate the practical applicability of our approach. We discuss two strategies for determining the appropriate hyperparameters in our method, and we quantitatively benchmark CEs generated by eleven methods, highlighting the effectiveness of our algorithms in finding robust CEs.
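A minimal illustration of the interval idea, assuming a logistic regression classifier: the plausible model changes $\Delta$ are taken to be an interval box of half-width delta around the trained parameters, and a counterfactual counts as $\Delta$-robust only if the worst-case score over that box still yields the desired class. The function names and the uniform per-parameter radius are illustrative assumptions, and the closed-form bound below is specific to linear models; for neural networks the paper verifies $\Delta$-robustness with Mixed Integer Linear Programming instead.

```python
import numpy as np

def worst_case_logit(w, b, x, delta):
    """Lower bound on the logit w @ x + b over the interval box
    [w - delta, w + delta] x [b - delta, b + delta].
    For a linear score the minimum is attained coordinate-wise: each weight
    takes whichever endpoint minimises its contribution w_i * x_i, and the
    bias takes its lower endpoint b - delta."""
    return np.minimum((w - delta) * x, (w + delta) * x).sum() + (b - delta)

def is_delta_robust(w, b, x_cf, delta):
    """A counterfactual x_cf with desired class 1 (positive logit) is
    Delta-robust iff even the worst-case model in the box still accepts it."""
    return worst_case_logit(w, b, x_cf, delta) > 0.0

# Hypothetical trained model and counterfactual (illustrative values only).
w = np.array([1.2, -0.8, 0.5])
b = -0.3
x_cf = np.array([1.0, -1.0, 2.0])

print(is_delta_robust(w, b, x_cf, delta=0.05))  # True: valid under small shifts
print(is_delta_robust(w, b, x_cf, delta=1.0))   # False: too large a model change
```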
Related papers
- Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations [80.86128012438834]
We show for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete.
We propose a novel probabilistic approach which is able to provide tight estimates of robustness with strong guarantees (for contrast, a naive sampling-based check is sketched after this list).
arXiv Detail & Related papers (2024-07-10T09:13:11Z) - DistiLLM: Towards Streamlined Distillation for Large Language Models [53.46759297929675]
DistiLLM is a more effective and efficient KD framework for auto-regressive language models.
DistiLLM comprises two components: (1) a novel skew Kullback-Leibler divergence loss, where we unveil and leverage its theoretical properties, and (2) an adaptive off-policy approach designed to enhance the efficiency in utilizing student-generated outputs.
arXiv Detail & Related papers (2024-02-06T11:10:35Z) - Provably Robust and Plausible Counterfactual Explanations for Neural Networks via Robust Optimisation [19.065904250532995]
We propose Provably RObust and PLAusible Counterfactual Explanations (PROPLACE).
We formulate an iterative algorithm to compute provably robust CEs and prove its convergence, soundness and completeness.
We show that PROPLACE achieves state-of-the-art performance on metrics covering three evaluation aspects.
arXiv Detail & Related papers (2023-09-22T00:12:09Z) - Counterfactual Explanation via Search in Gaussian Mixture Distributed Latent Space [19.312306559210125]
Counterfactual Explanations (CEs) are an important tool in Algorithmic Recourse for addressing two questions.
Guiding the user's interaction with AI systems by proposing easy-to-understand explanations is essential for their trustworthy adoption and long-term acceptance.
We introduce a new method to generate CEs for a pre-trained binary classifier by first shaping the latent space of an autoencoder to be a mixture of Gaussian distributions.
arXiv Detail & Related papers (2023-07-25T10:21:26Z) - Finding Regions of Counterfactual Explanations via Robust Optimization [0.0]
A counterfactual explanation (CE) is a minimal perturbed data point for which the decision of the model changes.
Most of the existing methods can only provide one CE, which may not be achievable for the user.
We derive an iterative method to calculate robust CEs that remain valid even after the features are slightly perturbed.
arXiv Detail & Related papers (2023-01-26T14:06:26Z) - Formalising the Robustness of Counterfactual Explanations for Neural Networks [16.39168719476438]
We introduce an abstraction framework based on interval neural networks to verify the robustness of CFXs.
We show how embedding $\Delta$-robustness within existing methods can provide CFXs which are provably robust.
arXiv Detail & Related papers (2022-08-31T14:11:23Z) - MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel framework of Model-Agnostic Counterfactual Explanation (MACE).
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate its effectiveness, with better validity, sparsity and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z) - Weakly Supervised Semantic Segmentation via Alternative Self-Dual Teaching [82.71578668091914]
This paper establishes a compact learning framework that embeds the classification and mask-refinement components into a unified deep model.
We propose a novel alternative self-dual teaching (ASDT) mechanism to encourage high-quality knowledge interaction.
arXiv Detail & Related papers (2021-12-17T11:56:56Z) - A generalized framework for active learning reliability: survey and benchmark [0.0]
We propose a modular framework to build on-the-fly efficient active learning strategies.
We devise 39 strategies for the solution of 20 reliability benchmark problems.
arXiv Detail & Related papers (2021-06-03T09:33:59Z) - Automated and Formal Synthesis of Neural Barrier Certificates for Dynamical Models [70.70479436076238]
We introduce an automated, formal, counterexample-based approach to synthesise Barrier Certificates (BC).
The approach is underpinned by an inductive framework, which manipulates a candidate BC structured as a neural network, and a sound verifier, which either certifies the candidate's validity or generates counter-examples.
The outcomes show that we can synthesise sound BCs up to two orders of magnitude faster, with, in particular, a stark speedup in the verification engine.
arXiv Detail & Related papers (2020-07-07T07:39:42Z) - An Online Method for A Class of Distributionally Robust Optimization with Non-Convex Objectives [54.29001037565384]
We propose a practical online method for solving a class of online distributionally robust optimization (DRO) problems.
Our studies demonstrate important applications in machine learning for improving the robustness of networks.
arXiv Detail & Related papers (2020-06-17T20:19:25Z)
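For contrast with the exhaustive guarantee sketched above, the sketch below shows the kind of purely empirical check the abstract cautions about, and which the probabilistic approach in the first related paper refines: sample a finite number of perturbed models and report the fraction that still accept the counterfactual. It reuses the hypothetical logistic regression from the earlier sketch; the uniform sampling scheme is an illustrative assumption, not the related paper's method, and unlike the interval bound it can miss rare adversarial shifts.

```python
import numpy as np

def empirical_validity(w, b, x_cf, delta, n_samples=1000, seed=0):
    """Fraction of sampled perturbed models (weights drawn uniformly from the
    same interval box as in the earlier sketch) that still accept x_cf.
    This estimate carries no guarantee: a rare but admissible model change
    that invalidates the CE may simply never be sampled."""
    rng = np.random.default_rng(seed)
    valid = 0
    for _ in range(n_samples):
        w_s = w + rng.uniform(-delta, delta, size=w.shape)
        b_s = b + rng.uniform(-delta, delta)
        valid += (w_s @ x_cf + b_s) > 0.0
    return valid / n_samples

# Same hypothetical model and counterfactual as before.
w = np.array([1.2, -0.8, 0.5])
b = -0.3
x_cf = np.array([1.0, -1.0, 2.0])

# Prints a value close to 1, even though this CE fails the interval check
# at delta = 1.0: high empirical validity does not imply Delta-robustness.
print(empirical_validity(w, b, x_cf, delta=1.0))
```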