Verified Training for Counterfactual Explanation Robustness under Data Shift
- URL: http://arxiv.org/abs/2403.03773v1
- Date: Wed, 6 Mar 2024 15:06:16 GMT
- Title: Verified Training for Counterfactual Explanation Robustness under Data Shift
- Authors: Anna P. Meyer and Yuhao Zhang and Aws Albarghouthi and Loris D'Antoni
- Abstract summary: Counterfactual explanations (CEs) enhance the interpretability of machine learning models by describing what changes to an input are necessary to change its prediction to a desired class.
Existing approaches generate CEs by focusing on a single, fixed model, and do not provide any formal guarantees on the CEs' future validity.
This paper introduces VeriTraCER, an approach that jointly trains a classifier and an explainer to explicitly consider the robustness of the generated CEs to small model shifts.
- Score: 18.156341188646348
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Counterfactual explanations (CEs) enhance the interpretability of machine
learning models by describing what changes to an input are necessary to change
its prediction to a desired class. These explanations are commonly used to
guide users' actions, e.g., by describing how a user whose loan application was
denied can be approved for a loan in the future. Existing approaches generate
CEs by focusing on a single, fixed model, and do not provide any formal
guarantees on the CEs' future validity. When models are updated periodically to
account for data shift, if the generated CEs are not robust to the shifts,
users' actions may no longer have the desired impacts on their predictions.
This paper introduces VeriTraCER, an approach that jointly trains a classifier
and an explainer to explicitly consider the robustness of the generated CEs to
small model shifts. VeriTraCER optimizes over a carefully designed loss
function that ensures the verifiable robustness of CEs to local model updates,
thus providing deterministic guarantees to CE validity. Our empirical
evaluation demonstrates that VeriTraCER generates CEs that (1) are verifiably
robust to small model updates and (2) display competitive robustness to
state-of-the-art approaches in handling empirical model updates including
random initialization, leave-one-out, and distribution shifts.
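As a rough illustration of what "verifiably robust to small model updates" can mean, the sketch below checks a single CE against every model within a bounded parameter shift. It is not VeriTraCER's training procedure or loss function: it assumes a linear classifier, a shift set defined by an L-infinity bound delta on the weights and bias, and hypothetical function names (worst_case_logit, is_verifiably_robust) chosen for this example only.

```python
from typing import Sequence


def worst_case_logit(w: Sequence[float], b: float,
                     x_cf: Sequence[float], delta: float) -> float:
    """Lowest logit any shifted linear model can assign to the CE x_cf.

    Assumption (not from the paper): the shifted model is (w + dw, b + db)
    with |dw_i| <= delta for every i and |db| <= delta.
    """
    # Logit of the original model at the counterfactual point.
    nominal = sum(wi * xi for wi, xi in zip(w, x_cf)) + b
    # An adversarial shift lowers the logit by at most
    # delta * |x_i| per weight plus delta for the bias.
    slack = delta * (sum(abs(xi) for xi in x_cf) + 1.0)
    return nominal - slack


def is_verifiably_robust(w: Sequence[float], b: float,
                         x_cf: Sequence[float], delta: float) -> bool:
    """True if x_cf keeps the desired (positive) class under every shift."""
    return worst_case_logit(w, b, x_cf, delta) > 0.0


if __name__ == "__main__":
    # Made-up model and counterfactual, for illustration only.
    w, b = [1.0, -2.0], -0.5
    x_cf = [2.0, 0.5]
    for delta in (0.1, 0.2):
        print(delta, is_verifiably_robust(w, b, x_cf, delta))
```

With these made-up numbers, the CE stays valid for every model within delta = 0.1 of the original parameters but not within delta = 0.2, so only the first case would count as a deterministic guarantee under this simplified shift model.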
Related papers
- Refining Counterfactual Explanations With Joint-Distribution-Informed Shapley Towards Actionable Minimality [6.770853093478073]
Counterfactual explanations (CE) identify data points that closely resemble the observed data but produce different machine learning (ML) model outputs.
Existing CE methods often lack actionable efficiency because of unnecessary feature changes included within the explanations.
We propose a method that minimizes the required feature changes while maintaining the validity of CE.
arXiv Detail & Related papers (2024-10-07T18:31:19Z)
- Counterfactual Explanations with Probabilistic Guarantees on their Robustness to Model Change [4.239829789304117]
Counterfactual explanations (CFEs) guide users on how to adjust inputs to machine learning models to achieve desired outputs.
Current methods addressing this issue often support only specific models or change types.
This paper proposes a novel approach for generating CFEs that provides probabilistic guarantees for any model and change type.
arXiv Detail & Related papers (2024-08-09T03:35:53Z)
- Interval Abstractions for Robust Counterfactual Explanations [15.954944873701503]
Counterfactual Explanations (CEs) have emerged as a major paradigm in explainable AI research.
Existing methods often become invalid when slight changes occur in the parameters of the model they were generated for.
We propose a novel interval abstraction technique for machine learning models, which allows us to obtain provable robustness guarantees.
arXiv Detail & Related papers (2024-04-21T18:24:34Z)
- Do Counterfactual Examples Complicate Adversarial Training? [6.264110093518783]
We leverage diffusion models to study the robustness-performance tradeoff of robust classifiers.
Our approach generates low-norm counterfactual examples (CEs): semantically altered data which results in different true class membership.
We report that the confidence and accuracy of robust models on their clean training data are associated with the proximity of the data to their CEs.
arXiv Detail & Related papers (2024-04-16T14:13:44Z)
- Introducing User Feedback-based Counterfactual Explanations (UFCE) [49.1574468325115]
Counterfactual explanations (CEs) have emerged as a viable solution for generating comprehensible explanations in XAI.
UFCE allows for the inclusion of user constraints to determine the smallest modifications in the subset of actionable features.
UFCE outperforms two well-known CE methods in terms of proximity, sparsity, and feasibility.
arXiv Detail & Related papers (2024-02-26T20:09:44Z)
- Estimating calibration error under label shift without labels [47.57286245320775]
Existing CE estimators assume access to labels from the target domain, which are often unavailable in practice, i.e., when the model is deployed and used.
This work proposes a novel CE estimator under label shift, which is characterized by changes in the marginal label distribution $p(Y)$ while keeping the conditional $p(X|Y)$ constant between the source and target distributions.
Our contribution is an approach, which, by leveraging importance re-weighting of the labeled source distribution, provides consistent and unbiased CE estimation with respect to the shifted target distribution.
arXiv Detail & Related papers (2023-12-14T01:18:51Z)
- Provably Robust and Plausible Counterfactual Explanations for Neural Networks via Robust Optimisation [19.065904250532995]
We propose Provably RObust and PLAusible Counterfactual Explanations (PROPLACE).
We formulate an iterative algorithm to compute provably robust CEs and prove its convergence, soundness and completeness.
We show that PROPLACE achieves state-of-the-art performance on metrics covering three evaluation aspects.
arXiv Detail & Related papers (2023-09-22T00:12:09Z)
- Cal-SFDA: Source-Free Domain-adaptive Semantic Segmentation with Differentiable Expected Calibration Error [50.86671887712424]
The prevalence of domain adaptive semantic segmentation has prompted concerns regarding source domain data leakage.
To circumvent the requirement for source data, source-free domain adaptation has emerged as a viable solution.
We propose a novel calibration-guided source-free domain adaptive semantic segmentation framework.
arXiv Detail & Related papers (2023-08-06T03:28:34Z)
- Confidence Attention and Generalization Enhanced Distillation for Continuous Video Domain Adaptation [62.458968086881555]
Continuous Video Domain Adaptation (CVDA) is a scenario where a source model is required to adapt to a series of individually available changing target domains.
We propose a Confidence-Attentive network with geneRalization enhanced self-knowledge disTillation (CART) to address the challenge in CVDA.
arXiv Detail & Related papers (2023-03-18T16:40:10Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)