Semi-supervised counterfactual explanations
- URL: http://arxiv.org/abs/2303.12634v1
- Date: Wed, 22 Mar 2023 15:17:16 GMT
- Title: Semi-supervised counterfactual explanations
- Authors: Shravan Kumar Sajja, Sumanta Mukherjee, Satyam Dwivedi
- Abstract summary: We address the challenge of generating counterfactual explanations that lie in the same data distribution as that of the training data.
This requirement has been addressed through the incorporation of auto-encoder reconstruction loss in the counterfactual search process.
We show further improvement in the interpretability of counterfactual explanations when the auto-encoder is trained in a semi-supervised fashion with class tagged input data.
- Score: 3.6810543937967912
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Counterfactual explanations for machine learning models are used to find
minimal interventions to the feature values such that the model changes the
prediction to a different output or a target output. A valid counterfactual
explanation should have likely feature values. Here, we address the challenge
of generating counterfactual explanations that lie in the same data
distribution as the training data and, more importantly, belong to the target
class distribution. This requirement has been addressed through the
incorporation of auto-encoder reconstruction loss in the counterfactual search
process. Connecting the output behavior of the classifier to the latent space
of the auto-encoder has further improved the speed of the counterfactual search
process and the interpretability of the resulting counterfactual explanations.
Continuing this line of research, we show further improvement in the
interpretability of counterfactual explanations when the auto-encoder is
trained in a semi-supervised fashion with class-tagged input data. We
empirically evaluate our approach on several datasets and show considerable
improvement in terms of several metrics.
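The abstract describes two ingredients: a counterfactual search that keeps candidates on the data manifold through an auto-encoder reconstruction loss, and an auto-encoder trained in a semi-supervised fashion on class-tagged data. Below is a minimal sketch of the first ingredient, assuming a PyTorch classifier and auto-encoder; the function name, loss weights, and optimiser settings are illustrative and not taken from the paper.

```python
# Hedged sketch: gradient-based counterfactual search with an auto-encoder
# reconstruction penalty. `classifier` and `autoencoder` are assumed to be
# trained nn.Modules; lambda_dist and lambda_recon are illustrative weights.
import torch
import torch.nn.functional as F

def find_counterfactual(x, target_class, classifier, autoencoder,
                        lambda_dist=0.1, lambda_recon=1.0,
                        steps=500, lr=0.01):
    """Search for a counterfactual x_cf close to x that is classified as
    target_class and that the auto-encoder reconstructs well, i.e. that
    stays in the training data distribution."""
    x_cf = x.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([x_cf], lr=lr)
    target = torch.tensor([target_class])

    for _ in range(steps):
        optimizer.zero_grad()
        # Push the classifier's prediction toward the target class.
        pred_loss = F.cross_entropy(classifier(x_cf), target)
        # Keep the intervention on the feature values minimal.
        dist_loss = torch.norm(x_cf - x, p=1)
        # Penalise candidates the auto-encoder cannot reconstruct,
        # i.e. candidates that drift off the data manifold.
        recon_loss = F.mse_loss(autoencoder(x_cf), x_cf)
        loss = pred_loss + lambda_dist * dist_loss + lambda_recon * recon_loss
        loss.backward()
        optimizer.step()

    return x_cf.detach()
```

The semi-supervised variant can be read as training the auto-encoder with an auxiliary classification head on the latent code for the labelled (class-tagged) samples only, which ties class structure to the latent space. The sketch below is likewise an assumption about the training signal, not the authors' exact formulation.

```python
# Hedged sketch: semi-supervised auto-encoder loss. Unlabelled samples
# contribute only reconstruction; class-tagged samples also train a small
# classification head (`latent_head`) on the latent code.
def semi_supervised_ae_loss(encoder, decoder, latent_head,
                            x_unlabelled, x_labelled, y_labelled,
                            lambda_cls=1.0):
    z_u = encoder(x_unlabelled)
    z_l = encoder(x_labelled)
    recon_loss = (F.mse_loss(decoder(z_u), x_unlabelled) +
                  F.mse_loss(decoder(z_l), x_labelled))
    # Supervised signal from the class-tagged subset only.
    cls_loss = F.cross_entropy(latent_head(z_l), y_labelled)
    return recon_loss + lambda_cls * cls_loss
```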
Related papers
- Are Data-driven Explanations Robust against Out-of-distribution Data? [18.760475318852375]
We propose Distributionally Robust Explanations (DRE), an end-to-end, model-agnostic learning framework.
The key idea is to fully utilize inter-distribution information to provide supervisory signals for learning explanations without human annotation.
Our results demonstrate that the proposed method significantly improves the model's performance in terms of explanation and prediction robustness against distributional shifts.
arXiv Detail & Related papers (2023-03-29T02:02:08Z)
- VCNet: A self-explaining model for realistic counterfactual generation [52.77024349608834]
Counterfactual explanation is a class of methods for producing local explanations of machine learning decisions.
We present VCNet (Variational Counter Net), a model architecture that combines a predictor and a counterfactual generator.
We show that VCNet is able both to generate predictions and to generate counterfactual explanations without having to solve another minimisation problem.
arXiv Detail & Related papers (2022-12-21T08:45:32Z)
- Supervised Feature Compression based on Counterfactual Analysis [3.2458225810390284]
This work aims to leverage Counterfactual Explanations to detect the important decision boundaries of a pre-trained black-box model.
Using the discretized dataset, an optimal Decision Tree can be trained that resembles the black-box model, but that is interpretable and compact.
arXiv Detail & Related papers (2022-11-17T21:16:14Z)
- Discrete Key-Value Bottleneck [95.61236311369821]
Deep neural networks perform well on classification tasks where data streams are i.i.d. and labeled data is abundant.
One powerful approach that has addressed this challenge involves pre-training of large encoders on volumes of readily available data, followed by task-specific tuning.
Given a new task, however, updating the weights of these encoders is challenging, as a large number of weights need to be fine-tuned and, as a result, information about previous tasks is forgotten.
We propose a model architecture to address this issue, building upon a discrete bottleneck containing pairs of separate and learnable key-value codes.
arXiv Detail & Related papers (2022-07-22T17:52:30Z)
- Generating Sparse Counterfactual Explanations For Multivariate Time Series [0.5161531917413706]
We propose a generative adversarial network (GAN) architecture that generates SPARse Counterfactual Explanations for multivariate time series.
Our approach provides a custom sparsity layer and regularizes the counterfactual loss function in terms of similarity, sparsity, and smoothness of trajectories.
We evaluate our approach on real-world human motion datasets as well as a synthetic time series interpretability benchmark.
arXiv Detail & Related papers (2022-06-02T08:47:06Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Autoencoding Variational Autoencoder [56.05008520271406]
We study how a variational auto-encoder's failure to consistently encode its own outputs affects the learned representations, and the consequences of fixing this behaviour by introducing a notion of self-consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
- Representation Learning for Sequence Data with Deep Autoencoding Predictive Components [96.42805872177067]
We propose a self-supervised representation learning method for sequence data, based on the intuition that useful representations of sequence data should exhibit a simple structure in the latent space.
We encourage this latent structure by maximizing an estimate of predictive information of latent feature sequences, which is the mutual information between past and future windows at each time step.
We demonstrate that our method recovers the latent space of noisy dynamical systems, extracts predictive features for forecasting tasks, and improves automatic speech recognition when used to pretrain the encoder on large amounts of unlabeled data.
arXiv Detail & Related papers (2020-10-07T03:34:01Z)
- Deducing neighborhoods of classes from a fitted model [68.8204255655161]
In this article, a new kind of interpretable machine learning method is presented.
It helps to understand how a classification model partitions the feature space into predicted classes, using quantile shifts.
Basically, real data points (or specific points of interest) are perturbed by slightly raising or decreasing specific features, and the resulting changes in the prediction are observed (see the sketch after this entry).
arXiv Detail & Related papers (2020-09-11T16:35:53Z)
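The quantile-shift probing described in the entry above can be illustrated with a small sketch; the `model.predict` interface, the shift size, and the function name are assumptions for illustration, not the article's actual procedure.

```python
# Hedged sketch: probe a fitted classifier by raising or lowering one feature
# of a real data point by a small quantile shift of that feature's empirical
# distribution, and record whether the predicted class changes.
import numpy as np

def quantile_shift_probe(model, X, point, shift=0.05):
    """model.predict maps an (n, d) array to class labels; X is the data used
    to estimate per-feature quantiles; point is a single 1-d sample."""
    base = model.predict(point.reshape(1, -1))[0]
    flips = {}
    for j in range(X.shape[1]):
        # Empirical quantile of the point's value for feature j.
        q = np.searchsorted(np.sort(X[:, j]), point[j]) / len(X)
        for direction in (+1, -1):
            shifted = point.copy()
            q_new = np.clip(q + direction * shift, 0.0, 1.0)
            shifted[j] = np.quantile(X[:, j], q_new)
            flips[(j, direction)] = model.predict(shifted.reshape(1, -1))[0] != base
    return base, flips
```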
This list is automatically generated from the titles and abstracts of the papers in this site.