DiConStruct: Causal Concept-based Explanations through Black-Box
Distillation
- URL: http://arxiv.org/abs/2401.08534v4
- Date: Fri, 26 Jan 2024 14:48:35 GMT
- Title: DiConStruct: Causal Concept-based Explanations through Black-Box
Distillation
- Authors: Ricardo Moreira, Jacopo Bono, Mário Cardoso, Pedro Saleiro, Mário
A. T. Figueiredo, Pedro Bizarro
- Abstract summary: We present DiConStruct, an explanation method that is both concept-based and causal.
Our explainer works as a distillation model for any black-box machine learning model by approximating its predictions while producing the respective explanations.
- Score: 9.735426765564474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model interpretability plays a central role in human-AI decision-making
systems. Ideally, explanations should be expressed using human-interpretable
semantic concepts. Moreover, the causal relations between these concepts should
be captured by the explainer to allow for reasoning about the explanations.
Lastly, explanation methods should be efficient and not compromise the
performance of the predictive task. Despite the rapid advances in AI
explainability in recent years, to the best of our knowledge, no method fulfills
these three properties. Indeed, mainstream methods for local concept
explainability do not produce causal explanations and incur a trade-off between
explainability and prediction performance. We present DiConStruct, an
explanation method that is both concept-based and causal, with the goal of
creating more interpretable local explanations in the form of structural causal
models and concept attributions. Our explainer works as a distillation model for
any black-box machine learning model by approximating its predictions while
producing the respective explanations. Because of this, DiConStruct generates
explanations efficiently while not impacting the black-box prediction task. We
validate our method on an image dataset and a tabular dataset, showing that
DiConStruct approximates the black-box models with higher fidelity than other
concept explainability baselines, while providing explanations that include the
causal relations between the concepts.
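To make the distillation setup concrete, the following is a minimal sketch of a surrogate explainer in the spirit of the abstract: it predicts human-interpretable concepts from the input, combines them along a fixed causal graph via simple structural equations, and is trained to mimic the black-box score. The class name, the toy three-concept graph, and the loss terms are hypothetical illustrations under these assumptions, not the authors' actual architecture.

```python
# Minimal, illustrative sketch of a distillation-style causal concept explainer.
# All names (ConceptSCMExplainer, the toy graph, the loss weights) are hypothetical.
import torch
import torch.nn as nn

class ConceptSCMExplainer(nn.Module):
    def __init__(self, n_features: int, concept_names: list[str], parents: dict[str, list[str]]):
        super().__init__()
        self.concept_names = concept_names
        self.parents = parents  # DAG: each concept's causal parents (must be acyclic)
        # One small exogenous predictor per concept, mapping raw features to a logit.
        self.exogenous = nn.ModuleDict({c: nn.Linear(n_features, 1) for c in concept_names})
        # One linear structural equation per concept, combining its parents' values.
        self.structural = nn.ModuleDict({
            c: nn.Linear(len(parents[c]), 1) for c in concept_names if parents[c]
        })
        # Final structural equation: distilled black-box score as a function of all concepts.
        self.output_eq = nn.Linear(len(concept_names), 1)

    def forward(self, x: torch.Tensor):
        values: dict[str, torch.Tensor] = {}
        for c in self.concept_names:  # assumed to be in topological order of the DAG
            z = self.exogenous[c](x)
            if self.parents[c]:
                pa = torch.cat([values[p] for p in self.parents[c]], dim=1)
                z = z + self.structural[c](pa)
            values[c] = torch.sigmoid(z)
        concepts = torch.cat([values[c] for c in self.concept_names], dim=1)
        distilled_score = torch.sigmoid(self.output_eq(concepts))
        return distilled_score, values

# Toy usage: distill a "black box" (here a random frozen MLP standing in for any model).
torch.manual_seed(0)
black_box = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
for p in black_box.parameters():
    p.requires_grad_(False)

concepts = ["c1", "c2", "c3"]
dag = {"c1": [], "c2": ["c1"], "c3": ["c1", "c2"]}  # hypothetical causal graph over concepts
explainer = ConceptSCMExplainer(n_features=8, concept_names=concepts, parents=dag)

x = torch.randn(256, 8)
concept_labels = (torch.randn(256, 3) > 0).float()  # stand-in for annotated concept labels
opt = torch.optim.Adam(explainer.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    score_hat, values = explainer(x)
    pred_concepts = torch.cat([values[c] for c in concepts], dim=1)
    # Distillation loss (match the black box) plus concept supervision.
    loss = nn.functional.mse_loss(score_hat, black_box(x)) \
         + nn.functional.binary_cross_entropy(pred_concepts, concept_labels)
    loss.backward()
    opt.step()

# The weights of `output_eq` and the structural equations can be read as local
# concept attributions along the causal graph; fidelity is how well `score_hat`
# tracks the black-box score on held-out data.
```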
Related papers
- Counterfactual explainability of black-box prediction models [4.14360329494344]
We propose a new notion called counterfactual explainability for black-box prediction models.
Counterfactual explainability has three key advantages.
arXiv Detail & Related papers (2024-11-03T16:29:09Z) - An Axiomatic Approach to Model-Agnostic Concept Explanations [67.84000759813435]
We propose an approach to concept explanations that satisfy three natural axioms: linearity, recursivity, and similarity.
We then establish connections with previous concept explanation methods, offering insight into their varying semantic meanings.
arXiv Detail & Related papers (2024-01-12T20:53:35Z) - Estimation of Concept Explanations Should be Uncertainty Aware [39.598213804572396]
We study a specific kind called Concept Explanations, where the goal is to interpret a model using human-understandable concepts.
Although popular for their easy interpretation, concept explanations are known to be noisy.
We propose an uncertainty-aware Bayesian estimation method to address these issues, which readily improves the quality of explanations.
arXiv Detail & Related papers (2023-12-13T11:17:27Z) - SurroCBM: Concept Bottleneck Surrogate Models for Generative Post-hoc
Explanation [11.820167569334444]
This paper introduces the Concept Bottleneck Surrogate Models (SurroCBM) to explain black-box models.
SurroCBM identifies shared and unique concepts across various black-box models and employs an explainable surrogate model for post-hoc explanations.
An effective training strategy using self-generated data is proposed to enhance explanation quality continuously.
arXiv Detail & Related papers (2023-10-11T17:46:59Z) - Concept Gradient: Concept-based Interpretation Without Linear Assumption [77.96338722483226]
Concept Activation Vector (CAV) methods rely on learning a linear relation between some latent representation of a given model and concepts (see the sketch after this list).
We propose Concept Gradient (CG), extending concept-based interpretation beyond linear concept functions.
We demonstrate that CG outperforms CAV on both toy examples and real-world datasets.
arXiv Detail & Related papers (2022-08-31T17:06:46Z) - Overlooked factors in concept-based explanations: Dataset choice,
concept learnability, and human capability [25.545486537295144]
Concept-based interpretability methods aim to explain deep neural network model predictions using a predefined set of semantic concepts.
Despite their popularity, they suffer from limitations that are not well understood or clearly articulated in the literature.
We analyze three commonly overlooked factors in concept-based explanations.
arXiv Detail & Related papers (2022-07-20T01:59:39Z) - ConceptDistil: Model-Agnostic Distillation of Concept Explanations [4.462334751640166]
Concept-based explanations aim to fill the model interpretability gap for non-technical humans-in-the-loop.
We propose ConceptDistil, a method to bring concept explanations to any black-box classifier using knowledge distillation.
We validate ConceptDistil in a real-world use case, showing that it is able to optimize both tasks.
arXiv Detail & Related papers (2022-05-07T08:58:54Z) - Human-Centered Concept Explanations for Neural Networks [47.71169918421306]
We introduce concept explanations, including the class of Concept Activation Vectors (CAV).
We then discuss approaches to automatically extract concepts, and approaches to address some of their caveats.
Finally, we discuss some case studies that showcase the utility of such concept-based explanations in synthetic settings and real world applications.
arXiv Detail & Related papers (2022-02-25T01:27:31Z) - Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representation to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z) - Explainers in the Wild: Making Surrogate Explainers Robust to
Distortions through Perception [77.34726150561087]
We propose a methodology to evaluate the effect of distortions in explanations by embedding perceptual distances.
We generate explanations for images in the ImageNet-C dataset and demonstrate how using perceptual distances in the surrogate explainer creates more coherent explanations for the distorted and reference images.
arXiv Detail & Related papers (2021-02-22T12:38:53Z) - The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal
Sufficient Subsets [61.66584140190247]
We show that feature-based explanations pose problems even for explaining trivial models.
We show that two popular classes of explainers, Shapley explainers and minimal sufficient subsets explainers, target fundamentally different types of ground-truth explanations.
arXiv Detail & Related papers (2020-09-23T09:45:23Z)
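As referenced in the Concept Gradient entry above, here is a minimal sketch of the linear-CAV baseline idea: fit a linear probe on a model's latent activations to obtain a concept direction, then score concept sensitivity as a directional derivative of the output along that direction. The toy two-layer network, the synthetic concept label, and all variable names are illustrative assumptions; this is not the Concept Gradient method itself, only the linear baseline it extends.

```python
# Minimal sketch of the linear-CAV baseline: linear probe on activations + directional derivative.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
featurizer, head = model[:2], model[2:]  # split into latent layer and output head

# 1) Fit a linear probe on latent activations for a binary concept label.
x = torch.randn(512, 10)
concept_label = (x[:, 0] > 0).float()          # toy concept correlated with feature 0
with torch.no_grad():
    acts = featurizer(x)                       # latent representations, shape (512, 32)
probe = nn.Linear(32, 1)
opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
for _ in range(300):
    opt.zero_grad()
    loss = nn.functional.binary_cross_entropy_with_logits(
        probe(acts).squeeze(1), concept_label)
    loss.backward()
    opt.step()
cav = probe.weight.detach().squeeze(0)
cav = cav / cav.norm()                         # the concept activation vector

# 2) Concept sensitivity: directional derivative of the model output along the CAV.
x_test = torch.randn(16, 10)
a = featurizer(x_test).detach().requires_grad_(True)
out = head(a).sum()
grad_a = torch.autograd.grad(out, a)[0]        # d(output) / d(activation)
sensitivity = grad_a @ cav                     # per-example concept sensitivity scores
print(sensitivity)
```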