Interpretability is in the Mind of the Beholder: A Causal Framework for
Human-interpretable Representation Learning
- URL: http://arxiv.org/abs/2309.07742v1
- Date: Thu, 14 Sep 2023 14:26:20 GMT
- Title: Interpretability is in the Mind of the Beholder: A Causal Framework for
Human-interpretable Representation Learning
- Authors: Emanuele Marconato and Andrea Passerini and Stefano Teso
- Abstract summary: Focus in Explainable AI is shifting from explanations defined in terms of low-level elements, such as input features, to explanations encoded in terms of interpretable concepts learned from data.
How to reliably acquire such concepts is, however, still fundamentally unclear.
We propose a mathematical framework for acquiring interpretable representations suitable for both post-hoc explainers and concept-based neural networks.
- Score: 22.201878275784246
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Focus in Explainable AI is shifting from explanations defined in terms of
low-level elements, such as input features, to explanations encoded in terms of
interpretable concepts learned from data. How to reliably acquire such concepts
is, however, still fundamentally unclear. An agreed-upon notion of concept
interpretability is missing, with the result that concepts used by both
post-hoc explainers and concept-based neural networks are acquired through a
variety of mutually incompatible strategies. Critically, most of these neglect
the human side of the problem: a representation is understandable only insofar
as it can be understood by the human at the receiving end. The key challenge in
Human-interpretable Representation Learning (HRL) is how to model and
operationalize this human element. In this work, we propose a mathematical
framework for acquiring interpretable representations suitable for both
post-hoc explainers and concept-based neural networks. Our formalization of HRL
builds on recent advances in causal representation learning and explicitly
models a human stakeholder as an external observer. This allows us to derive a
principled notion of alignment between the machine representation and the
vocabulary of concepts understood by the human. In doing so, we link alignment
and interpretability through a simple and intuitive name transfer game, and
clarify the relationship between alignment and a well-known property of
representations, namely disentanglement. We also show that alignment is linked
to the issue of undesirable correlations among concepts, also known as concept
leakage, and to content-style separation, all through a general
information-theoretic reformulation of these properties. Our conceptualization
aims to bridge the gap between the human and algorithmic sides of
interpretability and establish a stepping stone for new research on
human-interpretable representations.
Related papers
- Concept Induction using LLMs: a user experiment for assessment [1.1982127665424676]
This study explores the potential of a Large Language Model (LLM) to generate high-level concepts that are meaningful as explanations for humans.
We compare the concepts generated by the LLM with two other methods: concepts generated by humans and the ECII concept induction system.
Our findings indicate that while human-generated explanations remain superior, concepts derived from GPT-4 are more comprehensible to humans compared to those generated by ECII.
arXiv Detail & Related papers (2024-04-18T03:22:02Z)
- Advancing Ante-Hoc Explainable Models through Generative Adversarial Networks [24.45212348373868]
This paper presents a novel concept learning framework for enhancing model interpretability and performance in visual classification tasks.
Our approach appends an unsupervised explanation generator to the primary classifier network and makes use of adversarial training.
This work presents a significant step towards building inherently interpretable deep vision models with task-aligned concept representations.
arXiv Detail & Related papers (2024-01-09T16:16:16Z)
- Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
- Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
- Vector-based Representation is the Key: A Study on Disentanglement and Compositional Generalization [77.57425909520167]
We show that it is possible to achieve both good concept recognition and novel concept composition.
We propose a method to reform the scalar-based disentanglement works to be vector-based to increase both capabilities.
arXiv Detail & Related papers (2023-05-29T13:05:15Z)
- Interpretable Neural-Symbolic Concept Reasoning [7.1904050674791185]
Concept-based models aim to address this issue by learning tasks based on a set of human-understandable concepts.
We propose the Deep Concept Reasoner (DCR), the first interpretable concept-based model that builds upon concept embeddings.
arXiv Detail & Related papers (2023-04-27T09:58:15Z)
- Concept Gradient: Concept-based Interpretation Without Linear Assumption [77.96338722483226]
Concept Activation Vector (CAV) relies on learning a linear relation between some latent representation of a given model and concepts.
We proposed Concept Gradient (CG), extending concept-based interpretation beyond linear concept functions.
We demonstrated CG outperforms CAV in both toy examples and real world datasets.
arXiv Detail & Related papers (2022-08-31T17:06:46Z)
- GlanceNets: Interpretabile, Leak-proof Concept-based Models [23.7625973884849]
Concept-based models (CBMs) combine high-performance and interpretability by acquiring and reasoning with a vocabulary of high-level concepts.
We provide a clear definition of interpretability in terms of alignment between the model's representation and an underlying data generation process.
We introduce GlanceNets, a new CBM that exploits techniques from disentangled representation learning and open-set recognition to achieve alignment.
arXiv Detail & Related papers (2022-05-31T08:53:53Z)
- Human-Centered Concept Explanations for Neural Networks [47.71169918421306]
We introduce concept explanations, including the class of Concept Activation Vectors (CAV).
We then discuss approaches to automatically extract concepts, and approaches to address some of their caveats.
Finally, we discuss some case studies that showcase the utility of such concept-based explanations in synthetic settings and real world applications.
arXiv Detail & Related papers (2022-02-25T01:27:31Z)
- Explainability Is in the Mind of the Beholder: Establishing the Foundations of Explainable Artificial Intelligence [11.472707084860875]
We define explainability as (logical) reasoning applied to transparent insights (into black boxes) interpreted under certain background knowledge.
We revisit the trade-off between transparency and predictive power and its implications for ante-hoc and post-hoc explainers.
We discuss components of the machine learning workflow that may be in need of interpretability, building on a range of ideas from human-centred explainability.
arXiv Detail & Related papers (2021-12-29T09:21:33Z)
- Compositional Processing Emerges in Neural Networks Solving Math Problems [100.80518350845668]
Recent progress in artificial neural networks has shown that when large models are trained on enough linguistic data, grammatical structure emerges in their representations.
We extend this work to the domain of mathematical reasoning, where it is possible to formulate precise hypotheses about how meanings should be composed.
Our work shows that neural networks are not only able to infer something about the structured relationships implicit in their training data, but can also deploy this knowledge to guide the composition of individual meanings into composite wholes.
arXiv Detail & Related papers (2021-05-19T07:24:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.