Related papers: Interpretability is in the Mind of the Beholder: A Causal Framework for Human-interpretable Representation Learning

Interpretability is in the Mind of the Beholder: A Causal Framework for Human-interpretable Representation Learning

URL: http://arxiv.org/abs/2309.07742v1
Date: Thu, 14 Sep 2023 14:26:20 GMT
Title: Interpretability is in the Mind of the Beholder: A Causal Framework for Human-interpretable Representation Learning
Authors: Emanuele Marconato and Andrea Passerini and Stefano Teso
Abstract summary: Focus in Explainable AI is shifting from explanations defined in terms of low-level elements, such as input features, to explanations encoded in terms of interpretable concepts learned from data. How to reliably acquire such concepts is, however, still fundamentally unclear. We propose a mathematical framework for acquiring interpretable representations suitable for both post-hoc explainers and concept-based neural networks.
Score: 22.201878275784246
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Focus in Explainable AI is shifting from explanations defined in terms of low-level elements, such as input features, to explanations encoded in terms of interpretable concepts learned from data. How to reliably acquire such concepts is, however, still fundamentally unclear. An agreed-upon notion of concept interpretability is missing, with the result that concepts used by both post-hoc explainers and concept-based neural networks are acquired through a variety of mutually incompatible strategies. Critically, most of these neglect the human side of the problem: a representation is understandable only insofar as it can be understood by the human at the receiving end. The key challenge in Human-interpretable Representation Learning (HRL) is how to model and operationalize this human element. In this work, we propose a mathematical framework for acquiring interpretable representations suitable for both post-hoc explainers and concept-based neural networks. Our formalization of HRL builds on recent advances in causal representation learning and explicitly models a human stakeholder as an external observer. This allows us to derive a principled notion of alignment between the machine representation and the vocabulary of concepts understood by the human. In doing so, we link alignment and interpretability through a simple and intuitive name transfer game, and clarify the relationship between alignment and a well-known property of representations, namely disentanglment. We also show that alignment is linked to the issue of undesirable correlations among concepts, also known as concept leakage, and to content-style separation, all through a general information-theoretic reformulation of these properties. Our conceptualization aims to bridge the gap between the human and algorithmic sides of interpretability and establish a stepping stone for new research on human-interpretable representations.

Related papers

From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning [52.32745233116143]
Humans organize knowledge into compact categories through semantic compression.<n>Large Language Models (LLMs) demonstrate remarkable linguistic abilities.<n>But whether their internal representations strike a human-like trade-off between compression and semantic fidelity is unclear.
arXiv Detail & Related papers (2025-05-21T16:29:00Z)
Human-like conceptual representations emerge from language prediction [72.5875173689788]
Large language models (LLMs) trained exclusively through next-token prediction over language data exhibit remarkably human-like behaviors. Are these models developing concepts akin to humans, and if so, how are such concepts represented and organized? Our results demonstrate that LLMs can flexibly derive concepts from linguistic descriptions in relation to contextual cues about other concepts. These findings establish that structured, human-like conceptual representations can naturally emerge from language prediction without real-world grounding.
arXiv Detail & Related papers (2025-01-21T23:54:17Z)
Concept Induction using LLMs: a user experiment for assessment [1.1982127665424676]
This study explores the potential of a Large Language Model (LLM) to generate high-level concepts that are meaningful as explanations for humans. We compare the concepts generated by the LLM with two other methods: concepts generated by humans and the ECII concept induction system. Our findings indicate that while human-generated explanations remain superior, concepts derived from GPT-4 are more comprehensible to humans compared to those generated by ECII.
arXiv Detail & Related papers (2024-04-18T03:22:02Z)
Advancing Ante-Hoc Explainable Models through Generative Adversarial Networks [24.45212348373868]
This paper presents a novel concept learning framework for enhancing model interpretability and performance in visual classification tasks. Our approach appends an unsupervised explanation generator to the primary classifier network and makes use of adversarial training. This work presents a significant step towards building inherently interpretable deep vision models with task-aligned concept representations.
arXiv Detail & Related papers (2024-01-09T16:16:16Z)
Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks. The lack of interpretability due to their black-box'' nature poses challenges for responsible implementation. We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level. We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
Vector-based Representation is the Key: A Study on Disentanglement and Compositional Generalization [77.57425909520167]
We show that it is possible to achieve both good concept recognition and novel concept composition. We propose a method to reform the scalar-based disentanglement works to be vector-based to increase both capabilities.
arXiv Detail & Related papers (2023-05-29T13:05:15Z)
Interpretable Neural-Symbolic Concept Reasoning [7.1904050674791185]
Concept-based models aim to address this issue by learning tasks based on a set of human-understandable concepts. We propose the Deep Concept Reasoner (DCR), the first interpretable concept-based model that builds upon concept embeddings.
arXiv Detail & Related papers (2023-04-27T09:58:15Z)
Concept Gradient: Concept-based Interpretation Without Linear Assumption [77.96338722483226]
Concept Activation Vector (CAV) relies on learning a linear relation between some latent representation of a given model and concepts. We proposed Concept Gradient (CG), extending concept-based interpretation beyond linear concept functions. We demonstrated CG outperforms CAV in both toy examples and real world datasets.
arXiv Detail & Related papers (2022-08-31T17:06:46Z)
GlanceNets: Interpretabile, Leak-proof Concept-based Models [23.7625973884849]
Concept-based models (CBMs) combine high-performance and interpretability by acquiring and reasoning with a vocabulary of high-level concepts. We provide a clear definition of interpretability in terms of alignment between the model's representation and an underlying data generation process. We introduce GlanceNets, a new CBM that exploits techniques from disentangled representation learning and open-set recognition to achieve alignment.
arXiv Detail & Related papers (2022-05-31T08:53:53Z)
Human-Centered Concept Explanations for Neural Networks [47.71169918421306]
We introduce concept explanations including the class of Concept Activation Vectors (CAV) We then discuss approaches to automatically extract concepts, and approaches to address some of their caveats. Finally, we discuss some case studies that showcase the utility of such concept-based explanations in synthetic settings and real world applications.
arXiv Detail & Related papers (2022-02-25T01:27:31Z)
Explainability Is in the Mind of the Beholder: Establishing the Foundations of Explainable Artificial Intelligence [11.472707084860875]
We define explainability as (logical) reasoning applied to transparent insights (into black boxes) interpreted under certain background knowledge. We revisit the trade-off between transparency and predictive power and its implications for ante-hoc and post-hoc explainers. We discuss components of the machine learning workflow that may be in need of interpretability, building on a range of ideas from human-centred explainability.
arXiv Detail & Related papers (2021-12-29T09:21:33Z)
Compositional Processing Emerges in Neural Networks Solving Math Problems [100.80518350845668]
Recent progress in artificial neural networks has shown that when large models are trained on enough linguistic data, grammatical structure emerges in their representations. We extend this work to the domain of mathematical reasoning, where it is possible to formulate precise hypotheses about how meanings should be composed. Our work shows that neural networks are not only able to infer something about the structured relationships implicit in their training data, but can also deploy this knowledge to guide the composition of individual meanings into composite wholes.
arXiv Detail & Related papers (2021-05-19T07:24:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.