A Geometric Unification of Concept Learning with Concept Cones
- URL: http://arxiv.org/abs/2512.07355v1
- Date: Mon, 08 Dec 2025 09:51:46 GMT
- Title: A Geometric Unification of Concept Learning with Concept Cones
- Authors: Alexandre Rocchi-Henry, Thomas Fel, Gianni Franchi
- Abstract summary: Two traditions of interpretability have evolved side by side but seldom spoken to each other: Concept Bottleneck Models (CBMs) and Sparse Autoencoders (SAEs). We show that both paradigms instantiate the same geometric structure. CBMs provide human-defined reference geometries, while SAEs can be evaluated by how well their learned cones approximate or contain those of CBMs.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Two traditions of interpretability have evolved side by side but seldom spoken to each other: Concept Bottleneck Models (CBMs), which prescribe what a concept should be, and Sparse Autoencoders (SAEs), which discover what concepts emerge. While CBMs use supervision to align activations with human-labeled concepts, SAEs rely on sparse coding to uncover emergent ones. We show that both paradigms instantiate the same geometric structure: each learns a set of linear directions in activation space whose nonnegative combinations form a concept cone. Supervised and unsupervised methods thus differ not in kind but in how they select this cone. Building on this view, we propose an operational bridge between the two paradigms. CBMs provide human-defined reference geometries, while SAEs can be evaluated by how well their learned cones approximate or contain those of CBMs. This containment framework yields quantitative metrics linking inductive biases (such as SAE type, sparsity, or expansion ratio) to the emergence of plausible concepts. (We adopt the terminology of Jacovi and Goldberg (2020), who distinguish between faithful explanations, which accurately reflect model computations, and plausible explanations, which align with human intuition and domain knowledge. CBM concepts are plausible by construction, selected or annotated by humans, though not necessarily faithful to the true latent factors that organise the data manifold.) Using these metrics, we uncover a "sweet spot" in both sparsity and expansion factor that maximizes both geometric and semantic alignment with CBM concepts. Overall, our work unifies supervised and unsupervised concept discovery through a shared geometric framework, providing principled metrics to measure SAE progress and assess how well discovered concepts align with plausible human concepts.
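The cone view admits a direct computational reading: given directions w_1, ..., w_k, the concept cone is the set of all nonnegative combinations sum_i alpha_i * w_i with alpha_i >= 0, and containment can be probed by asking how well each CBM direction is reconstructed by a nonnegative combination of SAE directions. The sketch below is one illustrative reading of that idea, not the paper's implementation; the array names, shapes, and the nonnegative-least-squares score are assumptions made for the example.

```python
# Illustrative sketch (not the paper's code): score how well human-defined CBM
# concept directions are contained in the cone generated by SAE decoder directions.
import numpy as np
from scipy.optimize import nnls

def cone_containment_score(cbm_directions, sae_directions):
    """Mean reconstruction quality of unit-norm CBM directions by nonnegative
    combinations of SAE directions: 1.0 means every CBM direction lies inside
    the SAE cone, 0.0 means no nonnegative combination helps at all."""
    A = sae_directions.T                  # (d, k): columns generate the cone
    scores = []
    for v in cbm_directions:
        v = v / np.linalg.norm(v)         # compare directions, not magnitudes
        _, residual = nnls(A, v)          # min ||A @ alpha - v|| s.t. alpha >= 0
        scores.append(1.0 - residual**2)  # residual <= 1 since alpha = 0 is feasible
    return float(np.mean(scores))

# Toy usage with random directions, purely to exercise the function.
rng = np.random.default_rng(0)
sae = rng.normal(size=(64, 16))           # 64 learned directions in a 16-d space
cbm = rng.normal(size=(5, 16))            # 5 human-labeled concept directions
print(cone_containment_score(cbm, sae))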
Related papers
- FaCT: Faithful Concept Traces for Explaining Neural Network Decisions
Deep networks have shown remarkable performance across a wide range of tasks, yet getting a global concept-level understanding of how they function remains a key challenge. We put emphasis on the faithfulness of concept-based explanations and propose a new model with model-inherent mechanistic concept explanations. Our concepts are shared across classes and, from any layer, their contribution to the logit and their input visualization can be faithfully traced.
arXiv Detail & Related papers (2025-10-29T13:35:46Z)
- Nonparametric Identification of Latent Concepts
We argue that the cognitive mechanism of comparison, fundamental to human learning, is also vital for machines to recover true concepts underlying the data. Specifically, we aim to develop a theoretical framework for the identifiability of concepts with multiple classes of observations. We show that with sufficient diversity across classes, hidden concepts can be identified without assuming specific concept types.
arXiv Detail & Related papers (2025-09-30T18:13:53Z)
- Graph Concept Bottleneck Models
Concept Bottleneck Models (CBMs) provide explicit interpretations for deep neural networks through concepts. We propose GraphCBMs: a new variant of CBM that facilitates concept relationships by constructing latent concept graphs.
arXiv Detail & Related papers (2025-08-19T20:23:18Z)
- Sample-efficient Learning of Concepts with Theoretical Guarantees: from Data to Concepts without Interventions
Concept Bottleneck Models (CBMs) address some of these challenges by learning interpretable concepts from high-dimensional data. We describe a framework that provides theoretical guarantees on the correctness of the learned concepts and on the number of required labels. We evaluate our framework on synthetic and image benchmarks, showing that the learned concepts have fewer impurities and are often more accurate than those of other CBMs.
arXiv Detail & Related papers (2025-02-10T15:01:56Z)
- Concept-Based Explainable Artificial Intelligence: Metrics and Benchmarks
Concept-based explanation methods aim to improve the interpretability of machine learning models. We propose three metrics: the concept global importance metric, the concept existence metric, and the concept location metric. We demonstrate that, in many cases, even the most important concepts determined by post-hoc CBMs are not present in input images.
arXiv Detail & Related papers (2025-01-31T16:32:36Z) - Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery [52.498055901649025]
Concept Bottleneck Models (CBMs) have been proposed to address the 'black-box' problem of deep neural networks.
We propose a novel CBM approach, called Discover-then-Name-CBM (DN-CBM), that inverts the typical paradigm.
Our concept extraction strategy is efficient, since it is agnostic to the downstream task, and uses concepts already known to the model.
arXiv Detail & Related papers (2024-07-19T17:50:11Z) - Do Concept Bottleneck Models Respect Localities? [14.77558378567965]
Concept-based explainability methods use human-understandable intermediaries to produce explanations for machine learning models. We assess whether concept predictors leverage "relevant" features to make predictions, a property we term locality. We find that many concept-based models used in practice fail to respect localities because concept predictors cannot always clearly distinguish distinct concepts.
arXiv Detail & Related papers (2024-01-02T16:05:23Z) - Multi-dimensional concept discovery (MCD): A unifying framework with
completeness guarantees [1.9465727478912072]
We propose Multi-dimensional Concept Discovery (MCD) as an extension of previous approaches that fulfills a completeness relation on the level of concepts.
We empirically demonstrate the superiority of MCD over more constrained concept definitions.
arXiv Detail & Related papers (2023-01-27T18:53:19Z)
- Concept Activation Regions: A Generalized Framework For Concept-Based Explanations
Existing methods assume that the examples illustrating a concept are mapped in a fixed direction of the deep neural network's latent space.
In this work, we propose allowing concept examples to be scattered across different clusters in the DNN's latent space.
This concept activation region (CAR) formalism yields global concept-based explanations and local concept-based feature importance.
arXiv Detail & Related papers (2022-09-22T17:59:03Z)
- Concept Gradient: Concept-based Interpretation Without Linear Assumption
Concept Activation Vector (CAV) relies on learning a linear relation between some latent representation of a given model and concepts (see the sketch after this entry).
We propose Concept Gradient (CG), extending concept-based interpretation beyond linear concept functions.
We demonstrate that CG outperforms CAV in both toy examples and real-world datasets.
arXiv Detail & Related papers (2022-08-31T17:06:46Z)
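Since the CAV baseline above recurs throughout this list, a minimal sketch of the standard linear-probe recipe may help. The names `acts` (layer activations) and `labels` (concept vs. non-concept examples) are hypothetical, and this is a generic illustration rather than any specific paper's code.

```python
# Minimal CAV-style sketch: the concept direction is the normalized weight
# vector of a linear probe trained to separate concept from non-concept
# activations. All names and data here are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(acts, labels):
    """Return the unit normal of a linear probe fit on activations."""
    probe = LogisticRegression(max_iter=1000).fit(acts, labels)
    w = probe.coef_[0]                   # (d,) weight vector for binary labels
    return w / np.linalg.norm(w)

# Toy usage with synthetic activations.
rng = np.random.default_rng(1)
acts = rng.normal(size=(200, 32))
labels = (acts[:, 0] > 0).astype(int)    # concept correlated with one axis
cav = concept_activation_vector(acts, labels)
```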
- Formalising Concepts as Grounded Abstractions
This report shows how representation learning can be used to induce concepts from raw data.
The main technical goal of this report is to show how techniques from representation learning can be married with a lattice-theoretic formulation of conceptual spaces.
arXiv Detail & Related papers (2021-01-13T15:22:01Z)