Related papers: There Was Never a Bottleneck in Concept Bottleneck Models

There Was Never a Bottleneck in Concept Bottleneck Models

URL: http://arxiv.org/abs/2506.04877v2
Date: Thu, 25 Sep 2025 12:09:14 GMT
Title: There Was Never a Bottleneck in Concept Bottleneck Models
Authors: Antonio Almudévar, José Miguel Hernández-Lobato, Alfonso Ortega,
Abstract summary: Concept Bottleneck Models (CBMs) have emerged as a promising approach to mitigate this issue.<n>We argue that CBMs do not impose a true bottleneck: the fact that a component can predict a concept does not guarantee that it encodes only information about that concept.<n>We propose Minimal Concept Bottleneck Models (MCBMs), which incorporate an Information Bottleneck (IB) objective to constrain each representation component to retain only the information relevant to its corresponding concept.
Score: 27.888718857850822
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep learning representations are often difficult to interpret, which can hinder their deployment in sensitive applications. Concept Bottleneck Models (CBMs) have emerged as a promising approach to mitigate this issue by learning representations that support target task performance while ensuring that each component predicts a concrete concept from a predefined set. In this work, we argue that CBMs do not impose a true bottleneck: the fact that a component can predict a concept does not guarantee that it encodes only information about that concept. This shortcoming raises concerns regarding interpretability and the validity of intervention procedures. To overcome this limitation, we propose Minimal Concept Bottleneck Models (MCBMs), which incorporate an Information Bottleneck (IB) objective to constrain each representation component to retain only the information relevant to its corresponding concept. This IB is implemented via a variational regularization term added to the training loss. As a result, MCBMs yield more interpretable representations, support principled concept-level interventions, and remain consistent with probability-theoretic foundations.

Related papers

Uncertainty-aware Language Guidance for Concept Bottleneck Models [19.882022420045804]
Concept Bottleneck Models (CBMs) provide inherent interpretability by first mapping input samples to high-level semantic concepts, followed by a combination of these concepts for the final classification.<n>However, the annotation of human-understandable concepts requires extensive expert knowledge and labor, constraining the broad adoption of CBMs.<n>We propose a novel uncertainty-aware CBM method, which not only rigorously quantifies the uncertainty of LLM-annotated concept labels with valid and distribution-free guarantees, but also incorporates quantified concept uncertainty into the CBM training procedure.
arXiv Detail & Related papers (2026-02-26T20:59:03Z)
MeGU: Machine-Guided Unlearning with Target Feature Disentanglement [73.49657372882082]
We propose a novel framework that guides unlearning through concept-aware re-alignment.<n>MeGU enables controlled and selective forgetting, effectively mitigating both under-unlearning and over-unlearning.
arXiv Detail & Related papers (2026-02-19T05:20:31Z)
Concepts' Information Bottleneck Models [9.435622803973898]
Concept Bottleneck Models (CBMs) aim to deliver interpretable predictions by routing decisions through a human-understandable concept layer.<n>We introduce an explicit Information Bottleneck regularizer on the concept layer that penalizes $I(X;C)$ while preserving task-relevant information in $I(C;Y)$, encouraging minimal-sufficient concept representations.
arXiv Detail & Related papers (2026-02-16T10:33:20Z)
Concept Component Analysis: A Principled Approach for Concept Extraction in LLMs [51.378834857406325]
Mechanistic interpretability seeks to mitigate the issues through extracts from large language models.<n>Sparse autoencoders (SAEs) have emerged as a popular approach for extracting interpretable and monosemantic concepts.<n>We show that SAEs suffer from a fundamental theoretical ambiguity: the well-defined correspondence between LLM representations and human-interpretable concepts remains unclear.
arXiv Detail & Related papers (2026-01-28T09:27:05Z)
Controllable Concept Bottleneck Models [55.03639763625018]
Controllable Concept Bottleneck Models (CCBMs)<n>CCBMs support three granularities of model editing: concept-label-level, concept-level, and data-level.<n>CCBMs enjoy mathematically rigorous closed-form approximations derived from influence functions that obviate the need for retraining.
arXiv Detail & Related papers (2026-01-01T19:30:06Z)
Towards more holistic interpretability: A lightweight disentangled Concept Bottleneck Model [5.700536552863068]
Concept Bottleneck Models (CBMs) enhance interpretability by predicting human-understandable concepts as intermediate representations.<n>We propose a lightweight Disentangled Concept Bottleneck Model (LDCBM) that automatically groups visual features into semantically meaningful components.<n> Experiments on three diverse datasets demonstrate that LDCBM achieves higher concept and class accuracy, outperforming previous CBMs in both interpretability and classification performance.
arXiv Detail & Related papers (2025-10-17T15:59:30Z)
Interpretable Reward Modeling with Active Concept Bottlenecks [54.00085739303773]
We introduce Concept Bottleneck Reward Models (CB-RM), a reward modeling framework that enables interpretable preference learning.<n>Unlike standard RLHF methods that rely on opaque reward functions, CB-RM decomposes reward prediction into human-interpretable concepts.<n>We formalize an active learning strategy that dynamically acquires the most informative concept labels.
arXiv Detail & Related papers (2025-07-07T06:26:04Z)
Interpretable Few-Shot Image Classification via Prototypical Concept-Guided Mixture of LoRA Experts [79.18608192761512]
Self-Explainable Models (SEMs) rely on Prototypical Concept Learning (PCL) to enable their visual recognition processes more interpretable.<n>We propose a Few-Shot Prototypical Concept Classification framework that mitigates two key challenges under low-data regimes: parametric imbalance and representation misalignment.<n>Our approach consistently outperforms existing SEMs by a notable margin, with 4.2%-8.7% relative gains in 5-way 5-shot classification.
arXiv Detail & Related papers (2025-06-05T06:39:43Z)
Enhancing Interpretable Image Classification Through LLM Agents and Conditional Concept Bottleneck Models [15.97013792698305]
Concept Bottleneck Models (CBMs) decompose image classification into a process governed by interpretable, human-readable concepts.<n>We introduce a dynamic, agent-based approach that adjusts the concept bank in response to environmental feedback.<n>We also propose Conditional Concept Bottleneck Models (CoCoBMs) to overcome the limitations in traditional CBMs' concept scoring mechanisms.
arXiv Detail & Related papers (2025-06-02T05:25:52Z)
Does Representation Intervention Really Identify Desired Concepts and Elicit Alignment? [73.80382983108997]
Representation intervention aims to locate and modify the representations that encode the underlying concepts in Large Language Models.<n>If the interventions are faithful, the intervened LLMs should erase the harmful concepts and be robust to both in-distribution adversarial prompts and the out-of-distribution jailbreaks.<n>We propose Concept Concentration (COCA), which simplifies the decision boundary between harmful and benign representations.
arXiv Detail & Related papers (2025-05-24T12:23:52Z)
Continual Unlearning for Foundational Text-to-Image Models without Generalization Erosion [56.35484513848296]
This research introduces continual unlearning', a novel paradigm that enables the targeted removal of multiple specific concepts from foundational generative models.<n>We propose Decremental Unlearning without Generalization Erosion (DUGE) algorithm which selectively unlearns the generation of undesired concepts.
arXiv Detail & Related papers (2025-03-17T23:17:16Z)
Modular Customization of Diffusion Models via Blockwise-Parameterized Low-Rank Adaptation [73.16975077770765]
Modular customization is essential for applications like concept stylization and multi-concept customization.<n>Instant merging methods often cause identity loss and interference of individual merged concepts.<n>We propose BlockLoRA, an instant merging method designed to efficiently combine multiple concepts while accurately preserving individual concepts' identity.
arXiv Detail & Related papers (2025-03-11T16:10:36Z)
Towards Achieving Concept Completeness for Textual Concept Bottleneck Models [0.3694429692322631]
This paper proposes a novel TCBM generator building concept labels in a fully unsupervised manner using a small language model.<n>CT-CBM iteratively targets and adds important and identifiable concepts in the bottleneck layer to create a complete concept basis.<n>CT-CBM achieves striking results against competitors in terms of concept basis and concept detection accuracy.
arXiv Detail & Related papers (2025-02-16T12:28:43Z)
Sample-efficient Learning of Concepts with Theoretical Guarantees: from Data to Concepts without Interventions [7.3784937557132855]
Concept Bottleneck Models (CBM) address some of these challenges by learning interpretable concepts from high-dimensional data.<n>We describe a framework that provides theoretical guarantees on the correctness of the learned concepts and on the number of required labels.<n>We evaluate our framework in synthetic and image benchmarks, showing that the learned concepts have less impurities and are often more accurate than other CBMs.
arXiv Detail & Related papers (2025-02-10T15:01:56Z)
Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models [57.86303579812877]
Concept Bottleneck Models (CBMs) ground image classification on human-understandable concepts to allow for interpretable model decisions. Existing approaches often require numerous human interventions per image to achieve strong performances. We introduce a trainable concept realignment intervention module, which leverages concept relations to realign concept assignments post-intervention.
arXiv Detail & Related papers (2024-05-02T17:59:01Z)
On the Concept Trustworthiness in Concept Bottleneck Models [39.928868605678744]
Concept Bottleneck Models (CBMs) break down the reasoning process into the input-to-concept mapping and the concept-to-label prediction. Despite the transparency of the concept-to-label prediction, the mapping from the input to the intermediate concept remains a black box. A pioneering metric, referred to as concept trustworthiness score, is proposed to gauge whether the concepts are derived from relevant regions. An enhanced CBM is introduced, enabling concept predictions to be made specifically from distinct parts of the feature map.
arXiv Detail & Related papers (2024-03-21T12:24:53Z)
Coarse-to-Fine Concept Bottleneck Models [9.910980079138206]
This work targets ante hoc interpretability, and specifically Concept Bottleneck Models (CBMs) Our goal is to design a framework that admits a highly interpretable decision making process with respect to human understandable concepts, on two levels of granularity. Within this framework, concept information does not solely rely on the similarity between the whole image and general unstructured concepts; instead, we introduce the notion of concept hierarchy to uncover and exploit more granular concept information residing in patch-specific regions of the image scene.
arXiv Detail & Related papers (2023-10-03T14:57:31Z)
Post-hoc Concept Bottleneck Models [11.358495577593441]
Concept Bottleneck Models (CBMs) map the inputs onto a set of interpretable concepts and use the concepts to make predictions. CBMs are restrictive in practice as they require concept labels in the training data to learn the bottleneck and do not leverage strong pretrained models. We show that we can turn any neural network into a PCBM without sacrificing model performance while still retaining interpretability benefits.
arXiv Detail & Related papers (2022-05-31T00:29:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.