Uncertainty-aware Language Guidance for Concept Bottleneck Models
- URL: http://arxiv.org/abs/2602.23495v1
- Date: Thu, 26 Feb 2026 20:59:03 GMT
- Title: Uncertainty-aware Language Guidance for Concept Bottleneck Models
- Authors: Yangyi Li, Mengdi Huai
- Abstract summary: Concept Bottleneck Models (CBMs) provide inherent interpretability by first mapping input samples to high-level semantic concepts, followed by a combination of these concepts for the final classification. However, the annotation of human-understandable concepts requires extensive expert knowledge and labor, constraining the broad adoption of CBMs. We propose a novel uncertainty-aware CBM method, which not only rigorously quantifies the uncertainty of LLM-annotated concept labels with valid and distribution-free guarantees, but also incorporates quantified concept uncertainty into the CBM training procedure.
- Score: 19.882022420045804
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Concept Bottleneck Models (CBMs) provide inherent interpretability by first mapping input samples to high-level semantic concepts, followed by a combination of these concepts for the final classification. However, the annotation of human-understandable concepts requires extensive expert knowledge and labor, constraining the broad adoption of CBMs. On the other hand, a few works leverage the knowledge of large language models (LLMs) to construct concept bottlenecks. Nevertheless, they face two essential limitations. First, they overlook the uncertainty associated with the concepts annotated by LLMs and lack a valid mechanism to quantify uncertainty about the annotated concepts, increasing the risk of errors due to hallucinations from LLMs. Additionally, they fail to incorporate the uncertainty associated with these annotations into the learning process for concept bottleneck models. To address these limitations, we propose a novel uncertainty-aware CBM method, which not only rigorously quantifies the uncertainty of LLM-annotated concept labels with valid and distribution-free guarantees, but also incorporates the quantified concept uncertainty into the CBM training procedure to account for varying levels of reliability across LLM-annotated concepts. We also provide a theoretical analysis of our proposed method. Extensive experiments on real-world datasets validate the desired properties of our proposed method.
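To make the two ingredients concrete, here is a minimal sketch: a split-conformal calibration step that turns LLM confidence scores for concept annotations into distribution-free prediction sets, and a reliability weight that down-weights ambiguous annotations in the concept loss. It assumes a small calibration split with trusted concept labels and per-concept LLM confidence scores; all names, the split-conformal recipe, and the 0.5 down-weighting are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch, not the paper's code.
import numpy as np

def conformal_threshold(cal_scores, cal_labels, alpha=0.1):
    """Split-conformal threshold: nonconformity is the LLM confidence mass on the
    wrong side of the trusted label. Gives >= 1 - alpha coverage under exchangeability."""
    nonconf = np.where(cal_labels == 1, 1.0 - cal_scores, cal_scores)
    n = len(nonconf)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(nonconf, level, method="higher")

def concept_prediction_set(score, qhat):
    """All binary concept labels whose nonconformity is <= qhat."""
    labels = set()
    if 1.0 - score <= qhat:
        labels.add(1)
    if score <= qhat:
        labels.add(0)
    return labels

def reliability_weight(pred_set):
    """Down-weight ambiguous annotations; 0.5 is an illustrative choice."""
    return 1.0 if len(pred_set) == 1 else 0.5

# Toy usage: calibrate on a small trusted split, then weight one training annotation.
rng = np.random.default_rng(0)
cal_scores = rng.uniform(size=200)                        # LLM "concept present" confidence
cal_labels = (cal_scores + rng.normal(0, 0.2, 200) > 0.5).astype(int)
qhat = conformal_threshold(cal_scores, cal_labels, alpha=0.1)

weight = reliability_weight(concept_prediction_set(0.55, qhat))
print(f"qhat={qhat:.3f}, loss weight for an uncertain annotation: {weight}")
```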
Related papers
- Concept Component Analysis: A Principled Approach for Concept Extraction in LLMs [51.378834857406325]
Mechanistic interpretability seeks to address these issues by extracting interpretable concepts from large language models. Sparse autoencoders (SAEs) have emerged as a popular approach for extracting interpretable and monosemantic concepts. We show that SAEs suffer from a fundamental theoretical ambiguity: the correspondence between LLM representations and human-interpretable concepts is not well defined.
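For reference, this is the kind of SAE the paper analyzes, in a minimal sketch (not the paper's proposed method; dimensions and the L1 coefficient are illustrative):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, n_concepts: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_concepts)
        self.decoder = nn.Linear(n_concepts, d_model)

    def forward(self, h):
        # ReLU plus an L1 penalty encourage sparse, ideally monosemantic, concept activations
        z = torch.relu(self.encoder(h))
        return self.decoder(z), z

sae = SparseAutoencoder(d_model=768, n_concepts=4096)
h = torch.randn(32, 768)                     # hidden activations from one LLM layer
recon, z = sae(h)
loss = ((recon - h) ** 2).mean() + 1e-3 * z.abs().mean()
loss.backward()
```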
arXiv Detail & Related papers (2026-01-28T09:27:05Z) - There Was Never a Bottleneck in Concept Bottleneck Models [27.888718857850822]
Concept Bottleneck Models (CBMs) have emerged as a promising approach to mitigate this issue. We argue that CBMs do not impose a true bottleneck: the fact that a component can predict a concept does not guarantee that it encodes only information about that concept. We propose Minimal Concept Bottleneck Models (MCBMs), which incorporate an Information Bottleneck (IB) objective to constrain each representation component to retain only the information relevant to its corresponding concept.
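A hedged sketch of how an IB-style constraint on concept components might look (a variational KL bottleneck standing in for the paper's IB objective; not the authors' code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IBConceptHead(nn.Module):
    def __init__(self, d_in: int, n_concepts: int):
        super().__init__()
        self.mu = nn.Linear(d_in, n_concepts)
        self.logvar = nn.Linear(d_in, n_concepts)

    def forward(self, x, concept_labels, beta=1e-2):
        mu, logvar = self.mu(x), self.logvar(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        pred_loss = F.binary_cross_entropy_with_logits(z, concept_labels)
        # KL(q(z|x) || N(0, I)) limits how much extra information each component carries
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).mean()
        return pred_loss + beta * kl

head = IBConceptHead(d_in=512, n_concepts=10)
x = torch.randn(16, 512)
c = torch.randint(0, 2, (16, 10)).float()
loss = head(x, c)
loss.backward()
```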
arXiv Detail & Related papers (2025-06-05T10:50:42Z) - Sample-efficient Learning of Concepts with Theoretical Guarantees: from Data to Concepts without Interventions [13.877511370053794]
Concept Bottleneck Models (CBMs) address some of these challenges by learning interpretable concepts from high-dimensional data. We describe a framework that provides theoretical guarantees on the correctness of the learned concepts and on the number of required labels. We evaluate our framework on synthetic and image benchmarks, showing that the learned concepts have fewer impurities and are often more accurate than those of other CBMs.
arXiv Detail & Related papers (2025-02-10T15:01:56Z) - A Survey of Uncertainty Estimation in LLMs: Theory Meets Practice [7.687545159131024]
We clarify the definitions of uncertainty and confidence, highlighting their distinctions and implications for model predictions.
We categorize various classes of uncertainty estimation methods according to the approaches from which they are derived.
We also explore techniques for incorporating uncertainty into diverse applications, including out-of-distribution detection, data annotation, and question clarification.
arXiv Detail & Related papers (2024-10-20T07:55:44Z) - CLUE: Concept-Level Uncertainty Estimation for Large Language Models [49.92690111618016]
We propose a novel framework for Concept-Level Uncertainty Estimation for Large Language Models (LLMs).
We leverage LLMs to convert output sequences into concept-level representations, breaking down sequences into individual concepts and measuring the uncertainty of each concept separately.
We conduct experiments to demonstrate that CLUE can provide more interpretable uncertainty estimation results compared with sentence-level uncertainty.
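A rough sketch of the idea follows, with `extract_concepts` as a hypothetical stand-in for the LLM-based decomposition step (not the paper's API); uncertainty here is simply how inconsistently a concept appears across sampled responses:

```python
import math
from collections import Counter

def extract_concepts(response: str) -> set[str]:
    # Placeholder: in practice an LLM would decompose the response into concept units.
    return {tok.strip(".,").lower() for tok in response.split() if len(tok) > 4}

def concept_uncertainty(responses: list[str]) -> dict[str, float]:
    n = len(responses)
    counts = Counter(c for r in responses for c in extract_concepts(r))
    def h(p):  # binary entropy of "concept appears in a sampled response"
        return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return {c: h(k / n) for c, k in counts.items()}

samples = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "The Eiffel Tower stands in Paris, finished around 1887.",
    "The Eiffel Tower is located in Paris.",
]
print(concept_uncertainty(samples))   # stable concepts get low entropy, contested ones high
```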
arXiv Detail & Related papers (2024-09-04T18:27:12Z) - Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
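A simplified sketch of this idea, with `sample_chain` as a hypothetical stand-in for an LLM call that generates an explanation and then an answer; the spread of answers induced by the explanation distribution is read as uncertainty:

```python
import math
import random
from collections import Counter

def sample_chain(question: str) -> tuple[str, str]:
    # Placeholder for "generate an explanation, then an answer conditioned on it".
    answer = random.choice(["A", "A", "A", "B"])   # toy answer distribution
    return f"reasoning for {answer}", answer

def explanation_based_confidence(question: str, n_samples: int = 20) -> float:
    answers = Counter(sample_chain(question)[1] for _ in range(n_samples))
    probs = [k / n_samples for k in answers.values()]
    entropy = -sum(p * math.log2(p) for p in probs)
    # 1.0 means every sampled explanation led to the same answer (stable)
    return 1.0 - entropy / max(math.log2(len(answers)), 1e-9)

print(explanation_based_confidence("Which option is correct?"))
```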
arXiv Detail & Related papers (2024-06-05T16:35:30Z) - ConcEPT: Concept-Enhanced Pre-Training for Language Models [57.778895980999124]
ConcEPT aims to infuse conceptual knowledge into pre-trained language models.
It exploits entity concept prediction, which predicts the concepts of entities mentioned in the pre-training contexts.
Results of experiments show that ConcEPT gains improved conceptual knowledge with concept-enhanced pre-training.
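A hedged sketch of such an entity-concept-prediction auxiliary loss (the pooling, shapes, and the assumption of a fixed concept taxonomy are illustrative, not the paper's implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EntityConceptHead(nn.Module):
    def __init__(self, d_model: int, n_taxonomy_concepts: int):
        super().__init__()
        self.classifier = nn.Linear(d_model, n_taxonomy_concepts)

    def forward(self, hidden_states, entity_mask, concept_ids):
        # hidden_states: (batch, seq, d_model); entity_mask: (batch, seq) marking the mention span
        denom = entity_mask.sum(dim=1, keepdim=True).clamp(min=1)
        pooled = (hidden_states * entity_mask.unsqueeze(-1)).sum(dim=1) / denom
        return F.cross_entropy(self.classifier(pooled), concept_ids)

head = EntityConceptHead(d_model=768, n_taxonomy_concepts=1000)
h = torch.randn(8, 128, 768)                     # encoder outputs
mask = torch.zeros(8, 128)
mask[:, 5:9] = 1.0                               # toy entity-mention spans
loss = head(h, mask, torch.randint(0, 1000, (8,)))   # added to the usual pre-training loss
loss.backward()
```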
arXiv Detail & Related papers (2024-01-11T05:05:01Z) - Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
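For orientation, a minimal concept-bottleneck head over a pretrained encoder (names and dimensions are illustrative, not the paper's architecture): pooled text embeddings are first mapped to concept predictions, and the label is predicted from the concepts alone.

```python
import torch
import torch.nn as nn

class ConceptBottleneckHead(nn.Module):
    def __init__(self, d_model: int, n_concepts: int, n_classes: int):
        super().__init__()
        self.to_concepts = nn.Linear(d_model, n_concepts)   # bottleneck layer
        self.to_labels = nn.Linear(n_concepts, n_classes)   # sees only the concepts

    def forward(self, pooled_embedding):
        concept_logits = self.to_concepts(pooled_embedding)
        label_logits = self.to_labels(torch.sigmoid(concept_logits))
        return concept_logits, label_logits

head = ConceptBottleneckHead(d_model=768, n_concepts=32, n_classes=4)
emb = torch.randn(8, 768)                        # e.g., pooled embeddings from a PLM
concept_logits, label_logits = head(emb)
```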
arXiv Detail & Related papers (2023-11-08T20:41:18Z) - Uncertainty Estimation by Fisher Information-based Evidential Deep Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications.
We propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).
In particular, we introduce Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network more focused on the representation learning of uncertain classes.
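A simplified, hedged illustration of the reweighting idea: the per-class terms of a Dirichlet-based evidential loss are weighted by the diagonal Dirichlet Fisher information (the trigamma of the Dirichlet parameters), so low-evidence classes receive more weight. This is an approximation for intuition, not the paper's exact objective.

```python
import torch

def fisher_weighted_edl_loss(evidence, targets_onehot):
    alpha = evidence + 1.0                        # Dirichlet parameters from non-negative evidence
    s = alpha.sum(dim=1, keepdim=True)
    p = alpha / s                                 # expected class probabilities
    # squared-error form of the evidential loss, per class
    per_class = (targets_onehot - p) ** 2 + p * (1 - p) / (s + 1.0)
    # diagonal Dirichlet Fisher information: trigamma(alpha_k)
    fisher_diag = torch.polygamma(1, alpha)
    weights = fisher_diag / fisher_diag.sum(dim=1, keepdim=True)
    return (weights * per_class).sum(dim=1).mean()

evidence = torch.relu(torch.randn(4, 10))         # non-negative evidence from a network head
targets = torch.eye(10)[torch.randint(0, 10, (4,))]
print(fisher_weighted_edl_loss(evidence, targets))
```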
arXiv Detail & Related papers (2023-03-03T16:12:59Z)