Interpretable-by-Design Text Understanding with Iteratively Generated Concept Bottleneck
- URL: http://arxiv.org/abs/2310.19660v2
- Date: Wed, 3 Apr 2024 14:29:03 GMT
- Title: Interpretable-by-Design Text Understanding with Iteratively Generated Concept Bottleneck
- Authors: Josh Magnus Ludan, Qing Lyu, Yue Yang, Liam Dugan, Mark Yatskar, Chris Callison-Burch
- Abstract summary: Black-box deep neural networks excel in text classification, yet their application in high-stakes domains is hindered by their lack of interpretability.
We propose Text Bottleneck Models (TBM), an intrinsically interpretable text classification framework that offers both global and local explanations.
- Score: 46.015128326688234
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Black-box deep neural networks excel in text classification, yet their application in high-stakes domains is hindered by their lack of interpretability. To address this, we propose Text Bottleneck Models (TBM), an intrinsically interpretable text classification framework that offers both global and local explanations. Rather than directly predicting the output label, TBM predicts categorical values for a sparse set of salient concepts and uses a linear layer over those concept values to produce the final prediction. These concepts can be automatically discovered and measured by a Large Language Model (LLM) without the need for human curation. Experiments on 12 diverse text understanding datasets demonstrate that TBM can rival the performance of black-box baselines such as few-shot GPT-4 and finetuned DeBERTa while falling short against finetuned GPT-3.5. Comprehensive human evaluation validates that TBM can generate high-quality concepts relevant to the task, and the concept measurement aligns well with human judgments, suggesting that the predictions made by TBMs are interpretable. Overall, our findings suggest that TBM is a promising new framework that enhances interpretability with minimal performance tradeoffs.
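As a concrete illustration of the pipeline described in the abstract, the minimal sketch below wires together the two TBM stages: an LLM is prompted to assign a categorical value to each discovered concept, and a linear layer over those concept values produces the final label. The concept list, the prompt wording, and the query_llm helper are illustrative assumptions (query_llm is stubbed out here), not the authors' implementation; learning the linear weights is omitted.

```python
# Minimal sketch of a TBM-style prediction pipeline (illustrative assumptions,
# not the authors' code). Concepts and prompts are hypothetical examples.
import numpy as np

CONCEPTS = ["sentiment toward the product", "mention of price", "mention of service"]
VALUES = {"negative": -1.0, "not mentioned": 0.0, "positive": 1.0}

def query_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (e.g., an API request); returns a canned answer."""
    return "not mentioned"

def measure_concepts(text: str) -> np.ndarray:
    """Ask the LLM for a categorical value of each concept in the given text."""
    scores = []
    for concept in CONCEPTS:
        prompt = (
            f"Text: {text}\n"
            f"Concept: {concept}\n"
            "Answer with exactly one of: negative, not mentioned, positive."
        )
        answer = query_llm(prompt).strip().lower()
        scores.append(VALUES.get(answer, 0.0))  # fall back to 'not mentioned'
    return np.array(scores)

def predict(text: str, weights: np.ndarray, bias: float) -> int:
    """A linear layer over the measured concept values gives the final label."""
    concept_values = measure_concepts(text)
    logit = float(concept_values @ weights + bias)
    return int(logit > 0)
```

Because the label head sees only the concept values, the learned weights give a global explanation of the classifier, and the per-example concept measurements give a local explanation of each prediction.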
Related papers
- VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance [16.16577751549164]
Concept Bottleneck Models (CBMs) provide interpretable predictions by introducing an intermediate Concept Bottleneck Layer (CBL), which encodes human-understandable concepts to explain the model's decisions (a generic sketch of this layout is given after the related-papers list below).
Recent works propose using Large Language Models (LLMs) and pre-trained Vision-Language Models (VLMs) to automate the training of CBMs, making them more scalable.
We propose the Vision-Language-Guided Concept Bottleneck Model (VLG-CBM) to enable faithful interpretability while also boosting performance.
arXiv Detail & Related papers (2024-07-18T19:44:44Z)
- Self-supervised Interpretable Concept-based Models for Text Classification [9.340843984411137]
This paper proposes self-supervised Interpretable Concept Embedding Models (ICEMs).
We leverage the generalization abilities of Large Language Models to predict the concept labels in a self-supervised way.
ICEMs can be trained in a self-supervised way, achieving performance similar to fully supervised concept-based models and end-to-end black-box ones.
arXiv Detail & Related papers (2024-06-20T14:04:53Z)
- InterpretCC: Intrinsic User-Centric Interpretability through Global Mixture of Experts [31.738009841932374]
Interpretability for neural networks is a trade-off between three key requirements.
We present InterpretCC, a family of interpretable-by-design neural networks that guarantee human-centric interpretability.
arXiv Detail & Related papers (2024-02-05T11:55:50Z)
- Eliminating Information Leakage in Hard Concept Bottleneck Models with Supervised, Hierarchical Concept Learning [17.982131928413096]
Concept Bottleneck Models (CBMs) aim to deliver interpretable and interventionable predictions by bridging features and labels with human-understandable concepts.
CBMs suffer from information leakage, where unintended information beyond the concepts is leaked into the subsequent label prediction.
This paper proposes a new paradigm of CBMs, namely SupCBM, which achieves label prediction via predicted concepts and a deliberately designed intervention matrix.
arXiv Detail & Related papers (2024-02-03T03:50:58Z)
- Can we Constrain Concept Bottleneck Models to Learn Semantically Meaningful Input Features? [0.6401548653313325]
Concept Bottleneck Models (CBMs) are regarded as inherently interpretable because they first predict a set of human-defined concepts.
Current literature suggests that concept predictions often rely on irrelevant input features.
In this paper, we demonstrate that CBMs can learn to map concepts to semantically meaningful input features.
arXiv Detail & Related papers (2024-02-01T10:18:43Z)
- Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention [53.896974148579346]
Large Language Models (LLMs) have achieved unprecedented breakthroughs in various natural language processing domains.
The enigmatic "black-box" nature of LLMs remains a significant challenge for interpretability, hampering transparent and accountable applications.
We propose a novel methodology anchored in sparsity-guided techniques, aiming to provide a holistic interpretation of LLMs.
arXiv Detail & Related papers (2023-12-22T19:55:58Z)
- Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
- Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval [139.21955930418815]
Cross-modal Retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space.
However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts.
We propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from inherent data ambiguity.
arXiv Detail & Related papers (2023-09-29T09:41:19Z)
- TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing [73.16475763422446]
We propose TextFlint, a multilingual robustness evaluation platform for NLP tasks.
It incorporates universal text transformation, task-specific transformation, adversarial attack, subpopulation, and their combinations to provide comprehensive robustness analysis.
TextFlint generates complete analytical reports as well as targeted augmented data to address shortcomings in a model's robustness.
arXiv Detail & Related papers (2021-03-21T17:20:38Z)
- A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics [131.93113552146195]
We present a new dataset, Handwritten arithmetic with INTegers (HINT), to examine machines' capability of learning generalizable concepts.
In HINT, machines are tasked with learning how concepts are perceived from raw signals such as images.
We undertake extensive experiments with various sequence-to-sequence models, including RNNs, Transformers, and GPT-3.
arXiv Detail & Related papers (2021-03-02T01:32:54Z)
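For reference, several of the entries above (VLG-CBM, SupCBM, and the work on interpreting PLMs via concept bottlenecks) build on the standard Concept Bottleneck Model layout: a backbone encodes the input, a bottleneck layer scores human-understandable concepts, and a label head operates only on those concept scores. The sketch below, referenced from the VLG-CBM entry, shows that generic layout in PyTorch; the layer shapes and names are assumptions for illustration, not any specific paper's architecture.

```python
# Generic Concept Bottleneck Model layout in PyTorch (illustrative sketch only).
import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    def __init__(self, backbone: nn.Module, feature_dim: int,
                 num_concepts: int, num_classes: int):
        super().__init__()
        self.backbone = backbone                                    # image or text encoder
        self.concept_layer = nn.Linear(feature_dim, num_concepts)   # the bottleneck (CBL)
        self.label_head = nn.Linear(num_concepts, num_classes)      # sees only concepts

    def forward(self, x: torch.Tensor):
        features = self.backbone(x)
        concept_logits = self.concept_layer(features)   # trained against concept labels
        concepts = torch.sigmoid(concept_logits)        # each unit is one nameable concept
        label_logits = self.label_head(concepts)        # prediction explained via concepts
        return concept_logits, label_logits
```

Because the label head never sees the raw features, a prediction can be inspected, and if needed intervened on, by editing the concept vector before it reaches the label head; the approaches listed above differ mainly in how the concept layer is supervised (human annotations, LLM-generated labels, or vision-language guidance).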
This list is automatically generated from the titles and abstracts of the papers on this site.