Self-supervised Interpretable Concept-based Models for Text Classification
- URL: http://arxiv.org/abs/2406.14335v1
- Date: Thu, 20 Jun 2024 14:04:53 GMT
- Title: Self-supervised Interpretable Concept-based Models for Text Classification
- Authors: Francesco De Santis, Philippe Bich, Gabriele Ciravegna, Pietro Barbiero, Danilo Giordano, Tania Cerquitelli,
- Abstract summary: This paper proposes a self-supervised Interpretable Concept Embedding Models (ICEMs)
We leverage the generalization abilities of Large-Language Models to predict the concepts labels in a self-supervised way.
ICEMs can be trained in a self-supervised way achieving similar performance to fully supervised concept-based models and end-to-end black-box ones.
- Score: 9.340843984411137
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Despite their success, Large-Language Models (LLMs) still face criticism as their lack of interpretability limits their controllability and reliability. Traditional post-hoc interpretation methods, based on attention and gradient-based analysis, offer limited insight into the model's decision-making processes. In the image field, Concept-based models have emerged as explainable-by-design architectures, employing human-interpretable features as intermediate representations. However, these methods have not been yet adapted to textual data, mainly because they require expensive concept annotations, which are impractical for real-world text data. This paper addresses this challenge by proposing a self-supervised Interpretable Concept Embedding Models (ICEMs). We leverage the generalization abilities of LLMs to predict the concepts labels in a self-supervised way, while we deliver the final predictions with an interpretable function. The results of our experiments show that ICEMs can be trained in a self-supervised way achieving similar performance to fully supervised concept-based models and end-to-end black-box ones. Additionally, we show that our models are (i) interpretable, offering meaningful logical explanations for their predictions; (ii) interactable, allowing humans to modify intermediate predictions through concept interventions; and (iii) controllable, guiding the LLMs' decoding process to follow a required decision-making path.
Related papers
- Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models [57.86303579812877]
Concept Bottleneck Models (CBMs) ground image classification on human-understandable concepts to allow for interpretable model decisions.
Existing approaches often require numerous human interventions per image to achieve strong performances.
We introduce a trainable concept realignment intervention module, which leverages concept relations to realign concept assignments post-intervention.
arXiv Detail & Related papers (2024-05-02T17:59:01Z) - Sparsity-Guided Holistic Explanation for LLMs with Interpretable
Inference-Time Intervention [53.896974148579346]
Large Language Models (LLMs) have achieved unprecedented breakthroughs in various natural language processing domains.
The enigmatic black-box'' nature of LLMs remains a significant challenge for interpretability, hampering transparent and accountable applications.
We propose a novel methodology anchored in sparsity-guided techniques, aiming to provide a holistic interpretation of LLMs.
arXiv Detail & Related papers (2023-12-22T19:55:58Z) - Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their black-box'' nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z) - Interpreting and Controlling Vision Foundation Models via Text
Explanations [45.30541722925515]
We present a framework for interpreting vision transformer's latent tokens with natural language.
Our approach enables understanding of model visual reasoning procedure without needing additional model training or data collection.
arXiv Detail & Related papers (2023-10-16T17:12:06Z) - Learning Transferable Conceptual Prototypes for Interpretable
Unsupervised Domain Adaptation [79.22678026708134]
In this paper, we propose an inherently interpretable method, named Transferable Prototype Learning ( TCPL)
To achieve this goal, we design a hierarchically prototypical module that transfers categorical basic concepts from the source domain to the target domain and learns domain-shared prototypes for explaining the underlying reasoning process.
Comprehensive experiments show that the proposed method can not only provide effective and intuitive explanations but also outperform previous state-of-the-arts.
arXiv Detail & Related papers (2023-10-12T06:36:41Z) - Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z) - Evaluating and Explaining Large Language Models for Code Using Syntactic
Structures [74.93762031957883]
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code.
At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes.
We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
arXiv Detail & Related papers (2023-08-07T18:50:57Z) - GlanceNets: Interpretabile, Leak-proof Concept-based Models [23.7625973884849]
Concept-based models (CBMs) combine high-performance and interpretability by acquiring and reasoning with a vocabulary of high-level concepts.
We provide a clear definition of interpretability in terms of alignment between the model's representation and an underlying data generation process.
We introduce GlanceNets, a new CBM that exploits techniques from disentangled representation learning and open-set recognition to achieve alignment.
arXiv Detail & Related papers (2022-05-31T08:53:53Z) - Robust Semantic Interpretability: Revisiting Concept Activation Vectors [0.0]
Interpretability methods for image classification attempt to expose whether the model is systematically biased or attending to the same cues as a human would.
Our proposed Robust Concept Activation Vectors (RCAV) quantifies the effects of semantic concepts on individual model predictions and on model behavior as a whole.
arXiv Detail & Related papers (2021-04-06T20:14:59Z) - Understanding Interpretability by generalized distillation in Supervised
Classification [3.5473853445215897]
Recent interpretation strategies focus on human understanding of the underlying decision mechanisms of the complex Machine Learning models.
We propose an interpretation-by-distillation formulation that is defined relative to other ML models.
We evaluate our proposed framework on the MNIST, Fashion-MNIST and Stanford40 datasets.
arXiv Detail & Related papers (2020-12-05T17:42:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.