Evaluating Concept-based Explanations of Language Models: A Study on Faithfulness and Readability
- URL: http://arxiv.org/abs/2404.18533v2
- Date: Tue, 30 Apr 2024 03:31:51 GMT
- Title: Evaluating Concept-based Explanations of Language Models: A Study on Faithfulness and Readability
- Authors: Meng Li, Haoran Jin, Ruixuan Huang, Zhihao Xu, Defu Lian, Zijia Lin, Di Zhang, Xiting Wang
- Abstract summary: We introduce a formal definition of concept generalizable to diverse concept-based explanations.
We quantify faithfulness via the difference in the output upon perturbation.
We then provide an automatic measure for readability, by measuring the coherence of patterns that maximally activate a concept.
- Score: 35.48852504832633
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the surprisingly high intelligence exhibited by Large Language Models (LLMs), we are somehow intimidated to fully deploy them into real-life applications considering their black-box nature. Concept-based explanations arise as a promising avenue for explaining what the LLMs have learned, making them more transparent to humans. However, current evaluations for concepts tend to be heuristic and non-deterministic, e.g. case study or human evaluation, hindering the development of the field. To bridge the gap, we approach concept-based explanation evaluation via faithfulness and readability. We first introduce a formal definition of concept generalizable to diverse concept-based explanations. Based on this, we quantify faithfulness via the difference in the output upon perturbation. We then provide an automatic measure for readability, by measuring the coherence of patterns that maximally activate a concept. This measure serves as a cost-effective and reliable substitute for human evaluation. Finally, based on measurement theory, we describe a meta-evaluation method for evaluating the above measures via reliability and validity, which can be generalized to other tasks as well. Extensive experimental analysis has been conducted to validate and inform the selection of concept evaluation measures.
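The two measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the perturbation, the output distance, or the coherence score, so the choices here (projecting a concept direction out of a hidden state, total variation distance between output distributions, and mean pairwise cosine similarity between embeddings of top-activating patterns) are assumptions, as are all function names and the toy linear head standing in for the rest of the model.

```python
import math
from itertools import combinations

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ablate(hidden, direction):
    """Assumed perturbation: remove the component of `hidden` along `direction`."""
    coeff = dot(hidden, direction) / dot(direction, direction)
    return [h - coeff * d for h, d in zip(hidden, direction)]

def toy_head(hidden, weights):
    """Hypothetical stand-in for the layers after the hidden state (linear map + softmax)."""
    return softmax([dot(row, hidden) for row in weights])

def faithfulness(hidden, direction, weights):
    """Assumed output difference: total variation distance before vs. after ablation."""
    before = toy_head(hidden, weights)
    after = toy_head(ablate(hidden, direction), weights)
    return 0.5 * sum(abs(p - q) for p, q in zip(before, after))

def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))

def coherence(embeddings):
    """Assumed readability proxy: mean pairwise cosine similarity of the
    embeddings of the patterns that most strongly activate a concept."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

# Toy data: the head reads only the first two hidden dimensions.
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
hidden = [2.0, 1.0, 5.0]

used = faithfulness(hidden, [1.0, 0.0, 0.0], weights)     # direction the head reads
ignored = faithfulness(hidden, [0.0, 0.0, 1.0], weights)  # direction the head ignores
tight = coherence([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])   # identical patterns
loose = coherence([[1.0, 0.0], [0.0, 1.0]])               # unrelated patterns
```

In this toy setup, ablating a direction the head ignores leaves the output unchanged, while ablating a direction it reads shifts the output distribution; the coherence score is higher when the top-activating patterns point the same way in embedding space.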
Related papers
- Evidential Concept Embedding Models: Towards Reliable Concept Explanations for Skin Disease Diagnosis [24.946148305384202]
Concept Bottleneck Models (CBM) have emerged as an active interpretable framework incorporating human-interpretable concepts into decision-making.
We propose an evidential Concept Embedding Model (evi-CEM) which employs evidential learning to model the concept uncertainty.
Our evaluation demonstrates that evi-CEM achieves superior performance in terms of concept prediction.
arXiv Detail & Related papers (2024-06-27T12:29:50Z)
- ConcEPT: Concept-Enhanced Pre-Training for Language Models [57.778895980999124]
ConcEPT aims to infuse conceptual knowledge into pre-trained language models.
It exploits external entity concept prediction to predict the concepts of entities mentioned in the pre-training contexts.
Results of experiments show that ConcEPT gains improved conceptual knowledge with concept-enhanced pre-training.
arXiv Detail & Related papers (2024-01-11T05:05:01Z)
- Estimation of Concept Explanations Should be Uncertainty Aware [39.598213804572396]
We study a specific kind called Concept Explanations, where the goal is to interpret a model using human-understandable concepts.
Although popular for their easy interpretation, concept explanations are known to be noisy.
We propose an uncertainty-aware Bayesian estimation method to address these issues, which readily improved the quality of explanations.
arXiv Detail & Related papers (2023-12-13T11:17:27Z)
- Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
- Towards Concept-Aware Large Language Models [56.48016300758356]
Concepts play a pivotal role in various human cognitive functions, including learning, reasoning and communication.
There is very little work on endowing machines with the ability to form and reason with concepts.
In this work, we analyze how well contemporary large language models (LLMs) capture human concepts and their structure.
arXiv Detail & Related papers (2023-11-03T12:19:22Z)
- A Unified Concept-Based System for Local, Global, and Misclassification Explanations [13.321794212377949]
We present a unified concept-based system for unsupervised learning of both local and global concepts.
Our primary objective is to uncover the intrinsic concepts underlying each data category by training surrogate explainer networks.
Our approach facilitates the explanation of both accurate and erroneous predictions.
arXiv Detail & Related papers (2023-06-06T09:28:37Z)
- Towards Robust Metrics for Concept Representation Evaluation [25.549961337814523]
Concept learning models have been shown to be prone to encoding impurities in their representations.
We propose novel metrics for evaluating the purity of concept representations in both approaches.
arXiv Detail & Related papers (2023-01-25T00:40:19Z)
- COPEN: Probing Conceptual Knowledge in Pre-trained Language Models [60.10147136876669]
Conceptual knowledge is fundamental to human cognition and knowledge bases.
Existing knowledge probing works only focus on factual knowledge of pre-trained language models (PLMs) and ignore conceptual knowledge.
We design three tasks to probe whether PLMs organize entities by conceptual similarities, learn conceptual properties, and conceptualize entities in contexts.
For the tasks, we collect and annotate 24k data instances covering 393 concepts, which is COPEN, a COnceptual knowledge Probing bENchmark.
arXiv Detail & Related papers (2022-11-08T08:18:06Z)
- Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates if the model's prediction on the counterfactual is consistent with that expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.