Related papers: Evaluating Concept-based Explanations of Language Models: A Study on Faithfulness and Readability

Evaluating Concept-based Explanations of Language Models: A Study on Faithfulness and Readability

URL: http://arxiv.org/abs/2404.18533v2
Date: Tue, 30 Apr 2024 03:31:51 GMT
Title: Evaluating Concept-based Explanations of Language Models: A Study on Faithfulness and Readability
Authors: Meng Li, Haoran Jin, Ruixuan Huang, Zhihao Xu, Defu Lian, Zijia Lin, Di Zhang, Xiting Wang,
Abstract summary: We introduce a formal definition of concept generalizable to diverse concept-based explanations. We quantify faithfulness via the difference in the output upon perturbation. We then provide an automatic measure for readability, by measuring the coherence of patterns that maximally activate a concept.
Score: 35.48852504832633
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite the surprisingly high intelligence exhibited by Large Language Models (LLMs), we are somehow intimidated to fully deploy them into real-life applications considering their black-box nature. Concept-based explanations arise as a promising avenue for explaining what the LLMs have learned, making them more transparent to humans. However, current evaluations for concepts tend to be heuristic and non-deterministic, e.g. case study or human evaluation, hindering the development of the field. To bridge the gap, we approach concept-based explanation evaluation via faithfulness and readability. We first introduce a formal definition of concept generalizable to diverse concept-based explanations. Based on this, we quantify faithfulness via the difference in the output upon perturbation. We then provide an automatic measure for readability, by measuring the coherence of patterns that maximally activate a concept. This measure serves as a cost-effective and reliable substitute for human evaluation. Finally, based on measurement theory, we describe a meta-evaluation method for evaluating the above measures via reliability and validity, which can be generalized to other tasks as well. Extensive experimental analysis has been conducted to validate and inform the selection of concept evaluation measures.

Related papers

Enhancing the Comprehensibility of Text Explanations via Unsupervised Concept Discovery [21.58887931556088]
ECO-Concept is an intrinsically interpretable framework to discover comprehensible concepts with no concept annotations.<n>Our method achieves superior performance across diverse tasks.<n>Further concept evaluations validate that the concepts learned by ECO-Concept surpassed current counterparts in comprehensibility.
arXiv Detail & Related papers (2025-05-26T17:59:51Z)
Concept-Based Explainable Artificial Intelligence: Metrics and Benchmarks [0.0]
Concept-based explanation methods aim to improve the interpretability of machine learning models. We propose three metrics: the concept global importance metric, the concept existence metric, and the concept location metric. We demonstrate that, in many cases, even the most important concepts determined by post-hoc CBMs are not present in input images.
arXiv Detail & Related papers (2025-01-31T16:32:36Z)
Conceptual and Unbiased Reasoning in Language Models [98.90677711523645]
We propose a novel conceptualization framework that forces models to perform conceptual reasoning on abstract questions. We show that existing large language models fall short on conceptual reasoning, dropping 9% to 28% on various benchmarks. We then discuss how models can improve since high-level abstract reasoning is key to unbiased and generalizable decision-making.
arXiv Detail & Related papers (2024-03-30T00:53:53Z)
A survey on Concept-based Approaches For Model Improvement [2.1516043775965565]
Concepts are known to be the thinking ground of humans. We provide a systematic review and taxonomy of various concept representations and their discovery algorithms in Deep Neural Networks (DNNs) We also provide details on concept-based model improvement literature marking the first comprehensive survey of these methods.
arXiv Detail & Related papers (2024-03-21T17:09:20Z)
An Axiomatic Approach to Model-Agnostic Concept Explanations [67.84000759813435]
We propose an approach to concept explanations that satisfy three natural axioms: linearity, recursivity, and similarity. We then establish connections with previous concept explanation methods, offering insight into their varying semantic meanings.
arXiv Detail & Related papers (2024-01-12T20:53:35Z)
ConcEPT: Concept-Enhanced Pre-Training for Language Models [57.778895980999124]
ConcEPT aims to infuse conceptual knowledge into pre-trained language models. It exploits external entity concept prediction to predict the concepts of entities mentioned in the pre-training contexts. Results of experiments show that ConcEPT gains improved conceptual knowledge with concept-enhanced pre-training.
arXiv Detail & Related papers (2024-01-11T05:05:01Z)
Estimation of Concept Explanations Should be Uncertainty Aware [39.598213804572396]
We study a specific kind called Concept Explanations, where the goal is to interpret a model using human-understandable concepts. Although popular for their easy interpretation, concept explanations are known to be noisy. We propose an uncertainty-aware Bayesian estimation method to address these issues, which readily improved the quality of explanations.
arXiv Detail & Related papers (2023-12-13T11:17:27Z)
Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks. The lack of interpretability due to their black-box'' nature poses challenges for responsible implementation. We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
A Unified Concept-Based System for Local, Global, and Misclassification Explanations [13.321794212377949]
We present a unified concept-based system for unsupervised learning of both local and global concepts. Our primary objective is to uncover the intrinsic concepts underlying each data category by training surrogate explainer networks. Our approach facilitates the explanation of both accurate and erroneous predictions.
arXiv Detail & Related papers (2023-06-06T09:28:37Z)
Multi-dimensional concept discovery (MCD): A unifying framework with completeness guarantees [1.9465727478912072]
We propose Multi-dimensional Concept Discovery (MCD) as an extension of previous approaches that fulfills a completeness relation on the level of concepts. We empirically demonstrate the superiority of MCD against more constrained concept definitions.
arXiv Detail & Related papers (2023-01-27T18:53:19Z)
Towards Robust Metrics for Concept Representation Evaluation [25.549961337814523]
Concept learning models have been shown to be prone to encoding impurities in their representations. We propose novel metrics for evaluating the purity of concept representations in both approaches.
arXiv Detail & Related papers (2023-01-25T00:40:19Z)
Concept Gradient: Concept-based Interpretation Without Linear Assumption [77.96338722483226]
Concept Activation Vector (CAV) relies on learning a linear relation between some latent representation of a given model and concepts. We proposed Concept Gradient (CG), extending concept-based interpretation beyond linear concept functions. We demonstrated CG outperforms CAV in both toy examples and real world datasets.
arXiv Detail & Related papers (2022-08-31T17:06:46Z)
Translational Concept Embedding for Generalized Compositional Zero-shot Learning [73.60639796305415]
Generalized compositional zero-shot learning means to learn composed concepts of attribute-object pairs in a zero-shot fashion. This paper introduces a new approach, termed translational concept embedding, to solve these two difficulties in a unified framework.
arXiv Detail & Related papers (2021-12-20T21:27:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.