CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model
Behavior
- URL: http://arxiv.org/abs/2205.14140v1
- Date: Fri, 27 May 2022 17:59:14 GMT
- Title: CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model
Behavior
- Authors: Eldar David Abraham, Karel D'Oosterlinck, Amir Feder, Yair Ori Gat,
Atticus Geiger, Christopher Potts, Roi Reichart, Zhengxuan Wu
- Abstract summary: We cast model explanation as the causal inference problem of estimating causal effects of real-world concepts on the output behavior of ML models.
We introduce CEBaB, a new benchmark dataset for assessing concept-based explanation methods in Natural Language Processing (NLP)
We use CEBaB to compare the quality of a range of concept-based explanation methods covering different assumptions and conceptions of the problem.
- Score: 26.248879735549277
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasing size and complexity of modern ML systems has improved their
predictive capabilities but made their behavior harder to explain. Many
techniques for model explanation have been developed in response, but we lack
clear criteria for assessing these techniques. In this paper, we cast model
explanation as the causal inference problem of estimating causal effects of
real-world concepts on the output behavior of ML models given actual input
data. We introduce CEBaB, a new benchmark dataset for assessing concept-based
explanation methods in Natural Language Processing (NLP). CEBaB consists of
short restaurant reviews with human-generated counterfactual reviews in which
an aspect (food, noise, ambiance, service) of the dining experience was
modified. Original and counterfactual reviews are annotated with
multiply-validated sentiment ratings at the aspect-level and review-level. The
rich structure of CEBaB allows us to go beyond input features to study the
effects of abstract, real-world concepts on model behavior. We use CEBaB to
compare the quality of a range of concept-based explanation methods covering
different assumptions and conceptions of the problem, and we seek to establish
natural metrics for comparative assessments of these methods.
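
The framing above has a concrete computational core: for each (original, counterfactual) review pair, the observed causal effect of an aspect edit is the change in the model's output, and an explanation method can be judged by how closely the effects it predicts match those observed changes. Below is a minimal sketch of that computation, assuming a classifier that returns a probability distribution over sentiment labels; the function names (icace, icace_error) and the example numbers are illustrative assumptions, not the paper's released code.

```python
import numpy as np

def icace(probs_original, probs_counterfactual):
    """Observed effect of an aspect-level edit: the change in the model's
    output distribution between a review and its human-written counterfactual."""
    return (np.asarray(probs_counterfactual, dtype=float)
            - np.asarray(probs_original, dtype=float))

def icace_error(predicted_effects, observed_effects):
    """Mean L2 distance between the effects an explanation method predicts
    and the effects observed on counterfactual pairs (lower is better)."""
    diff = (np.asarray(predicted_effects, dtype=float)
            - np.asarray(observed_effects, dtype=float))
    return float(np.mean(np.linalg.norm(diff, axis=-1)))

# Hypothetical 3-class sentiment model outputs (negative / neutral / positive)
# for an original review and its counterfactual where only the 'food'
# aspect was changed from negative to positive.
p_orig = [0.70, 0.20, 0.10]
p_cf   = [0.15, 0.20, 0.65]

observed  = icace(p_orig, p_cf)            # e.g. [-0.55, 0.0, 0.55]
predicted = [[-0.40, 0.05, 0.35]]          # an explanation method's estimate

print(observed)
print(icace_error(predicted, [observed]))  # aggregate over many pairs in practice
```

In practice the error would be aggregated over all counterfactual pairs in the benchmark and could be broken down by aspect (food, noise, ambiance, service) and edit direction.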
Related papers
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
- DEAL: Disentangle and Localize Concept-level Explanations for VLMs [10.397502254316645]
Large pre-trained Vision-Language Models might not be able to identify fine-grained concepts.
We propose to DisEntAngle and Localize (DEAL) concept-level explanations for concepts without human annotations.
Our empirical results demonstrate that the proposed method significantly improves the concept-level explanations of the model in terms of disentanglability and localizability.
arXiv Detail & Related papers (2024-07-19T15:39:19Z)
- Introducing User Feedback-based Counterfactual Explanations (UFCE) [49.1574468325115]
Counterfactual explanations (CEs) have emerged as a viable solution for generating comprehensible explanations in XAI.
UFCE allows for the inclusion of user constraints to determine the smallest modifications in the subset of actionable features.
UFCE outperforms two well-known CE methods in terms of proximity, sparsity, and feasibility.
arXiv Detail & Related papers (2024-02-26T20:09:44Z)
- Estimation of Concept Explanations Should be Uncertainty Aware [39.598213804572396]
We study a specific kind called Concept Explanations, where the goal is to interpret a model using human-understandable concepts.
Although popular for their easy interpretation, concept explanations are known to be noisy.
We propose an uncertainty-aware Bayesian estimation method to address these issues, which readily improved the quality of explanations.
arXiv Detail & Related papers (2023-12-13T11:17:27Z)
- Benchmarking and Enhancing Disentanglement in Concept-Residual Models [4.177318966048984]
Concept bottleneck models (CBMs) are interpretable models that first predict a set of semantically meaningful features.
CBMs' performance depends on the engineered features and can severely suffer from incomplete sets of concepts.
This work proposes three novel approaches to mitigate information leakage by disentangling concepts and residuals.
arXiv Detail & Related papers (2023-11-30T21:07:26Z)
- Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
- Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z)
- Evaluation Gaps in Machine Learning Practice [13.963766987258161]
In practice, evaluations of machine learning models frequently focus on a narrow range of decontextualized predictive behaviours.
We examine the evaluation gaps between the idealized breadth of evaluation concerns and the observed narrow focus of actual evaluations.
By studying these properties, we demonstrate the machine learning discipline's implicit assumption of a range of commitments which have normative impacts.
arXiv Detail & Related papers (2022-05-11T04:00:44Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Latent Opinions Transfer Network for Target-Oriented Opinion Words Extraction [63.70885228396077]
We propose a novel model to transfer opinion knowledge from resource-rich review sentiment classification datasets to the low-resource task of target-oriented opinion words extraction (TOWE).
Our model achieves better performance than other state-of-the-art methods and significantly outperforms the base model that does not transfer opinion knowledge.
arXiv Detail & Related papers (2020-01-07T11:50:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.