Faithful Explanations of Black-box NLP Models Using LLM-generated
Counterfactuals
- URL: http://arxiv.org/abs/2310.00603v2
- Date: Wed, 22 Nov 2023 08:00:10 GMT
- Title: Faithful Explanations of Black-box NLP Models Using LLM-generated
Counterfactuals
- Authors: Yair Gat, Nitay Calderon, Amir Feder, Alexander Chapanin, Amit Sharma,
Roi Reichart
- Abstract summary: Causal explanations of predictions of NLP systems are essential to ensure safety and establish trust.
Existing methods often fall short of explaining model predictions effectively or efficiently.
We propose two approaches for counterfactual (CF) approximation.
- Score: 67.64770842323966
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Causal explanations of the predictions of NLP systems are essential to ensure
safety and establish trust. Yet, existing methods often fall short of
explaining model predictions effectively or efficiently and are often
model-specific. In this paper, we address model-agnostic explanations,
proposing two approaches for counterfactual (CF) approximation. The first
approach is CF generation, where a large language model (LLM) is prompted to
change a specific text concept while keeping confounding concepts unchanged.
While this approach is demonstrated to be very effective, applying an LLM at
inference time is costly. We hence present a second approach based on matching,
and propose a method that is guided by an LLM at training-time and learns a
dedicated embedding space. This space is faithful to a given causal graph and
effectively serves to identify matches that approximate CFs. After showing
theoretically that approximating CFs is required in order to construct faithful
explanations, we benchmark our approaches and explain several models, including
LLMs with billions of parameters. Our empirical results demonstrate the
excellent performance of CF generation models as model-agnostic explainers.
Moreover, our matching approach, which requires far fewer test-time resources,
also provides effective explanations, surpassing many baselines. We also find
that Top-K techniques universally improve every tested method. Finally, we
showcase the potential of LLMs in constructing new benchmarks for model
explanation and subsequently validate our conclusions. Our work illuminates new
pathways for efficient and accurate approaches to interpreting NLP systems.
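To make the first approach concrete, below is a minimal sketch of prompt-based CF generation and effect estimation as described in the abstract. The `llm_complete` and `classifier` callables and the prompt wording are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch: prompt an LLM to flip one concept while holding
# confounding concepts fixed, then read off the black-box model's score
# shift as an estimate of the concept's effect on its prediction.
from typing import Callable, List

CF_PROMPT = (
    "Rewrite the following text so that its {concept} becomes {target_value}, "
    "while keeping every other aspect (topic, facts, style, length) unchanged.\n\n"
    "Text: {text}\n\nRewritten text:"
)

def generate_counterfactual(llm_complete: Callable[[str], str],
                            text: str, concept: str, target_value: str) -> str:
    """Ask the LLM for a counterfactual in which `concept` is changed to
    `target_value` while confounding concepts are held fixed."""
    prompt = CF_PROMPT.format(concept=concept, target_value=target_value, text=text)
    return llm_complete(prompt)

def concept_effect(classifier: Callable[[str], float],
                   text: str, counterfactuals: List[str]) -> float:
    """Estimate the concept's effect on the black-box classifier as the average
    shift in its score between the original text and its counterfactuals."""
    base = classifier(text)
    return sum(classifier(cf) for cf in counterfactuals) / len(counterfactuals) - base
```

The matching approach described above would replace the `generate_counterfactual` call with a nearest-neighbor lookup in the LLM-guided embedding space, so no LLM generation is needed at test time.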
Related papers
- Deconfounded Causality-aware Parameter-Efficient Fine-Tuning for Problem-Solving Improvement of LLMs [12.48241058167222]
Large Language Models (LLMs) have demonstrated remarkable efficiency in tackling various tasks based on human instructions.
However, studies reveal that they often struggle with tasks requiring reasoning, such as math or physics.
This raises questions about whether LLMs truly comprehend embedded knowledge or merely learn to replicate the token distribution without a true understanding of the content.
We propose Deconfounded Causal Adaptation (DCA), a novel parameter-efficient fine-tuning (PEFT) method to enhance the model's reasoning capabilities.
arXiv Detail & Related papers (2024-09-04T13:17:09Z)
- Graph-Structured Speculative Decoding [52.94367724136063]
Speculative decoding has emerged as a promising technique to accelerate the inference of Large Language Models.
We introduce an innovative approach utilizing a directed acyclic graph (DAG) to manage the drafted hypotheses.
We observe a remarkable speedup of 1.73× to 1.96×, significantly surpassing standard speculative decoding.
arXiv Detail & Related papers (2024-07-23T06:21:24Z)
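For intuition, here is a toy sketch of plain chain-structured (greedy) speculative decoding; the paper above generalizes the drafted hypotheses to a DAG, which is not reproduced here, and `draft_next`/`target_next` are hypothetical greedy next-token functions.

```python
from typing import Callable, List

def speculative_step(draft_next: Callable[[List[int]], int],
                     target_next: Callable[[List[int]], int],
                     prefix: List[int], k: int = 4) -> List[int]:
    """Draft k tokens with a small model, keep the longest prefix the target
    model would also have produced, then append one verified target token.
    (In practice the target scores all drafted positions in a single batched
    forward pass; the per-position calls here are only for clarity.)"""
    drafted, ctx = [], list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        drafted.append(tok)
        ctx.append(tok)

    ctx = list(prefix)
    for tok in drafted:
        if target_next(ctx) != tok:   # target disagrees: stop accepting drafts
            break
        ctx.append(tok)
    ctx.append(target_next(ctx))      # always emit one token verified by the target
    return ctx
```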
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
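As a rough illustration of tying confidence to the distribution over generated explanations, the sketch below scores an answer by its agreement across several sampled explanation-plus-answer generations. This is a generic self-consistency proxy with a hypothetical `sample_answer` helper, not the paper's stability-based estimator.

```python
from collections import Counter
from typing import Callable, Tuple

def explanation_confidence(sample_answer: Callable[[str], Tuple[str, str]],
                           question: str, n_samples: int = 10) -> Tuple[str, float]:
    """Sample (explanation, answer) pairs and return the majority answer together
    with the fraction of samples that agree with it, used as a confidence score."""
    answers = [sample_answer(question)[1] for _ in range(n_samples)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n_samples
```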
- LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z)
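For readers unfamiliar with the roofline model the survey builds its framework on, here is a back-of-the-envelope sketch; the hardware numbers are illustrative assumptions, not figures from the paper.

```python
def roofline_tflops(flops_per_byte: float,
                    peak_tflops: float = 300.0,      # assumed accelerator peak compute
                    bandwidth_tb_s: float = 2.0) -> float:
    """Attainable TFLOP/s for a kernel with the given arithmetic intensity:
    capped either by peak compute or by memory bandwidth times intensity."""
    return min(peak_tflops, bandwidth_tb_s * flops_per_byte)

# Decode-time LLM inference does roughly 2 FLOPs per weight element, i.e. about
# 1 FLOP per byte for fp16 weights, so it sits far left of the roofline ridge
# point and is bandwidth-bound; large-batch prefill is typically compute-bound.
print(roofline_tflops(1.0))    # ~2 TFLOP/s, limited by memory bandwidth
print(roofline_tflops(400.0))  # 300 TFLOP/s, limited by peak compute
```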
- Evaluating and Explaining Large Language Models for Code Using Syntactic Structures [74.93762031957883]
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code.
At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes.
We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
arXiv Detail & Related papers (2023-08-07T18:50:57Z)
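As a toy illustration of the alignment idea, the sketch below maps a token's character span to its smallest enclosing AST node using Python's own `ast` module as a stand-in; ASTxplainer's actual alignment procedure for code LLMs is not reproduced here.

```python
import ast
from typing import List, Optional, Tuple

def _line_starts(source: str) -> List[int]:
    """Absolute character offset at which each source line begins."""
    starts, total = [0], 0
    for line in source.splitlines(keepends=True):
        total += len(line)
        starts.append(total)
    return starts

def smallest_enclosing_node(source: str, span: Tuple[int, int]) -> Optional[ast.AST]:
    """Return the smallest AST node whose source span covers `span` (char offsets),
    so per-token prediction scores can be aggregated per node type."""
    starts = _line_starts(source)
    best, best_len = None, None
    for node in ast.walk(ast.parse(source)):
        if getattr(node, "end_lineno", None) is None:
            continue
        s = starts[node.lineno - 1] + node.col_offset
        e = starts[node.end_lineno - 1] + node.end_col_offset
        if s <= span[0] and span[1] <= e and (best_len is None or e - s < best_len):
            best, best_len = node, e - s
    return best

# Example: the token "b" at characters 4-5 of "a = b + 1" aligns with a Name node.
print(type(smallest_enclosing_node("a = b + 1", (4, 5))).__name__)  # Name
```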
- Making Linear MDPs Practical via Contrastive Representation Learning [101.75885788118131]
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning.
We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
arXiv Detail & Related papers (2022-07-14T18:18:02Z)
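For background, the standard linear MDP assumption that this line of work builds on is shown below; the paper's alternative, normalization-ensuring definition is not reproduced here.

```latex
% Standard linear MDP assumption (background only): transitions and rewards
% are linear in a known feature map \phi, which makes Q-functions linear in
% \phi(s, a) as well and enables efficient planning and representation learning.
P(s' \mid s, a) = \big\langle \phi(s, a),\, \mu(s') \big\rangle,
\qquad
r(s, a) = \big\langle \phi(s, a),\, \theta \big\rangle
```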
- ReLACE: Reinforcement Learning Agent for Counterfactual Explanations of Arbitrary Predictive Models [6.939617874336667]
We introduce a model-agnostic algorithm to generate optimal counterfactual explanations.
Our method is easily applied to any black-box model, since the black-box model serves as the environment that the deep reinforcement learning (DRL) agent interacts with.
In addition, we develop an algorithm to extract explainable decision rules from the DRL agent's policy, so as to make the process of generating CFs itself transparent.
arXiv Detail & Related papers (2021-10-22T17:08:49Z)
- CounterNet: End-to-End Training of Prediction Aware Counterfactual Explanations [12.313007847721215]
CounterNet is an end-to-end learning framework which integrates predictive model training and the generation of counterfactual (CF) explanations.
Unlike post-hoc methods, CounterNet optimizes the generation of CF explanations only once, jointly with the predictive model.
Our experiments on multiple real-world datasets show that CounterNet generates high-quality predictions.
arXiv Detail & Related papers (2021-09-15T20:09:13Z)
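In the spirit of the CounterNet entry above, here is a minimal sketch of training a predictor and a CF generator jointly rather than post hoc; the architecture and loss weights are illustrative assumptions, not the paper's actual model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointPredictorCF(nn.Module):
    """A binary classifier and a counterfactual generator trained with one objective."""
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.predictor = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, 1))
        self.cf_generator = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                          nn.Linear(hidden, dim))

    def forward(self, x):
        # The CF is produced as a residual edit of the original input.
        return self.predictor(x), x + self.cf_generator(x)

def joint_loss(model: JointPredictorCF, x: torch.Tensor, y: torch.Tensor,
               w_flip: float = 1.0, w_prox: float = 0.1) -> torch.Tensor:
    """y holds float labels in {0, 1}; all three terms are optimized together."""
    logits, x_cf = model(x)
    pred_loss = F.binary_cross_entropy_with_logits(logits.squeeze(-1), y)
    cf_logits, _ = model(x_cf)
    # Push the CF across the decision boundary (toward the opposite label)...
    flip_loss = F.binary_cross_entropy_with_logits(cf_logits.squeeze(-1), 1 - y)
    # ...while keeping it close to the original input.
    prox_loss = (x_cf - x).pow(2).mean()
    return pred_loss + w_flip * flip_loss + w_prox * prox_loss
```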