Faithful Explanations of Black-box NLP Models Using LLM-generated
Counterfactuals
- URL: http://arxiv.org/abs/2310.00603v2
- Date: Wed, 22 Nov 2023 08:00:10 GMT
- Title: Faithful Explanations of Black-box NLP Models Using LLM-generated
Counterfactuals
- Authors: Yair Gat, Nitay Calderon, Amir Feder, Alexander Chapanin, Amit Sharma,
Roi Reichart
- Abstract summary: Causal explanations of predictions of NLP systems are essential to ensure safety and establish trust.
Existing methods often fall short of explaining model predictions effectively or efficiently.
We propose two approaches for counterfactual (CF) approximation.
- Score: 67.64770842323966
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Causal explanations of the predictions of NLP systems are essential to ensure
safety and establish trust. Yet, existing methods often fall short of
explaining model predictions effectively or efficiently and are often
model-specific. In this paper, we address model-agnostic explanations,
proposing two approaches for counterfactual (CF) approximation. The first
approach is CF generation, where a large language model (LLM) is prompted to
change a specific text concept while keeping confounding concepts unchanged.
While this approach is demonstrated to be very effective, applying an LLM at
inference time is costly. We hence present a second approach based on matching,
and propose a method that is guided by an LLM at training time and learns a
dedicated embedding space. This space is faithful to a given causal graph and
effectively serves to identify matches that approximate CFs. After showing
theoretically that approximating CFs is required in order to construct faithful
explanations, we benchmark our approaches and explain several models, including
LLMs with billions of parameters. Our empirical results demonstrate the
excellent performance of CF generation models as model-agnostic explainers.
Moreover, our matching approach, which requires far less test-time resources,
also provides effective explanations, surpassing many baselines. We also find
that Top-K techniques universally improve every tested method. Finally, we
showcase the potential of LLMs in constructing new benchmarks for model
explanation and subsequently validate our conclusions. Our work illuminates new
pathways for efficient and accurate approaches to interpreting NLP systems.
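As a concrete illustration of the two approaches, here is a minimal sketch in Python. The prompt wording and the `query_llm` / `black_box_model` helpers are hypothetical stand-ins, not the authors' released code:

```python
import numpy as np

# Approach 1 (CF generation): prompt an LLM to intervene on one text concept
# while holding the stated confounding concepts fixed, then read off the
# change in the explained black-box model's output.
CF_PROMPT = (
    "Rewrite the text so that the concept '{concept}' changes from "
    "'{source}' to '{target}', while keeping all other aspects "
    "({confounders}) unchanged.\n\nText: {text}\nRewritten text:"
)

def generate_cf(text, concept, source, target, confounders, query_llm):
    # `query_llm` is a hypothetical LLM call.
    prompt = CF_PROMPT.format(concept=concept, source=source, target=target,
                              confounders=", ".join(confounders), text=text)
    return query_llm(prompt)

def concept_effect(text, cf_text, black_box_model):
    # The explanation: how much the model's prediction moves under the CF.
    return black_box_model(cf_text) - black_box_model(text)

# Approach 2 (matching): at test time, approximate the CF by retrieving the
# top-K nearest candidates in an embedding space trained, with LLM guidance,
# to be faithful to the causal graph; effects can be averaged over the K
# matches, in line with the Top-K finding above.
def topk_matches(query_vec, cand_vecs, k=5):
    dists = np.linalg.norm(cand_vecs - query_vec, axis=1)
    return np.argsort(dists)[:k]
```

Note that the matching variant needs no LLM call at test time: candidate embeddings can be precomputed, so approximating a CF reduces to a nearest-neighbor lookup.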
Related papers
- Graph-Structured Speculative Decoding [52.94367724136063]
Speculative decoding has emerged as a promising technique to accelerate the inference of Large Language Models.
We introduce an innovative approach utilizing a directed acyclic graph (DAG) to manage the drafted hypotheses.
We observe a remarkable speedup of 1.73$\times$ to 1.96$\times$, significantly surpassing standard speculative decoding.
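As a toy illustration of why a DAG helps, the following sketch merges drafted hypotheses into a prefix trie so shared tokens are verified once; this shows the reuse idea only and is not the paper's algorithm:

```python
# Merge drafted token sequences into a prefix trie (a simple DAG): tokens
# shared across hypotheses appear once, so verification work scales with
# unique nodes rather than total drafted tokens.
def build_prefix_dag(hypotheses):
    root = {}
    for seq in hypotheses:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
    return root

def count_nodes(node):
    return sum(1 + count_nodes(child) for child in node.values())

drafts = [["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "dog", "ran"]]
print(count_nodes(build_prefix_dag(drafts)))  # 6 unique nodes vs. 9 drafted tokens
```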
arXiv Detail & Related papers (2024-07-23T06:21:24Z)
- Evaluating Human Alignment and Model Faithfulness of LLM Rationale [66.75309523854476]
We show that prompting-based rationales align better with human-annotated rationales than attribution-based rationales.
We additionally find that the faithfulness limitations of prompting-based methods, which are identified in previous work, may be linked to their collapsed predictions.
arXiv Detail & Related papers (2024-06-28T20:06:30Z)
- Enhancing Travel Choice Modeling with Large Language Models: A Prompt-Learning Approach [6.913791588789051]
We introduce a novel prompt-learning-based Large Language Model (LLM) framework that significantly improves prediction accuracy and provides explicit explanations for individual predictions.
We tested the framework's efficacy using two widely used choice datasets: London Passenger Mode Choice (LPMC) and Optima-Mode, collected in Switzerland.
The results indicate that the LLM significantly outperforms state-of-the-art deep learning methods and discrete choice models in predicting people's choices.
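A rough sketch of what such a prompt-based choice predictor could look like; the template, the attribute names, and the `query_llm` helper are illustrative assumptions, not the paper's framework:

```python
# Hypothetical prompt: describe each alternative's attributes and ask the LLM
# to pick a mode and justify it, yielding a prediction plus an explicit
# explanation for that individual traveler.
TEMPLATE = (
    "A traveler chooses between:\n"
    "- walk: {walk_time} min\n"
    "- bus: {bus_time} min, cost {bus_cost} GBP\n"
    "- car: {car_time} min, cost {car_cost} GBP\n"
    "Which mode do they most likely choose? Answer with the mode, then a "
    "one-sentence explanation."
)

def predict_choice(trip, query_llm):
    reply = query_llm(TEMPLATE.format(**trip))  # `query_llm` is hypothetical
    mode = reply.split()[0].strip(".,:").lower()
    return mode, reply

# Example record in the spirit of mode-choice datasets such as LPMC.
trip = {"walk_time": 35, "bus_time": 20, "bus_cost": 2.4,
        "car_time": 12, "car_cost": 6.0}
```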
arXiv Detail & Related papers (2024-06-19T13:46:08Z)
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
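In simplified form, the idea can be pictured as sampling several explanation-answer pairs and scoring confidence by the agreement among the induced answers; the `query_llm` helper and the agreement score are illustrative simplifications of the paper's framework:

```python
from collections import Counter

def explanation_confidence(question, query_llm, n_samples=10):
    # Sample explanation+answer pairs at nonzero temperature; `query_llm` is a
    # hypothetical call returning an (explanation, answer) tuple.
    answers = [query_llm(question)[1] for _ in range(n_samples)]
    top_answer, count = Counter(answers).most_common(1)[0]
    # Low agreement across sampled explanations signals low confidence.
    return top_answer, count / n_samples
```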
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- Evaluating and Explaining Large Language Models for Code Using Syntactic Structures [74.93762031957883]
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code.
At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes.
We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
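In spirit, the token-to-node alignment can be sketched with Python's own ast module; this is an illustrative, Python-only approximation, while ASTxplainer itself targets LLM tokenizers:

```python
import ast

def node_at_position(source, lineno, col):
    # Return the innermost AST node covering a (line, column) position, so
    # per-token prediction scores can be aggregated per syntactic construct.
    best, best_size = None, None
    for node in ast.walk(ast.parse(source)):
        if getattr(node, "lineno", None) is None:
            continue
        start = (node.lineno, node.col_offset)
        end = (node.end_lineno, node.end_col_offset)
        if start <= (lineno, col) < end:
            size = (node.end_lineno - node.lineno,
                    node.end_col_offset - node.col_offset)
            if best is None or size < best_size:
                best, best_size = node, size
    return best

src = "def f(x):\n    return x + 1\n"
print(type(node_at_position(src, 2, 11)).__name__)  # Name (the 'x' in 'x + 1')
```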
arXiv Detail & Related papers (2023-08-07T18:50:57Z)
- Making Linear MDPs Practical via Contrastive Representation Learning [101.75885788118131]
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning.
We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
arXiv Detail & Related papers (2022-07-14T18:18:02Z)
- ReLACE: Reinforcement Learning Agent for Counterfactual Explanations of Arbitrary Predictive Models [6.939617874336667]
We introduce a model-agnostic algorithm to generate optimal counterfactual explanations.
Our method is easily applied to any black-box model, since the black-box model itself plays the role of the environment that the DRL agent interacts with.
In addition, we develop an algorithm to extract explainable decision rules from the DRL agent's policy, so as to make the process of generating CFs itself transparent.
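The environment framing can be sketched in gym style; the action space and the reward shaping below are assumptions for illustration, not the paper's exact design:

```python
import numpy as np

class CFEnv:
    # Toy sketch: the black-box classifier itself acts as the environment.
    # Actions nudge one feature; the reward trades off flipping the model's
    # prediction against staying close to the original instance.
    def __init__(self, model, x0, step=0.1):
        self.model, self.x0, self.step = model, np.asarray(x0, float), step
        self.x = self.x0.copy()

    def reset(self):
        self.x = self.x0.copy()
        return self.x

    def act(self, feature_idx, direction):  # direction in {-1, +1}
        self.x[feature_idx] += direction * self.step
        flipped = self.model(self.x) != self.model(self.x0)
        reward = (1.0 if flipped else 0.0) - 0.1 * np.linalg.norm(self.x - self.x0)
        return self.x.copy(), reward, flipped  # episode ends on a flip
```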
arXiv Detail & Related papers (2021-10-22T17:08:49Z)
- CounterNet: End-to-End Training of Prediction Aware Counterfactual Explanations [12.313007847721215]
CounterNet is an end-to-end learning framework which integrates predictive model training and the generation of counterfactual (CF) explanations.
Unlike post-hoc methods, CounterNet optimizes CF explanation generation only once, jointly with the predictive model.
Our experiments on multiple real-world datasets show that CounterNet generates high-quality predictions.
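Schematically, the end-to-end setup amounts to one joint objective; the loss terms and weights below are illustrative assumptions, not CounterNet's exact formulation:

```python
import torch.nn.functional as F

def joint_loss(f, g, x, y, lam_validity=1.0, lam_proximity=0.5):
    # One backward pass trains the predictor f (outputting probabilities) and
    # the CF generator g together, instead of fitting CFs post hoc.
    pred_loss = F.binary_cross_entropy(f(x), y)        # predictive accuracy
    x_cf = g(x)                                        # generated counterfactual
    validity = F.binary_cross_entropy(f(x_cf), 1 - y)  # CF should flip the label
    proximity = F.mse_loss(x_cf, x)                    # CF stays close to input
    return pred_loss + lam_validity * validity + lam_proximity * proximity
```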
arXiv Detail & Related papers (2021-09-15T20:09:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.