Faithful Explanations of Black-box NLP Models Using LLM-generated
Counterfactuals
- URL: http://arxiv.org/abs/2310.00603v2
- Date: Wed, 22 Nov 2023 08:00:10 GMT
- Title: Faithful Explanations of Black-box NLP Models Using LLM-generated
Counterfactuals
- Authors: Yair Gat, Nitay Calderon, Amir Feder, Alexander Chapanin, Amit Sharma,
Roi Reichart
- Abstract summary: Causal explanations of predictions of NLP systems are essential to ensure safety and establish trust.
Existing methods often fall short of explaining model predictions effectively or efficiently.
We propose two approaches for counterfactual (CF) approximation.
- Score: 67.64770842323966
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Causal explanations of the predictions of NLP systems are essential to ensure
safety and establish trust. Yet, existing methods often fall short of
explaining model predictions effectively or efficiently and are often
model-specific. In this paper, we address model-agnostic explanations,
proposing two approaches for counterfactual (CF) approximation. The first
approach is CF generation, where a large language model (LLM) is prompted to
change a specific text concept while keeping confounding concepts unchanged.
While this approach is demonstrated to be very effective, applying an LLM at
inference time is costly. We hence present a second approach based on matching,
and propose a method that is guided by an LLM at training time and learns a
dedicated embedding space. This space is faithful to a given causal graph and
effectively serves to identify matches that approximate CFs. After showing
theoretically that approximating CFs is required in order to construct faithful
explanations, we benchmark our approaches and explain several models, including
LLMs with billions of parameters. Our empirical results demonstrate the
excellent performance of CF generation models as model-agnostic explainers.
Moreover, our matching approach, which requires far less test-time resources,
also provides effective explanations, surpassing many baselines. We also find
that Top-K techniques universally improve every tested method. Finally, we
showcase the potential of LLMs in constructing new benchmarks for model
explanation and subsequently validate our conclusions. Our work illuminates new
pathways for efficient and accurate approaches to interpreting NLP systems.
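As a concrete illustration of the two approaches, here is a minimal sketch in Python. The prompt wording and the `query_llm` / `black_box_model` helpers are hypothetical stand-ins, not the authors' released code:

```python
import numpy as np

# Approach 1 (CF generation): prompt an LLM to intervene on one text concept
# while holding the stated confounding concepts fixed, then read off the
# change in the explained black-box model's output.
CF_PROMPT = (
    "Rewrite the text so that the concept '{concept}' changes from "
    "'{source}' to '{target}', while keeping all other aspects "
    "({confounders}) unchanged.\n\nText: {text}\nRewritten text:"
)

def generate_cf(text, concept, source, target, confounders, query_llm):
    # `query_llm` is a hypothetical LLM call.
    prompt = CF_PROMPT.format(concept=concept, source=source, target=target,
                              confounders=", ".join(confounders), text=text)
    return query_llm(prompt)

def concept_effect(text, cf_text, black_box_model):
    # The explanation: how much the model's prediction moves under the CF.
    return black_box_model(cf_text) - black_box_model(text)

# Approach 2 (matching): at test time, approximate the CF by retrieving the
# top-K nearest candidates in an embedding space trained, with LLM guidance,
# to be faithful to the causal graph; effects can be averaged over the K
# matches, in line with the Top-K finding above.
def topk_matches(query_vec, cand_vecs, k=5):
    dists = np.linalg.norm(cand_vecs - query_vec, axis=1)
    return np.argsort(dists)[:k]
```

Note that the matching variant needs no LLM call at test time: candidate embeddings can be precomputed, so approximating a CF reduces to a nearest-neighbor lookup.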
Related papers
- Graph-Structured Speculative Decoding [52.94367724136063]
Speculative decoding has emerged as a promising technique to accelerate the inference of Large Language Models.
We introduce an innovative approach utilizing a directed acyclic graph (DAG) to manage the drafted hypotheses.
We observe a remarkable speedup of 1.73$\times$ to 1.96$\times$, significantly surpassing standard speculative decoding.
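As a toy illustration of why a DAG helps, the following sketch merges drafted hypotheses into a prefix trie so shared tokens are verified once; this shows the reuse idea only and is not the paper's algorithm:

```python
# Merge drafted token sequences into a prefix trie (a simple DAG): tokens
# shared across hypotheses appear once, so verification work scales with
# unique nodes rather than total drafted tokens.
def build_prefix_dag(hypotheses):
    root = {}
    for seq in hypotheses:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
    return root

def count_nodes(node):
    return sum(1 + count_nodes(child) for child in node.values())

drafts = [["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "dog", "ran"]]
print(count_nodes(build_prefix_dag(drafts)))  # 6 unique nodes vs. 9 drafted tokens
```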
arXiv Detail & Related papers (2024-07-23T06:21:24Z)
- Evaluating Human Alignment and Model Faithfulness of LLM Rationale [66.75309523854476]
We show that prompting-based rationales align better with human-annotated rationales than attribution-based rationales.
We additionally find that the faithfulness limitations of prompting-based methods, which are identified in previous work, may be linked to their collapsed predictions.
arXiv Detail & Related papers (2024-06-28T20:06:30Z)
- Enhancing Travel Choice Modeling with Large Language Models: A Prompt-Learning Approach [6.913791588789051]
We introduce a novel prompt-learning-based Large Language Model (LLM) framework that significantly improves prediction accuracy and provides explicit explanations for individual predictions.
We tested the framework's efficacy using two widely used choice datasets: London Passenger Mode Choice (LPMC) and Optima-Mode, collected in Switzerland.
The results indicate that the LLM significantly outperforms state-of-the-art deep learning methods and discrete choice models in predicting people's choices.
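A rough sketch of what such a prompt-based choice predictor could look like; the template, the attribute names, and the `query_llm` helper are illustrative assumptions, not the paper's framework:

```python
# Hypothetical prompt: describe each alternative's attributes and ask the LLM
# to pick a mode and justify it, yielding a prediction plus an explicit
# explanation for that individual traveler.
TEMPLATE = (
    "A traveler chooses between:\n"
    "- walk: {walk_time} min\n"
    "- bus: {bus_time} min, cost {bus_cost} GBP\n"
    "- car: {car_time} min, cost {car_cost} GBP\n"
    "Which mode do they most likely choose? Answer with the mode, then a "
    "one-sentence explanation."
)

def predict_choice(trip, query_llm):
    reply = query_llm(TEMPLATE.format(**trip))  # `query_llm` is hypothetical
    mode = reply.split()[0].strip(".,:").lower()
    return mode, reply

# Example record in the spirit of mode-choice datasets such as LPMC.
trip = {"walk_time": 35, "bus_time": 20, "bus_cost": 2.4,
        "car_time": 12, "car_cost": 6.0}
```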
arXiv Detail & Related papers (2024-06-19T13:46:08Z)
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
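In simplified form, the idea can be pictured as sampling several explanation-answer pairs and scoring confidence by the agreement among the induced answers; the `query_llm` helper and the agreement score are illustrative simplifications of the paper's framework:

```python
from collections import Counter

def explanation_confidence(question, query_llm, n_samples=10):
    # Sample explanation+answer pairs at nonzero temperature; `query_llm` is a
    # hypothetical call returning an (explanation, answer) tuple.
    answers = [query_llm(question)[1] for _ in range(n_samples)]
    top_answer, count = Counter(answers).most_common(1)[0]
    # Low agreement across sampled explanations signals low confidence.
    return top_answer, count / n_samples
```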
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- Evaluating and Explaining Large Language Models for Code Using Syntactic Structures [74.93762031957883]
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code.
At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes.
We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
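In spirit, the token-to-node alignment can be sketched with Python's own ast module; this is an illustrative, Python-only approximation, while ASTxplainer itself targets LLM tokenizers:

```python
import ast

def node_at_position(source, lineno, col):
    # Return the innermost AST node covering a (line, column) position, so
    # per-token prediction scores can be aggregated per syntactic construct.
    best, best_size = None, None
    for node in ast.walk(ast.parse(source)):
        if getattr(node, "lineno", None) is None:
            continue
        start = (node.lineno, node.col_offset)
        end = (node.end_lineno, node.end_col_offset)
        if start <= (lineno, col) < end:
            size = (node.end_lineno - node.lineno,
                    node.end_col_offset - node.col_offset)
            if best is None or size < best_size:
                best, best_size = node, size
    return best

src = "def f(x):\n    return x + 1\n"
print(type(node_at_position(src, 2, 11)).__name__)  # Name (the 'x' in 'x + 1')
```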
arXiv Detail & Related papers (2023-08-07T18:50:57Z)
- Making Linear MDPs Practical via Contrastive Representation Learning [101.75885788118131]
It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning.
We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
arXiv Detail & Related papers (2022-07-14T18:18:02Z)
- ReLACE: Reinforcement Learning Agent for Counterfactual Explanations of Arbitrary Predictive Models [6.939617874336667]
We introduce a model-agnostic algorithm to generate optimal counterfactual explanations.
Our method is easily applied to any black-box model, since the black-box model itself plays the role of the environment that the DRL agent interacts with.
In addition, we develop an algorithm to extract explainable decision rules from the DRL agent's policy, so as to make the process of generating CFs itself transparent.
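The environment framing can be sketched in gym style; the action space and the reward shaping below are assumptions for illustration, not the paper's exact design:

```python
import numpy as np

class CFEnv:
    # Toy sketch: the black-box classifier itself acts as the environment.
    # Actions nudge one feature; the reward trades off flipping the model's
    # prediction against staying close to the original instance.
    def __init__(self, model, x0, step=0.1):
        self.model, self.x0, self.step = model, np.asarray(x0, float), step
        self.x = self.x0.copy()

    def reset(self):
        self.x = self.x0.copy()
        return self.x

    def act(self, feature_idx, direction):  # direction in {-1, +1}
        self.x[feature_idx] += direction * self.step
        flipped = self.model(self.x) != self.model(self.x0)
        reward = (1.0 if flipped else 0.0) - 0.1 * np.linalg.norm(self.x - self.x0)
        return self.x.copy(), reward, flipped  # episode ends on a flip
```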
arXiv Detail & Related papers (2021-10-22T17:08:49Z)
- CounterNet: End-to-End Training of Prediction Aware Counterfactual Explanations [12.313007847721215]
CounterNet is an end-to-end learning framework which integrates predictive model training and the generation of counterfactual (CF) explanations.
Unlike post-hoc methods, CounterNet optimizes CF explanation generation only once, jointly with the predictive model.
Our experiments on multiple real-world datasets show that CounterNet generates high-quality predictions.
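Schematically, the end-to-end setup amounts to one joint objective; the loss terms and weights below are illustrative assumptions, not CounterNet's exact formulation:

```python
import torch.nn.functional as F

def joint_loss(f, g, x, y, lam_validity=1.0, lam_proximity=0.5):
    # One backward pass trains the predictor f (outputting probabilities) and
    # the CF generator g together, instead of fitting CFs post hoc.
    pred_loss = F.binary_cross_entropy(f(x), y)        # predictive accuracy
    x_cf = g(x)                                        # generated counterfactual
    validity = F.binary_cross_entropy(f(x_cf), 1 - y)  # CF should flip the label
    proximity = F.mse_loss(x_cf, x)                    # CF stays close to input
    return pred_loss + lam_validity * validity + lam_proximity * proximity
```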
arXiv Detail & Related papers (2021-09-15T20:09:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.