Towards LLM-guided Causal Explainability for Black-box Text Classifiers
- URL: http://arxiv.org/abs/2309.13340v2
- Date: Mon, 29 Jan 2024 05:59:12 GMT
- Title: Towards LLM-guided Causal Explainability for Black-box Text Classifiers
- Authors: Amrita Bhattacharjee, Raha Moraffah, Joshua Garland, Huan Liu
- Abstract summary: We aim to leverage the instruction-following and textual understanding capabilities of recent Large Language Models to facilitate causal explainability.
We propose a three-step pipeline in which we use an off-the-shelf LLM to identify the latent or unobserved features in the input text.
We experiment with our pipeline on multiple NLP text classification datasets, and present interesting and promising findings.
- Score: 16.36602400590088
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the advent of larger and more complex deep learning models, such as in
Natural Language Processing (NLP), model qualities like explainability and
interpretability, albeit highly desirable, are becoming harder challenges to
tackle and solve. For example, state-of-the-art models in text classification
are black-box by design. Although standard explanation methods provide some
degree of explainability, these are mostly correlation-based methods and do not
provide much insight into the model. The alternative of causal explainability
is more desirable to achieve but extremely challenging in NLP due to a variety
of reasons. Inspired by recent endeavors to utilize Large Language Models
(LLMs) as experts, in this work, we aim to leverage the instruction-following
and textual understanding capabilities of recent state-of-the-art LLMs to
facilitate causal explainability via counterfactual explanation generation for
black-box text classifiers. To do this, we propose a three-step pipeline in
which we use an off-the-shelf LLM to: (1) identify the latent or unobserved
features in the input text, (2) identify the input features associated with the
latent features, and finally (3) use the identified input features to generate
a counterfactual explanation. We experiment with our pipeline on multiple NLP
text classification datasets, with several recent LLMs, and present interesting
and promising findings.
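
The three steps named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the function name `explain_with_counterfactual`, and the `stub_llm` / `stub_classifier` stand-ins are all assumptions introduced here so the example runs without any external model. In practice `llm` would wrap a call to a real LLM and `classifier` would be the black-box model being explained.

```python
from typing import Callable, Dict

# Hypothetical interfaces: any function mapping a string to a string.
LLM = Callable[[str], str]
Classifier = Callable[[str], str]


def _split(csv: str) -> list:
    """Parse a comma-separated LLM reply into a clean list of items."""
    return [item.strip() for item in csv.split(",") if item.strip()]


def explain_with_counterfactual(text: str, llm: LLM, classifier: Classifier) -> Dict[str, object]:
    """Run the three steps from the abstract using illustrative prompts."""
    original_label = classifier(text)

    # Step 1: identify latent (unobserved) features of the input text.
    latent_features = _split(
        llm(f"List the latent features of this text, comma-separated:\n{text}")
    )

    # Step 2: identify the input words associated with those latent features.
    input_features = _split(
        llm(
            "List the words in the text associated with these latent features "
            f"({', '.join(latent_features)}), comma-separated:\n{text}"
        )
    )

    # Step 3: generate a counterfactual by editing the identified words.
    counterfactual = llm(
        f"Minimally edit these words ({', '.join(input_features)}) so that the "
        f"label is no longer '{original_label}'. Return only the edited text:\n{text}"
    ).strip()

    return {
        "original_label": original_label,
        "latent_features": latent_features,
        "input_features": input_features,
        "counterfactual": counterfactual,
        # The counterfactual is only useful if the black-box label changes.
        "label_flipped": classifier(counterfactual) != original_label,
    }


def stub_llm(prompt: str) -> str:
    """Canned replies standing in for a real LLM (assumption for the demo)."""
    if prompt.startswith("List the latent"):
        return "sentiment, food quality"
    if prompt.startswith("List the words"):
        return "great"
    return "The food was terrible."


def stub_classifier(text: str) -> str:
    """Toy keyword classifier standing in for the black-box model."""
    return "positive" if "great" in text else "negative"


result = explain_with_counterfactual("The food was great.", stub_llm, stub_classifier)
print(result)
```

Checking whether the classifier's label actually flips on the generated text is one simple way to judge a counterfactual; the metrics used in the paper itself are not stated in this summary.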
Related papers
- Evaluating the Reliability of Self-Explanations in Large Language Models [2.8894038270224867]
We evaluate two kinds of such self-explanations - extractive and counterfactual.
Our findings reveal that, while these self-explanations can correlate with human judgement, they do not fully and accurately follow the model's decision process.
We show that this gap can be bridged because prompting LLMs for counterfactual explanations can produce faithful, informative, and easy-to-verify results.
arXiv Detail & Related papers (2024-07-19T17:41:08Z) - XPrompt: Explaining Large Language Model's Generation via Joint Prompt Attribution [26.639271355209104]
Large Language Models (LLMs) have demonstrated impressive performances in complex text generation tasks.
The contribution of the input prompt to the generated content still remains obscure to humans.
We introduce a counterfactual explanation framework based on joint prompt attribution, XPrompt.
arXiv Detail & Related papers (2024-05-30T18:16:41Z) - Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z) - Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models [54.21695754082441]
We propose a framework to teach Large Language Models (LLMs) to generate explainable stock predictions.
A reflective agent learns how to explain past stock movements through self-reasoning, while the PPO trainer trains the model to generate the most likely explanations.
Our framework can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient.
arXiv Detail & Related papers (2024-02-06T03:18:58Z) - Logic-Scaffolding: Personalized Aspect-Instructed Recommendation Explanation Generation using LLMs [20.446594942586604]
We propose a framework called Logic-Scaffolding that combines the ideas of aspect-based explanation and chain-of-thought prompting to generate explanations through intermediate reasoning steps.
In this paper, we share our experience in building the framework and present an interactive demonstration for exploring our results.
arXiv Detail & Related papers (2023-12-22T00:30:10Z) - TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents [34.52684986240312]
We introduce TextGenSHAP, an efficient post-hoc explanation method incorporating LM-specific techniques.
We demonstrate that this leads to significant increases in speed compared to conventional Shapley value computations.
In addition, we demonstrate how real-time Shapley values can be utilized in two important scenarios.
arXiv Detail & Related papers (2023-12-03T04:35:04Z) - RecExplainer: Aligning Large Language Models for Explaining Recommendation Models [50.74181089742969]
Large language models (LLMs) have demonstrated remarkable intelligence in understanding, reasoning, and instruction following.
This paper presents the initial exploration of using LLMs as surrogate models to explain black-box recommender models.
To facilitate an effective alignment, we introduce three methods: behavior alignment, intention alignment, and hybrid alignment.
arXiv Detail & Related papers (2023-11-18T03:05:43Z) - Explanation-aware Soft Ensemble Empowers Large Language Model In-context Learning [50.00090601424348]
Large language models (LLMs) have shown remarkable capabilities in various natural language understanding tasks.
We propose EASE, an Explanation-Aware Soft Ensemble framework to empower in-context learning with LLMs.
arXiv Detail & Related papers (2023-11-13T06:13:38Z) - Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning [51.90524745663737]
A key innovation is our use of explanations as features, which can be used to boost GNN performance on downstream tasks.
Our method achieves state-of-the-art results on well-established TAG datasets.
Our method significantly speeds up training, achieving a 2.88 times improvement over the closest baseline on ogbn-arxiv.
arXiv Detail & Related papers (2023-05-31T03:18:03Z) - Multi-resolution Interpretation and Diagnostics Tool for Natural Language Classifiers [0.0]
This paper aims to create more flexible model explainability summaries by segments of observation or clusters of words that are semantically related to each other.
In addition, we introduce a root cause analysis method for NLP models, by analyzing representative False Positive and False Negative examples from different segments.
arXiv Detail & Related papers (2023-03-06T22:59:02Z) - Interpreting Language Models with Contrastive Explanations [99.7035899290924]
Language models must consider various features to predict a token, such as its part of speech, number, tense, or semantics.
Existing explanation methods conflate evidence for all these features into a single explanation, which is less interpretable for human understanding.
We show that contrastive explanations are quantifiably better than non-contrastive explanations in verifying major grammatical phenomena.
arXiv Detail & Related papers (2022-02-21T18:32:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.