Can Large Language Models Explain Themselves? A Study of LLM-Generated
Self-Explanations
- URL: http://arxiv.org/abs/2310.11207v1
- Date: Tue, 17 Oct 2023 12:34:32 GMT
- Authors: Shiyuan Huang, Siddarth Mamidanna, Shreedhar Jangam, Yilun Zhou,
Leilani H. Gilpin
- Abstract summary: Large language models (LLMs) such as ChatGPT have demonstrated superior performance on a variety of natural language processing (NLP) tasks.
Since these models are instruction-tuned on human conversations to produce "helpful" responses, they can and often will produce explanations along with the response.
- Score: 14.685170467182369
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) such as ChatGPT have demonstrated superior
performance on a variety of natural language processing (NLP) tasks including
sentiment analysis, mathematical reasoning and summarization. Furthermore,
since these models are instruction-tuned on human conversations to produce
"helpful" responses, they can and often will produce explanations along with
the response, which we call self-explanations. For example, when analyzing the
sentiment of a movie review, the model may output not only the positivity of
the sentiment, but also an explanation (e.g., by listing the sentiment-laden
words such as "fantastic" and "memorable" in the review). How good are these
automatically generated self-explanations? In this paper, we investigate this
question on the task of sentiment analysis and for feature attribution
explanation, one of the most commonly studied settings in the interpretability
literature (for pre-ChatGPT models). Specifically, we study different ways to
elicit the self-explanations, evaluate their faithfulness on a set of
evaluation metrics, and compare them to traditional explanation methods such as
occlusion or LIME saliency maps. Through an extensive set of experiments, we
find that ChatGPT's self-explanations perform on par with traditional ones, yet
differ considerably from them under various agreement metrics, while being much
cheaper to produce (as they are generated along with the prediction). In
addition, we identify several interesting characteristics of these
explanations, which prompt us to rethink many current model interpretability
practices in the era of ChatGPT(-like) LLMs.
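To ground the comparison, here is a minimal sketch of occlusion-based feature
attribution for sentiment analysis plus a simple top-k agreement score against
an LLM-generated self-explanation. This is an illustration under assumptions,
not the paper's code: the classifier pipeline, the review, and the
self-explanation word set are placeholders.

```python
# Minimal sketch (not the paper's code) of occlusion-based feature
# attribution for sentiment, plus a top-k agreement score against an
# LLM-generated self-explanation. The classifier, review, and
# self-explanation below are illustrative placeholders.
from transformers import pipeline

clf = pipeline("sentiment-analysis")  # any off-the-shelf sentiment classifier

def positive_prob(text: str) -> float:
    """Probability of the positive class for the given text."""
    out = clf(text)[0]
    return out["score"] if out["label"] == "POSITIVE" else 1.0 - out["score"]

def occlusion_saliency(text: str) -> dict[str, float]:
    """Score each word by the drop in positive probability when it is removed."""
    words = text.split()
    base = positive_prob(text)
    return {
        w: base - positive_prob(" ".join(words[:i] + words[i + 1:]))
        for i, w in enumerate(words)
    }

def topk_agreement(saliency: dict[str, float], self_expl: set[str], k: int = 3) -> float:
    """Fraction of the top-k occlusion words that the self-explanation also names."""
    topk = sorted(saliency, key=saliency.get, reverse=True)[:k]
    return len(set(topk) & self_expl) / k

review = "A fantastic and truly memorable film with a weak ending"
self_explanation = {"fantastic", "memorable"}  # hypothetical words listed by the LLM
print(topk_agreement(occlusion_saliency(review), self_explanation))
```

Occlusion here is the word-deletion variant; LIME would instead fit a local
linear surrogate model over many randomly perturbed versions of the input.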
Related papers
- Comparing zero-shot self-explanations with human rationales in multilingual text classification [5.32539007352208]
Instruction-tuned LLMs generate self-explanations that require neither extra computation nor the application of possibly complex XAI methods.
We analyse whether this ability yields good explanations by evaluating self-explanations in the form of input rationales.
Our results show that self-explanations align more closely with human annotations than LRP (Layer-wise Relevance Propagation) does, while maintaining a comparable level of faithfulness.
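As a gloss on the rationale comparison above, a token-level overlap F1 is one
simple way to score agreement between a self-explanation and a human rationale;
the rationale sets below are invented for illustration, and the metric is a
generic one, not necessarily the paper's exact protocol.

```python
# Sketch of token-level agreement between a self-explanation rationale
# and a human rationale, as a simple plausibility proxy. The example
# rationales are invented; the metric is a generic set-overlap F1.
def rationale_f1(predicted: set[str], human: set[str]) -> float:
    overlap = len(predicted & human)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(human)
    return 2 * precision * recall / (precision + recall)

print(rationale_f1({"fantastic", "memorable"}, {"fantastic", "boring"}))  # 0.5
```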
arXiv Detail & Related papers (2024-10-04T10:14:12Z)
- Scenarios and Approaches for Situated Natural Language Explanations [18.022428746019582]
We collect a benchmark dataset, Situation-Based Explanation, containing 100 explanandums.
For each "explanandum paired with an audience" situation, we include a human-written explanation.
We examine three categories of prompting methods: rule-based prompting, meta-prompting, and in-context learning prompting.
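For intuition, the three categories might look like the following templates;
the wording is our assumption for illustration, not the paper's actual prompts.

```python
# Illustrative templates for the three prompting categories named above.
# The wording is an assumption for illustration, not the paper's prompts.
explanandum, audience = "photosynthesis", "a ten-year-old"

# Rule-based prompting: hand-written constraints on the explanation.
rule_based = (
    f"Explain {explanandum} to {audience}. "
    "Use short sentences and avoid technical jargon."
)

# Meta-prompting: ask the model to first produce a good prompt, then answer it.
meta = (
    f"Write a prompt that would elicit a good explanation of {explanandum} "
    f"for {audience}, then answer that prompt."
)

# In-context learning: prepend a worked (explanandum, audience, explanation) example.
in_context = (
    "Explanandum: gravity | Audience: a ten-year-old\n"
    "Explanation: Gravity is the pull that keeps your feet on the ground.\n\n"
    f"Explanandum: {explanandum} | Audience: {audience}\nExplanation:"
)
```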
arXiv Detail & Related papers (2024-06-07T15:56:32Z)
- Evaluating Consistency and Reasoning Capabilities of Large Language Models [0.0]
Large Language Models (LLMs) are extensively used today across various sectors, including academia, research, business, and finance.
Despite their widespread adoption, these models often produce incorrect and misleading information, exhibiting a tendency to hallucinate.
This paper aims to evaluate and compare the consistency and reasoning capabilities of both public and proprietary LLMs.
arXiv Detail & Related papers (2024-04-25T10:03:14Z)
- Beware of Words: Evaluating the Lexical Diversity of Conversational LLMs using ChatGPT as Case Study [3.0059120458540383]
We evaluate the lexical richness of text generated by conversational Large Language Models (LLMs) and how it depends on the model parameters.
The results show how lexical richness depends on the ChatGPT version, on parameters such as the presence penalty, and on the role assigned to the model.
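For reference, the simplest lexical-richness measure is the type-token ratio
(TTR); the one-function sketch below is ours, not the paper's metric suite.

```python
# Sketch of the type-token ratio (TTR), the most common baseline
# lexical-richness measure; the paper's metric suite is broader.
def type_token_ratio(text: str) -> float:
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

print(type_token_ratio("the cat saw the other cat"))  # 4 types / 6 tokens ~= 0.67
```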
arXiv Detail & Related papers (2024-02-11T13:41:17Z)
- "You Are An Expert Linguistic Annotator": Limits of LLMs as Analyzers of Abstract Meaning Representation [60.863629647985526]
We examine the successes and limitations of the GPT-3, ChatGPT, and GPT-4 models in analysis of sentence meaning structure.
We find that models can reliably reproduce the basic format of AMR, and can often capture core event, argument, and modifier structure.
Overall, our findings indicate that these models out-of-the-box can capture aspects of semantic structure, but there remain key limitations in their ability to support fully accurate semantic analyses or parses.
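For readers unfamiliar with the basic AMR format the blurb refers to, below is
the standard textbook example in PENMAN notation; the third-party penman
library used to decode it is an assumed convenience, and the PropBank sense IDs
are illustrative.

```python
# A textbook AMR graph ("The boy wants to go") in PENMAN notation;
# decoding it with the third-party `penman` library (an assumed
# dependency) exposes the event/argument triples the blurb refers to.
# The PropBank sense IDs (want-01, go-02) are illustrative.
import penman

amr = "(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))"
graph = penman.decode(amr)
for source, role, target in graph.triples:
    print(source, role, target)  # e.g. "w :instance want-01", "w :ARG0 b", ...
```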
arXiv Detail & Related papers (2023-10-26T21:47:59Z)
- Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models [107.07851578154242]
Language models (LMs) have strong multi-step (i.e., procedural) reasoning capabilities.
It is unclear whether LMs solve these tasks by recalling answers memorized from the pretraining corpus or via a genuine multi-step reasoning mechanism.
We show that MechanisticProbe, an attention-based probing method, can recover the reasoning tree from the model's attentions for most examples.
arXiv Detail & Related papers (2023-10-23T01:47:29Z)
- STREET: A Multi-Task Structured Reasoning and Explanation Benchmark [56.555662318619135]
We introduce a unified multi-task and multi-domain natural language reasoning and explanation benchmark.
We expect models to not only answer questions, but also produce step-by-step structured explanations describing how premises in the question are used to produce intermediate conclusions that can prove the correctness of a certain answer.
arXiv Detail & Related papers (2023-02-13T22:34:02Z)
- The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs [26.118193748582197]
We evaluate four categories of widely used state-of-the-art models.
We find that, despite only evaluating on utterances that require a binary inference, models in three of these categories perform close to random.
These results suggest that certain fine-tuning strategies are far better at inducing pragmatic understanding in models.
arXiv Detail & Related papers (2022-10-26T19:04:23Z)
- Interpreting Language Models with Contrastive Explanations [99.7035899290924]
Language models must consider various features to predict a token, such as its part of speech, number, tense, or semantics.
Existing explanation methods conflate evidence for all these features into a single explanation, which is less interpretable for human understanding.
We show that contrastive explanations are quantifiably better than non-contrastive explanations in verifying major grammatical phenomena.
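The contrastive idea is to explain why the model predicted token A rather than
a foil token B. A minimal gradient-times-input sketch of this follows; GPT-2
and the subject-verb agreement example are illustrative choices, not
necessarily the paper's exact setup.

```python
# Minimal sketch of a contrastive attribution: attribute the *difference*
# between a target and a foil token's logits (gradient x input), rather
# than the target logit alone. GPT-2 and the agreement example are
# illustrative choices, not the paper's exact setup.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def contrastive_saliency(prompt: str, target: str, foil: str):
    ids = tok(prompt, return_tensors="pt").input_ids
    embeds = model.transformer.wte(ids).detach().requires_grad_(True)
    logits = model(inputs_embeds=embeds).logits[0, -1]
    t, f = tok.encode(target)[0], tok.encode(foil)[0]
    (logits[t] - logits[f]).backward()  # contrastive objective: target vs. foil
    scores = (embeds.grad[0] * embeds[0]).sum(-1)  # gradient x input, per token
    return list(zip(tok.convert_ids_to_tokens(ids[0].tolist()), scores.tolist()))

# Why " are" rather than " is" after a plural subject?
print(contrastive_saliency("The dogs in the park", " are", " is"))
```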
arXiv Detail & Related papers (2022-02-21T18:32:24Z)
- Prompting Contrastive Explanations for Commonsense Reasoning Tasks [74.7346558082693]
Large pretrained language models (PLMs) can achieve near-human performance on commonsense reasoning tasks.
We show how to use these same models to generate human-interpretable evidence.
arXiv Detail & Related papers (2021-06-12T17:06:13Z)
- Evaluating Explanations: How much do explanations from the teacher aid students? [103.05037537415811]
We formalize the value of explanations using a student-teacher paradigm that measures the extent to which explanations improve student models in learning.
Unlike many prior proposals to evaluate explanations, our approach cannot be easily gamed, enabling principled, scalable, and automatic evaluation of attributions.
arXiv Detail & Related papers (2020-12-01T23:40:21Z)
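Schematically, that paradigm scores an explanation method by the test-accuracy
gain it buys a student model; the following skeleton uses placeholders
throughout and is not the paper's implementation.

```python
# Skeleton of the student-teacher evaluation: an explanation method's
# value is the student's test-accuracy gain when trained with the
# teacher's explanations versus the teacher's labels alone. All
# callables here are placeholders, not the paper's implementation.
def explanation_value(train_x, test_set, teacher, explainer, fit_student, accuracy):
    labels_only = [(x, teacher(x)) for x in train_x]
    with_expl = [(x, teacher(x), explainer(x)) for x in train_x]
    baseline_student = fit_student(labels_only)   # learns from labels alone
    aided_student = fit_student(with_expl)        # learns from labels + explanations
    return accuracy(aided_student, test_set) - accuracy(baseline_student, test_set)
```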
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.