Related papers: Comparing zero-shot self-explanations with human rationales in multilingual text classification

URL: http://arxiv.org/abs/2410.03296v1
Date: Fri, 4 Oct 2024 10:14:12 GMT
Title: Comparing zero-shot self-explanations with human rationales in multilingual text classification
Authors: Stephanie Brandl, Oliver Eberle
Abstract summary: Instruction-tuned LLMs generate self-explanations that do not require gradient computations or the application of possibly complex XAI methods. We analyse whether this ability results in a good explanation by evaluating self-explanations in the form of input rationales. Our results show that self-explanations align more closely with human annotations compared to LRP, while maintaining a comparable level of faithfulness.
Score: 5.32539007352208
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Instruction-tuned LLMs are able to provide an explanation about their output to users by generating self-explanations that do not require gradient computations or the application of possibly complex XAI methods. In this paper, we analyse whether this ability results in a good explanation by evaluating self-explanations in the form of input rationales with respect to their plausibility to humans as well as their faithfulness to models. For this, we apply two text classification tasks: sentiment classification and forced labour detection. Next to English, we further include Danish and Italian translations of the sentiment classification task and compare self-explanations to human annotations for all samples. To allow for direct comparisons, we also compute post-hoc feature attribution, i.e., layer-wise relevance propagation (LRP) and apply this pipeline to 4 LLMs (Llama2, Llama3, Mistral and Mixtral). Our results show that self-explanations align more closely with human annotations compared to LRP, while maintaining a comparable level of faithfulness.

Related papers

From latent factors to language: a user study on LLM-generated explanations for an inherently interpretable matrix-based recommender system [8.280161440212504]
We investigate whether large language models (LLMs) can generate effective, user-facing explanations from a mathematically interpretable recommendation model. We conduct a study with 326 participants who assessed the quality of the explanations across five key dimensions. Our analysis reveals that all explanation types are generally well received, with moderate statistical differences between strategies.
arXiv Detail & Related papers (2025-09-23T13:30:03Z)
Evaluating Human Alignment and Model Faithfulness of LLM Rationale [66.75309523854476]
We study how well large language models (LLMs) explain their generations through rationales. We show that prompting-based methods are less "faithful" than attribution-based explanations.
arXiv Detail & Related papers (2024-06-28T20:06:30Z)
Evaluating Evidence Attribution in Generated Fact Checking Explanations [48.776087871960584]
We introduce a novel evaluation protocol, citation masking and recovery, to assess attribution quality in generated explanations. Experiments reveal that the best-performing LLMs still generate explanations with inaccurate attributions. Human-curated evidence is essential for generating better explanations.
arXiv Detail & Related papers (2024-06-18T14:13:13Z)
Can Language Models Explain Their Own Classification Behavior? [1.8177391253202122]
Large language models (LLMs) perform well at a myriad of tasks, but explaining the processes behind this performance is a challenge. This paper investigates whether LLMs can give faithful high-level explanations of their own internal processes. We release our dataset, ArticulateRules, which can be used to test self-explanation for LLMs trained either in-context or by finetuning.
arXiv Detail & Related papers (2024-05-13T02:31:08Z)
Hierarchical Indexing for Retrieval-Augmented Opinion Summarization [60.5923941324953]
We propose a method for unsupervised abstractive opinion summarization that combines the attributability and scalability of extractive approaches with the coherence and fluency of Large Language Models (LLMs). Our method, HIRO, learns an index structure that maps sentences to a path through a semantically organized discrete hierarchy. At inference time, we populate the index and use it to identify and retrieve clusters of sentences containing popular opinions from input reviews.
arXiv Detail & Related papers (2024-03-01T10:38:07Z)
CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation [94.59630161324013]
We propose CoAnnotating, a novel paradigm for Human-LLM co-annotation of unstructured texts at scale. Our empirical study shows CoAnnotating to be an effective means to allocate work from results on different datasets, with up to 21% performance improvement over a random baseline.
arXiv Detail & Related papers (2023-10-24T08:56:49Z)
Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations [14.685170467182369]
Large language models (LLMs) such as ChatGPT have demonstrated superior performance on a variety of natural language processing (NLP) tasks. Since these models are instruction-tuned on human conversations to produce "helpful" responses, they can and often will produce explanations along with the response.
arXiv Detail & Related papers (2023-10-17T12:34:32Z)
Using Natural Language Explanations to Rescale Human Judgments [81.66697572357477]
We propose a method to rescale ordinal annotations and explanations using large language models (LLMs). We feed annotators' Likert ratings and corresponding explanations into an LLM and prompt it to produce a numeric score anchored in a scoring rubric. Our method rescales the raw judgments without impacting agreement and brings the scores closer to human judgments grounded in the same scoring rubric.
arXiv Detail & Related papers (2023-05-24T06:19:14Z)
Benchmarking Large Language Models for News Summarization [79.37850439866938]
Large language models (LLMs) have shown promise for automatic summarization but the reasons behind their successes are poorly understood. We find instruction tuning, and not model size, is the key to the LLM's zero-shot summarization capability.
arXiv Detail & Related papers (2023-01-31T18:46:19Z)
Re-Examining Human Annotations for Interpretable NLP [80.81532239566992]
We conduct controlled experiments using crowd-sourced websites on two widely used datasets in Interpretable NLP. We compare the annotation results obtained from recruiting workers satisfying different levels of qualification. Our results reveal that the annotation quality is highly subject to the workers' qualification, and workers can be guided to provide certain annotations by the instructions.
arXiv Detail & Related papers (2022-04-10T02:27:30Z)
A Study of Automatic Metrics for the Evaluation of Natural Language Explanations [1.7205106391379024]
We explore parallels between the generation of such explanations and the much-studied field of evaluation of Natural Language Generation (NLG). We present the ExBAN corpus: a crowd-sourced corpus of NL explanations for Bayesian Networks. We find that embedding-based automatic NLG evaluation methods, such as BERTScore and BLEURT, have a higher correlation with human ratings, compared to word-overlap metrics, such as BLEU and ROUGE.
arXiv Detail & Related papers (2021-03-15T17:10:39Z)
Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language? [86.60613602337246]
We introduce a leakage-adjusted simulatability (LAS) metric for evaluating NL explanations. LAS measures how well explanations help an observer predict a model's output, while controlling for how explanations can directly leak the output. We frame explanation generation as a multi-agent game and optimize explanations for simulatability while penalizing label leakage.
arXiv Detail & Related papers (2020-10-08T16:59:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.