Related papers: Properties and Challenges of LLM-Generated Explanations

Properties and Challenges of LLM-Generated Explanations

URL: http://arxiv.org/abs/2402.10532v1
Date: Fri, 16 Feb 2024 09:37:54 GMT
Title: Properties and Challenges of LLM-Generated Explanations
Authors: Jenny Kunz, Marco Kuhlmann
Abstract summary: We study the self-rationalising capabilities of large language models (LLMs) We find that generated explanations show selectivity and contain illustrative elements, but less frequently are subjective or misleading. In particular, we outline positive and negative implications depending on the goals and user groups of the self-rationalising system.
Score: 3.257973235065581
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: The self-rationalising capabilities of large language models (LLMs) have been explored in restricted settings, using task/specific data sets. However, current LLMs do not (only) rely on specifically annotated data; nonetheless, they frequently explain their outputs. The properties of the generated explanations are influenced by the pre-training corpus and by the target data used for instruction fine-tuning. As the pre-training corpus includes a large amount of human-written explanations "in the wild", we hypothesise that LLMs adopt common properties of human explanations. By analysing the outputs for a multi-domain instruction fine-tuning data set, we find that generated explanations show selectivity and contain illustrative elements, but less frequently are subjective or misleading. We discuss reasons and consequences of the properties' presence or absence. In particular, we outline positive and negative implications depending on the goals and user groups of the self-rationalising system.

Related papers

Hierarchical Interaction Summarization and Contrastive Prompting for Explainable Recommendations [9.082885521130617]
We propose a novel approach combining profile generation via hierarchical interaction summarization (PGHIS) with contrastive prompting for explanation generation (CPEG)<n>Our approach outperforms existing state-of-the-art methods, achieving a great improvement on metrics about explainability (e.g., 5% on GPTScore) and text quality.
arXiv Detail & Related papers (2025-07-08T14:45:47Z)
Interpreting and Steering LLMs with Mutual Information-based Explanations on Sparse Autoencoders [29.356200147371275]
Large language models (LLMs) excel at handling human queries, but they can occasionally generate flawed or unexpected responses. We propose using a fixed vocabulary set for feature interpretations and designing a mutual information-based objective. We propose two runtime steering strategies that adjust the learned feature activations based on their corresponding explanations.
arXiv Detail & Related papers (2025-02-21T16:36:42Z)
Latent Factor Models Meets Instructions: Goal-conditioned Latent Factor Discovery without Task Supervision [50.45597801390757]
Instruct-LF is a goal-oriented latent factor discovery system. It integrates instruction-following ability with statistical models to handle noisy datasets.
arXiv Detail & Related papers (2025-02-21T02:03:08Z)
Differentially Private Steering for Large Language Model Alignment [55.30573701583768]
We present the first study of aligning Large Language Models with private datasets. Our work proposes the Private Steering for LLM Alignment (PSA) algorithm to edit activations with differential privacy guarantees. Our results show that PSA achieves DP guarantees for LLM alignment with minimal loss in performance.
arXiv Detail & Related papers (2025-01-30T17:58:36Z)
Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy [3.0429215246859465]
Large language models (LLMs) have demonstrated remarkable proficiency in generating detailed and coherent explanations. To assess the level of comprehension of a model relative to the content it generates, we implemented a self-evaluation pipeline. We refer to this self-evaluation approach as Explain-Query-Test (EQT)
arXiv Detail & Related papers (2025-01-20T20:07:18Z)
Can adversarial attacks by large language models be attributed? [1.3812010983144802]
Attributing outputs from Large Language Models in adversarial settings presents significant challenges that are likely to grow in importance. We investigate this attribution problem using formal language theory, specifically language identification in the limit as introduced by Gold and extended by Angluin. Our results show that due to the non-identifiability of certain language classes it is theoretically impossible to attribute outputs to specific LLMs with certainty.
arXiv Detail & Related papers (2024-11-12T18:28:57Z)
XForecast: Evaluating Natural Language Explanations for Time Series Forecasting [72.57427992446698]
Time series forecasting aids decision-making, especially for stakeholders who rely on accurate predictions. Traditional explainable AI (XAI) methods, which underline feature or temporal importance, often require expert knowledge. evaluating forecast NLEs is difficult due to the complex causal relationships in time series data.
arXiv Detail & Related papers (2024-10-18T05:16:39Z)
Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors [74.04775677110179]
In-context Learning (ICL) has become the primary method for performing natural language tasks with Large Language Models (LLMs) In this work, we examine whether this is the result of the aggregation used in corresponding datasets, where trying to combine low-agreement, disparate annotations might lead to annotation artifacts that create detrimental noise in the prompt. Our results indicate that aggregation is a confounding factor in the modeling of subjective tasks, and advocate focusing on modeling individuals instead.
arXiv Detail & Related papers (2024-10-17T17:16:00Z)
Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models [59.970391602080205]
This study investigates whether such constraints on generation space impact LLMs abilities, including reasoning and domain knowledge comprehension. We evaluate LLMs performance when restricted to adhere to structured formats versus generating free-form responses across various common tasks. We find that stricter format constraints generally lead to greater performance degradation in reasoning tasks.
arXiv Detail & Related papers (2024-08-05T13:08:24Z)
Evaluating Human Alignment and Model Faithfulness of LLM Rationale [66.75309523854476]
We study how well large language models (LLMs) explain their generations through rationales. We show that prompting-based methods are less "faithful" than attribution-based explanations.
arXiv Detail & Related papers (2024-06-28T20:06:30Z)
Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models [54.21695754082441]
We propose a framework to teach Large Language Models (LLMs) to generate explainable stock predictions. A reflective agent learns how to explain past stock movements through self-reasoning, while the PPO trainer trains the model to generate the most likely explanations. Our framework can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient.
arXiv Detail & Related papers (2024-02-06T03:18:58Z)
Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial. We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments. The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
Investigating the Effect of Natural Language Explanations on Out-of-Distribution Generalization in Few-shot NLI [11.44224857047629]
We formulate a few-shot learning setup and examine the effects of natural language explanations on OOD generalization. We leverage the templates in the HANS dataset and construct templated natural language explanations for each template. We show that generated explanations show competitive BLEU scores against groundtruth explanations, but they fail to improve prediction performance.
arXiv Detail & Related papers (2021-10-12T18:00:02Z)
LIREx: Augmenting Language Inference with Relevant Explanation [1.4780878458667916]
Natural language explanations (NLEs) are a form of data annotation in which annotators identify rationales when assigning labels to data instances. NLEs have been shown to capture human reasoning better, but not as beneficial for natural language inference. We propose a novel framework, LIREx, that incorporates both a rationale-enabled explanation generator and an instance selector to select only relevant NLEs.
arXiv Detail & Related papers (2020-12-16T18:49:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.