Properties and Challenges of LLM-Generated Explanations
- URL: http://arxiv.org/abs/2402.10532v1
- Date: Fri, 16 Feb 2024 09:37:54 GMT
- Title: Properties and Challenges of LLM-Generated Explanations
- Authors: Jenny Kunz, Marco Kuhlmann
- Abstract summary: We study the self-rationalising capabilities of large language models (LLMs).
We find that generated explanations show selectivity and contain illustrative elements, but are less frequently subjective or misleading.
In particular, we outline positive and negative implications depending on the goals and user groups of the self-rationalising system.
- Score: 3.257973235065581
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The self-rationalising capabilities of large language models (LLMs) have been
explored in restricted settings, using task-specific data sets. However,
current LLMs do not (only) rely on specifically annotated data; nonetheless,
they frequently explain their outputs. The properties of the generated
explanations are influenced by the pre-training corpus and by the target data
used for instruction fine-tuning. As the pre-training corpus includes a large
amount of human-written explanations "in the wild", we hypothesise that LLMs
adopt common properties of human explanations. By analysing the outputs for a
multi-domain instruction fine-tuning data set, we find that generated
explanations show selectivity and contain illustrative elements, but are less
frequently subjective or misleading. We discuss reasons and consequences of
the properties' presence or absence. In particular, we outline positive and
negative implications depending on the goals and user groups of the
self-rationalising system.
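The analysis described in the abstract amounts to labelling each generated explanation with the properties it exhibits (selectivity, illustrative elements, subjectivity, misleadingness) and comparing how often each property occurs. The Python sketch below is a hypothetical illustration of that tallying step, not the authors' code; the data structure, the property names as strings, and the example annotations are assumptions for demonstration only.

```python
# Minimal sketch (not the authors' code): tallying human-annotated properties
# of LLM-generated explanations, using the property names from the abstract.
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List

PROPERTIES = ["selective", "illustrative", "subjective", "misleading"]

@dataclass
class AnnotatedExplanation:
    instruction: str   # prompt from the instruction fine-tuning set
    explanation: str   # explanation generated by the LLM
    properties: List[str] = field(default_factory=list)  # annotator-assigned labels

def property_frequencies(samples: List[AnnotatedExplanation]) -> Dict[str, float]:
    """Fraction of explanations exhibiting each property."""
    counts = Counter(p for s in samples for p in set(s.properties) if p in PROPERTIES)
    n = max(len(samples), 1)
    return {p: counts[p] / n for p in PROPERTIES}

# Hypothetical usage with two hand-annotated examples.
samples = [
    AnnotatedExplanation(
        "Why is the sky blue?",
        "Shorter wavelengths scatter more strongly, e.g. blue light in air.",
        ["selective", "illustrative"],
    ),
    AnnotatedExplanation(
        "Is this review positive?",
        "Yes, because the reviewer calls the product 'great'.",
        ["selective"],
    ),
]
print(property_frequencies(samples))
```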
Related papers
- Can adversarial attacks by large language models be attributed? [1.3812010983144802]
Attributing outputs from Large Language Models in adversarial settings presents significant challenges that are likely to grow in importance.
We investigate this attribution problem using formal language theory, specifically language identification in the limit as introduced by Gold and extended by Angluin.
Our results show that, due to the non-identifiability of certain language classes, it is theoretically impossible to attribute outputs to specific LLMs with certainty.
arXiv Detail & Related papers (2024-11-12T18:28:57Z) - XForecast: Evaluating Natural Language Explanations for Time Series Forecasting [72.57427992446698]
Time series forecasting aids decision-making, especially for stakeholders who rely on accurate predictions.
Traditional explainable AI (XAI) methods, which underline feature or temporal importance, often require expert knowledge.
Evaluating natural language explanations (NLEs) of forecasts is difficult due to the complex causal relationships in time series data.
arXiv Detail & Related papers (2024-10-18T05:16:39Z) - Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors [74.04775677110179]
In-context Learning (ICL) has become the primary method for performing natural language tasks with Large Language Models (LLMs).
In this work, we examine whether this is the result of the aggregation used in corresponding datasets, where trying to combine low-agreement, disparate annotations might lead to annotation artifacts that create detrimental noise in the prompt.
Our results indicate that aggregation is a confounding factor in the modeling of subjective tasks, and advocate focusing on modeling individuals instead.
arXiv Detail & Related papers (2024-10-17T17:16:00Z) - Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models [59.970391602080205]
This study investigates whether such constraints on the generation space impact LLMs' abilities, including reasoning and domain knowledge comprehension.
We evaluate LLMs' performance when restricted to adhere to structured formats versus generating free-form responses across various common tasks.
We find that stricter format constraints generally lead to greater performance degradation in reasoning tasks.
arXiv Detail & Related papers (2024-08-05T13:08:24Z) - Evaluating Human Alignment and Model Faithfulness of LLM Rationale [66.75309523854476]
We study how well large language models (LLMs) explain their generations through rationales.
We show that prompting-based methods are less "faithful" than attribution-based explanations.
arXiv Detail & Related papers (2024-06-28T20:06:30Z) - Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models [54.21695754082441]
We propose a framework to teach Large Language Models (LLMs) to generate explainable stock predictions.
A reflective agent learns how to explain past stock movements through self-reasoning, while a Proximal Policy Optimization (PPO) trainer trains the model to generate the most likely explanations.
Our framework can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient.
arXiv Detail & Related papers (2024-02-06T03:18:58Z) - Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z) - Investigating the Effect of Natural Language Explanations on Out-of-Distribution Generalization in Few-shot NLI [11.44224857047629]
We formulate a few-shot learning setup and examine the effects of natural language explanations on OOD generalization.
We leverage the templates in the HANS dataset and construct templated natural language explanations for each template.
We show that generated explanations achieve competitive BLEU scores against ground-truth explanations but fail to improve prediction performance.
arXiv Detail & Related papers (2021-10-12T18:00:02Z) - LIREx: Augmenting Language Inference with Relevant Explanation [1.4780878458667916]
Natural language explanations (NLEs) are a form of data annotation in which annotators identify rationales when assigning labels to data instances.
NLEs have been shown to capture human reasoning better, but they have not proved as beneficial for natural language inference.
We propose a novel framework, LIREx, that incorporates both a rationale-enabled explanation generator and an instance selector to select only relevant NLEs.
arXiv Detail & Related papers (2020-12-16T18:49:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content it provides (including all information) and is not responsible for any consequences.