Towards Consistent Natural-Language Explanations via
Explanation-Consistency Finetuning
- URL: http://arxiv.org/abs/2401.13986v1
- Date: Thu, 25 Jan 2024 07:04:30 GMT
- Title: Towards Consistent Natural-Language Explanations via
Explanation-Consistency Finetuning
- Authors: Yanda Chen, Chandan Singh, Xiaodong Liu, Simiao Zuo, Bin Yu, He He,
Jianfeng Gao
- Abstract summary: Large language models (LLMs) often generate convincing, fluent explanations.
However, they often generate inconsistent explanations across related inputs.
We propose explanation-consistency finetuning (EC-finetuning) to adapt LLMs to generate consistent natural-language explanations.
- Score: 66.87754065127714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) often generate convincing, fluent explanations.
However, unlike humans, they often generate inconsistent explanations across
different inputs. For example, an LLM may generate the explanation "all birds
can fly" when answering the question "Can sparrows fly?" yet answer "no" to
the related question "Can penguins fly?". Explanations should be
consistent across related examples so that they allow a human to simulate the
LLM's decision process on multiple examples. We propose explanation-consistency
finetuning (EC-finetuning), a method that adapts LLMs to generate more
consistent natural-language explanations on related examples. EC-finetuning
involves finetuning LLMs on synthetic data that is carefully constructed to
contain consistent explanations. Across a variety of question-answering
datasets in various domains, EC-finetuning yields a 10.0% relative explanation
consistency improvement on four finetuning datasets, and generalizes to seven
out-of-distribution datasets not seen during finetuning (+4.5% relative). Code
is available at https://github.com/yandachen/explanation-consistency-finetuning .
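The abstract describes EC-finetuning only at a high level: construct synthetic data whose explanations stay consistent across related examples, then finetune on it. The sketch below is a minimal illustration of that data-construction step under stated assumptions, not the authors' implementation (the linked repository has that); the helper names build_ec_finetuning_data, propose_related_questions, and answer_with_explanation are hypothetical, and toy stand-ins are used so the sketch runs without any model.

```python
# Minimal sketch of the EC-finetuning idea from the abstract: build synthetic
# finetuning examples in which related questions are paired with explanations
# that remain consistent with each other, then finetune on them.
# All helper names below are assumptions for illustration, not the paper's API.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ECExample:
    question: str
    answer: str
    explanation: str


def build_ec_finetuning_data(
    seed_examples: List[ECExample],
    propose_related_questions: Callable[[str], List[str]],
    answer_with_explanation: Callable[[str, str], ECExample],
) -> List[ECExample]:
    """For each seed example, generate related questions and answer them while
    conditioning on the seed's explanation, so the synthetic explanations stay
    consistent across the related examples (assumed mechanism)."""
    synthetic: List[ECExample] = []
    for seed in seed_examples:
        synthetic.append(seed)
        for related_q in propose_related_questions(seed.question):
            # Conditioning on the seed explanation is what keeps the synthetic
            # data internally consistent in this sketch.
            synthetic.append(answer_with_explanation(related_q, seed.explanation))
    return synthetic


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without a model; a real pipeline would
    # back these callables with LLM calls.
    seed = ECExample(
        "Can sparrows fly?",
        "yes",
        "Most birds can fly, but flightless species like penguins cannot.",
    )
    data = build_ec_finetuning_data(
        [seed],
        propose_related_questions=lambda q: ["Can penguins fly?"],
        answer_with_explanation=lambda q, expl: ECExample(q, "no", expl),
    )
    for ex in data:
        print(ex.question, "->", ex.answer, "|", ex.explanation)
```

In a real pipeline, the two callables would be backed by LLM calls and the resulting question-answer-explanation triples would be formatted into the finetuning dataset; the released code linked above is the authoritative reference.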
Related papers
- From Distributional to Overton Pluralism: Investigating Large Language Model Alignment [82.99849359892112]
We re-examine previously reported reductions in response diversity post-alignment.
Our analysis suggests that an apparent drop in the diversity of responses is largely explained by quality control and information aggregation.
Findings indicate that current alignment techniques capture but do not extend the useful subset of assistant-like base LLM behavior.
arXiv Detail & Related papers (2024-06-25T16:32:33Z)
- Can Language Models Explain Their Own Classification Behavior? [1.8177391253202122]
Large language models (LLMs) perform well at a myriad of tasks, but explaining the processes behind this performance is a challenge.
This paper investigates whether LLMs can give faithful high-level explanations of their own internal processes.
We release our dataset, ArticulateRules, which can be used to test self-explanation for LLMs trained either in-context or by finetuning.
arXiv Detail & Related papers (2024-05-13T02:31:08Z)
- FaithLM: Towards Faithful Explanations for Large Language Models [67.29893340289779]
Large Language Models (LLMs) have become proficient in addressing complex tasks by leveraging their internal knowledge and reasoning capabilities.
The black-box nature of these models complicates the task of explaining their decision-making processes.
We introduce FaithLM to explain the decision of LLMs with natural language (NL) explanations.
arXiv Detail & Related papers (2024-02-07T09:09:14Z)
- Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations [62.61495090463084]
Large language models (LLMs) are trained to imitate humans to explain human decisions.
We evaluate whether an explanation can enable humans to precisely infer the model's outputs on diverse counterfactuals.
We found that LLMs' explanations have low precision and that precision does not correlate with plausibility.
arXiv Detail & Related papers (2023-07-17T17:41:47Z)
- Explanation-based Finetuning Makes Models More Robust to Spurious Cues [21.327036110196637]
Large Language Models (LLMs) are so powerful that they sometimes learn correlations between labels and features that are irrelevant to the task.
We propose explanation-based finetuning as a general approach to mitigate LLMs' reliance on spurious correlations.
We finetune the model to additionally generate a free-text explanation supporting its answer.
arXiv Detail & Related papers (2023-05-08T18:53:45Z)
- Explanation Selection Using Unlabeled Data for Chain-of-Thought Prompting [80.9896041501715]
Explanations that have not been "tuned" for a task, such as off-the-shelf explanations written by nonexperts, may lead to mediocre performance.
This paper tackles the problem of how to optimize explanation-infused prompts in a blackbox fashion.
arXiv Detail & Related papers (2023-02-09T18:02:34Z)
- ExaRanker: Explanation-Augmented Neural Ranker [67.4894325619275]
In this work, we show that neural rankers also benefit from explanations.
We use LLMs such as GPT-3.5 to augment retrieval datasets with explanations.
Our model, dubbed ExaRanker, is finetuned on a few thousand examples with synthetic explanations and performs on par with models finetuned on 3x more examples without explanations.
arXiv Detail & Related papers (2023-01-25T11:03:04Z)
- Improving Neural Model Performance through Natural Language Feedback on Their Explanations [38.96890526935312]
We introduce MERCURIE - an interactive system that refines its explanations for a given reasoning task by getting human feedback in natural language.
Our approach generates graphs that have 40% fewer inconsistencies compared with the off-the-shelf system.
arXiv Detail & Related papers (2021-04-18T08:10:01Z)