Characterizing Large Language Models as Rationalizers of
Knowledge-intensive Tasks
- URL: http://arxiv.org/abs/2311.05085v2
- Date: Wed, 31 Jan 2024 19:17:00 GMT
- Title: Characterizing Large Language Models as Rationalizers of
Knowledge-intensive Tasks
- Authors: Aditi Mishra and Sajjadur Rahman and Hannah Kim and Kushan Mitra and
Estevam Hruschka
- Abstract summary: Large language models (LLMs) are proficient at generating fluent text with minimal task-specific supervision.
We consider the task of generating knowledge-guided rationalization in natural language by using expert-written examples in a few-shot manner.
Surprisingly, crowd-workers preferred knowledge-grounded rationales over crowdsourced rationalizations, citing their factuality, sufficiency, and comprehensive refutations.
- Score: 6.51301154858045
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large language models (LLMs) are proficient at generating fluent text with
minimal task-specific supervision. Yet, their ability to provide well-grounded
rationalizations for knowledge-intensive tasks remains under-explored. Such
tasks, like commonsense multiple-choice questions, require rationales based on
world knowledge to support predictions and refute alternate options. We
consider the task of generating knowledge-guided rationalization in natural
language by using expert-written examples in a few-shot manner. Surprisingly,
crowd-workers preferred knowledge-grounded rationales over crowdsourced
rationalizations, citing their factuality, sufficiency, and comprehensive
refutations. Although LLM-generated rationales were preferred, further
improvements in conciseness and novelty are required. In another study, we show
how rationalization of incorrect model predictions erodes humans' trust in
LLM-generated rationales. Motivated by these observations, we create a
two-stage pipeline to review task predictions and eliminate potentially
incorrect decisions before rationalization, enabling trustworthy rationale generation.
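As a rough illustration of the two-stage pipeline described in the abstract, the sketch below reviews a prediction before rationalizing it, and prompts the rationalizer with expert-written examples in a few-shot manner. The function names, prompts, few-shot example, and the `llm` callable are all assumptions introduced here for clarity; this is not the authors' code.

```python
"""Illustrative sketch of a two-stage rationalization pipeline:
Stage 1 reviews the model's answer to a multiple-choice question and filters
out predictions the reviewer judges incorrect; Stage 2 generates a
knowledge-grounded rationale, few-shot prompted with expert-written examples.
Everything here is an assumption made for this sketch, not the paper's code."""
from typing import Callable, Optional

# An LLM is treated as a plain text-in / text-out callable.
LLM = Callable[[str], str]

# Expert-written few-shot example (illustrative placeholder).
FEW_SHOT_RATIONALES = """\
Q: Where would you put a plate after washing it? Options: (a) cupboard (b) oven
Answer: (a) cupboard
Rationale: Clean plates are stored in a cupboard; an oven is for cooking, not storage.
"""


def review_prediction(llm: LLM, question: str, options: list[str], prediction: str) -> bool:
    """Stage 1: ask the model to verify the prediction; keep only answers it confirms."""
    prompt = (
        f"Question: {question}\nOptions: {', '.join(options)}\n"
        f"Proposed answer: {prediction}\n"
        "Is the proposed answer correct? Reply 'yes' or 'no'."
    )
    return llm(prompt).strip().lower().startswith("yes")


def generate_rationale(llm: LLM, question: str, options: list[str], prediction: str) -> str:
    """Stage 2: few-shot prompt with expert rationales to produce a rationale that
    supports the chosen answer and refutes the alternatives."""
    prompt = (
        f"{FEW_SHOT_RATIONALES}\n"
        f"Q: {question} Options: {', '.join(options)}\n"
        f"Answer: {prediction}\n"
        "Rationale:"
    )
    return llm(prompt).strip()


def rationalize(llm: LLM, question: str, options: list[str], prediction: str) -> Optional[str]:
    """Only rationalize predictions that survive the review stage."""
    if not review_prediction(llm, question, options, prediction):
        return None  # abstain rather than rationalize a likely-wrong answer
    return generate_rationale(llm, question, options, prediction)
```

Abstaining when the review stage rejects a prediction is the point of the design sketched here: fluent rationales for incorrect answers are exactly what the human study identifies as eroding trust.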
Related papers
- Persuasiveness of Generated Free-Text Rationales in Subjective Decisions: A Case Study on Pairwise Argument Ranking [4.1017420444369215]
We analyze generated free-text rationales in tasks with subjective answers.
We focus on pairwise argument ranking, a highly subjective task with significant potential for real-world applications.
Our findings suggest that open-source LLMs, particularly Llama2-70B-chat, are capable of providing highly persuasive rationalizations.
arXiv Detail & Related papers (2024-06-20T00:28:33Z)
- LaRS: Latent Reasoning Skills for Chain-of-Thought Reasoning [61.7853049843921]
Chain-of-thought (CoT) prompting is a popular in-context learning approach for large language models (LLMs).
This paper introduces a new approach named Latent Reasoning Skills (LaRS) that employs unsupervised learning to create a latent space representation of rationales.
arXiv Detail & Related papers (2023-12-07T20:36:10Z)
- A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning [73.77088902676306]
We take a closer look at the self-verification abilities of large language models (LLMs) in the context of logical reasoning.
Our main findings suggest that existing LLMs could struggle to identify fallacious reasoning steps accurately and may fall short of guaranteeing the validity of self-verification methods.
arXiv Detail & Related papers (2023-11-14T07:13:10Z)
- Concise and Organized Perception Facilitates Reasoning in Large Language Models [32.71672086718057]
We show that large language models (LLMs) exhibit failure patterns akin to human cognitive biases when dealing with disordered and irrelevant content in reasoning tasks.
We propose a novel reasoning approach named Concise and Organized Perception (COP).
COP carefully analyzes the given statements to identify the most pertinent information while eliminating redundancy efficiently.
arXiv Detail & Related papers (2023-10-05T04:47:49Z)
- Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models [56.34029644009297]
Large language models (LLMs) have demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems.
LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning.
We study single-task training, multi-task training, and "chain-of-thought" knowledge distillation fine-tuning techniques to assess model performance.
arXiv Detail & Related papers (2023-10-02T01:00:50Z)
- ZARA: Improving Few-Shot Self-Rationalization for Small Language Models [29.755148112827502]
We present a novel approach, Zero-shot Augmentation of Rationale-Answer pairs (ZARA), to automatically construct pseudo-parallel data for self-training.
ZARA achieves SOTA performance on the FEB benchmark for both task accuracy and the explanation metric.
arXiv Detail & Related papers (2023-05-12T10:07:12Z)
- SCOTT: Self-Consistent Chain-of-Thought Distillation [68.40232422158569]
Large language models (LMs) generate free-text rationales for their predictions via chain-of-thought prompting.
We propose a faithful knowledge distillation method to learn a small, self-consistent CoT model from a teacher model that is orders of magnitude larger.
To ensure faithful distillation, we use the teacher-generated rationales to learn a student LM with a counterfactual reasoning objective.
arXiv Detail & Related papers (2023-05-03T03:47:00Z)
- PINTO: Faithful Language Reasoning Using Prompt-Generated Rationales [42.98229290301891]
PINTO is a pipeline that rationalizes via prompt-based learning and learns to faithfully reason over rationales via counterfactual regularization.
We show that PINTO significantly improves the ability of the reasoning LM, yielding higher performance on both in-distribution and out-of-distribution test sets.
arXiv Detail & Related papers (2022-11-03T02:55:54Z)
- Rationale-Augmented Ensembles in Language Models [53.45015291520658]
We reconsider rationale-augmented prompting for few-shot in-context learning.
We identify rationale sampling in the output space as the key component to robustly improve performance.
We demonstrate that rationale-augmented ensembles achieve more accurate and interpretable results than existing prompting approaches; a minimal sketch of this sampling-and-voting idea is given after this list.
arXiv Detail & Related papers (2022-07-02T06:20:57Z)
- Can Rationalization Improve Robustness? [39.741059642044874]
We investigate whether neural NLP models can provide robustness to adversarial attacks in addition to their interpretable nature.
We generate various types of 'AddText' attacks for both token and sentence-level rationalization tasks.
Our experiments reveal that rationale models show promise in improving robustness, though they struggle in certain scenarios.
arXiv Detail & Related papers (2022-04-25T17:02:42Z)
- Self-training with Few-shot Rationalization: Teacher Explanations Aid Student in Few-shot NLU [88.8401599172922]
We develop a framework based on self-training language models with limited task-specific labels and rationales.
We show that neural model performance can be significantly improved by making the model aware of its rationalized predictions.
arXiv Detail & Related papers (2021-09-17T00:36:46Z)
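To make the rationale-sampling idea from the Rationale-Augmented Ensembles entry above concrete, here is a minimal self-consistency-style voting sketch: sample several rationale-answer completions (in practice at non-zero temperature) and majority-vote over the final answers. The Sampler type, the rationale_ensemble function, and the toy sampler are illustrative stand-ins introduced here, not the paper's implementation.

```python
"""Minimal sketch of rationale-augmented ensembling: sample several
rationale + answer completions and aggregate by majority vote over answers.
The sampling helper below is a hypothetical stand-in for an LLM call."""
import random
from collections import Counter
from typing import Callable, Tuple

# A sampler returns one (rationale, answer) pair per call; in practice this would
# be a temperature > 0 LLM completion parsed into its rationale and answer parts.
Sampler = Callable[[str], Tuple[str, str]]


def rationale_ensemble(sample: Sampler, question: str, n: int = 8) -> str:
    """Draw n rationale-answer samples and return the majority-voted answer."""
    answers = [sample(question)[1] for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy sampler standing in for an LLM: answers "(a)" most of the time.
    def toy_sampler(question: str) -> Tuple[str, str]:
        answer = random.choice(["(a)", "(a)", "(a)", "(b)"])
        return (f"Because ... therefore {answer}.", answer)

    print(rationale_ensemble(toy_sampler, "Where would you put a clean plate?"))
```

Voting over answers while discarding the sampled rationales is the simplest aggregation choice; the entry above attributes the robustness gain primarily to the sampling of rationales in the output space rather than to any particular aggregation rule.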
This list is automatically generated from the titles and abstracts of the papers on this site.