KNIFE: Distilling Reasoning Knowledge From Free-Text Rationales
- URL: http://arxiv.org/abs/2212.09721v2
- Date: Mon, 22 May 2023 00:19:21 GMT
- Title: KNIFE: Distilling Reasoning Knowledge From Free-Text Rationales
- Authors: Aaron Chan, Zhiyuan Zeng, Wyatt Lake, Brihi Joshi, Hanjie Chen, Xiang
Ren
- Abstract summary: We propose KNIFE, which shows that reasoning knowledge can be effectively distilled from FTRs into a small (<1B) LM.
First, KNIFE finetunes a teacher LM (given task input and FTR) to predict the task output, transferring reasoning knowledge from the FTRs to the teacher's hidden states.
Second, KNIFE finetunes a student LM (given task input only) such that its hidden states are aligned with the teacher's.
- Score: 31.28256104334867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models (LMs) have yielded impressive results on many language
reasoning tasks, but their unexpected errors raise doubts about their reasoning
abilities. In light of this, there is growing interest in finetuning/prompting
LMs with both task instances and their associated free-text rationales (FTRs),
which explain the correct reasoning process for predicting the correct task
output (i.e., how to be "right for the right reasons"). However, existing
finetuning methods fail to improve LM performance, while prompting needs
prohibitively large (i.e., >50B) LMs to work well. We propose KNIFE, which
shows that reasoning knowledge can be effectively distilled from FTRs into a
small (i.e., <1B) LM and improve the LM's performance. First, KNIFE finetunes a
teacher LM (given task input and FTR) to predict the task output, transferring
reasoning knowledge from the FTRs to the teacher's hidden states. Second, KNIFE
finetunes a student LM (given task input only) such that its hidden states are
aligned with the teacher's. Thus, the student is endowed with reasoning
knowledge but can be used for inference without direct FTR input. On two
question-answering datasets, KNIFE outperforms various finetuning and prompting
baselines in fully-supervised and low-resource settings. Also, we observe that
FTR quality is crucial to KNIFE's performance.
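To make the two-stage procedure described in the abstract concrete, below is a minimal PyTorch sketch of hidden-state distillation in the spirit of KNIFE. This is not the authors' implementation: the module names, mean-pooling choice, and loss composition are illustrative assumptions, and KNIFE itself operates on the hidden states of pretrained LMs rather than the toy encoders used here.

```python
# Illustrative sketch only; not the official KNIFE code. Module names,
# pooling, and the exact loss composition are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Stand-in for a pretrained LM that exposes its hidden states."""
    def __init__(self, vocab_size=32000, d_model=256, n_layers=2, n_classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, input_ids):
        hidden = self.encoder(self.embed(input_ids))   # (batch, seq, d_model)
        pooled = hidden.mean(dim=1)                    # mean-pool over tokens
        return self.classifier(pooled), pooled

teacher, student = TinyEncoder(), TinyEncoder()

def stage1_teacher_step(task_plus_ftr_ids, labels, opt):
    """Stage 1: finetune the teacher on the task input concatenated with the
    FTR, so reasoning knowledge from the rationale ends up in its hidden states."""
    logits, _ = teacher(task_plus_ftr_ids)
    loss = F.cross_entropy(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def stage2_student_step(task_only_ids, task_plus_ftr_ids, opt):
    """Stage 2: freeze the teacher and train the student (task input only)
    to align its hidden states with the teacher's FTR-informed states."""
    with torch.no_grad():
        _, teacher_state = teacher(task_plus_ftr_ids)
    _, student_state = student(task_only_ids)
    loss = F.mse_loss(student_state, teacher_state)    # hidden-state alignment
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

At inference time only the student is used, so, as the abstract notes, no FTR has to be provided for new inputs.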
Related papers
- Instruction Tuning With Loss Over Instructions [42.9106826952674]
Instruction Modelling (IM) trains LMs by applying a loss function to the instruction and prompt part rather than solely to the output part.
We show that, in many scenarios, IM can effectively improve the LM performance on both NLP tasks and open-ended generation benchmarks.
Remarkably, in the most advantageous case, IM boosts model performance on AlpacaEval 1.0 by over 100%.
arXiv Detail & Related papers (2024-05-23T10:12:03Z)
- Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought [51.240387516059535]
We introduce a novel framework, LM-Guided CoT, that leverages a lightweight (i.e., 1B) language model (LM) for guiding a black-box large (i.e., >10B) LM in reasoning tasks.
We optimize the model through 1) knowledge distillation and 2) reinforcement learning from rationale-oriented and task-oriented reward signals.
arXiv Detail & Related papers (2024-04-04T12:46:37Z)
- Democratizing Reasoning Ability: Tailored Learning from Large Language Model [97.4921006089966]
We propose a tailored learning approach to distill such reasoning ability to smaller LMs.
We exploit the potential of LLM as a reasoning teacher by building an interactive multi-round learning paradigm.
To exploit the reasoning potential of the smaller LM, we propose self-reflection learning to motivate the student to learn from self-made mistakes.
arXiv Detail & Related papers (2023-10-20T07:50:10Z)
- LeTI: Learning to Generate from Textual Interactions [60.425769582343506]
We explore LMs' potential to learn from textual interactions (LETI) that not only check their correctness with binary labels but also pinpoint and explain errors in their outputs through textual feedback.
Our focus is the code generation task, where the model produces code based on natural language instructions.
LETI iteratively fine-tunes the model, using the LM objective, on a concatenation of natural language instructions, LM-generated programs, and textual feedback.
arXiv Detail & Related papers (2023-05-17T15:53:31Z)
- SCOTT: Self-Consistent Chain-of-Thought Distillation [68.40232422158569]
Large language models (LMs) generate free-text rationales for their predictions via chain-of-thought prompting.
We propose a faithful knowledge distillation method to learn a small, self-consistent CoT model from a teacher model that is orders of magnitude larger.
To ensure faithful distillation, we use the teacher-generated rationales to learn a student LM with a counterfactual reasoning objective.
arXiv Detail & Related papers (2023-05-03T03:47:00Z)
- When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories [58.3421305091187]
This paper aims to understand LMs' strengths and limitations in memorizing factual knowledge.
We find that LMs struggle with less popular factual knowledge, and that scaling fails to appreciably improve memorization of factual knowledge in the long tail.
We devise a simple, yet effective, method for powerful and efficient retrieval-augmented LMs, which retrieves non-parametric memories only when necessary.
arXiv Detail & Related papers (2022-12-20T18:30:15Z)
- PINTO: Faithful Language Reasoning Using Prompt-Generated Rationales [42.98229290301891]
PINTO is a pipeline that rationalizes via prompt-based learning and learns to faithfully reason over rationales via counterfactual regularization.
We show that PINTO significantly improves the ability of the reasoning LM, yielding higher performance on both in-distribution and out-of-distribution test sets.
arXiv Detail & Related papers (2022-11-03T02:55:54Z)
- RuleBert: Teaching Soft Rules to Pre-trained Language Models [21.69870624809201]
We introduce a classification task where, given facts and soft rules, the PLM should return a prediction with a probability for a given hypothesis.
We propose a revised loss function that enables the PLM to learn how to predict precise probabilities for the task.
Our evaluation results show that the resulting fine-tuned models achieve very high performance, even on logical rules that were unseen at training.
arXiv Detail & Related papers (2021-09-24T16:19:25Z)
- oLMpics -- On what Language Model Pre-training Captures [84.60594612120173]
We propose eight reasoning tasks, which require operations such as comparison, conjunction, and composition.
A fundamental challenge is to understand whether the performance of an LM on a task should be attributed to the pre-trained representations or to the process of fine-tuning on the task data.
arXiv Detail & Related papers (2019-12-31T12:11:35Z)