KNIFE: Distilling Reasoning Knowledge From Free-Text Rationales
- URL: http://arxiv.org/abs/2212.09721v2
- Date: Mon, 22 May 2023 00:19:21 GMT
- Title: KNIFE: Distilling Reasoning Knowledge From Free-Text Rationales
- Authors: Aaron Chan, Zhiyuan Zeng, Wyatt Lake, Brihi Joshi, Hanjie Chen, Xiang
Ren
- Abstract summary: We propose KNIFE, which shows that reasoning knowledge can be effectively distilled from FTRs into a small (<1B) LM.
First, KNIFE finetunes a teacher LM (given task input and FTR) to predict the task output, transferring reasoning knowledge from the FTRs to the teacher's hidden states.
Second, KNIFE finetunes a student LM (given task input only) such that its hidden states are aligned with the teacher's.
- Score: 31.28256104334867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models (LMs) have yielded impressive results on many language
reasoning tasks, but their unexpected errors raise doubts about their reasoning
abilities. In light of this, there is growing interest in finetuning/prompting
LMs with both task instances and their associated free-text rationales (FTRs),
which explain the correct reasoning process for predicting the correct task
output (i.e., how to be "right for the right reasons"). However, existing
finetuning methods fail to improve LM performance, while prompting needs
prohibitively large (i.e., >50B) LMs to work well. We propose KNIFE, which
shows that reasoning knowledge can be effectively distilled from FTRs into a
small (i.e., <1B) LM and improve the LM's performance. First, KNIFE finetunes a
teacher LM (given task input and FTR) to predict the task output, transferring
reasoning knowledge from the FTRs to the teacher's hidden states. Second, KNIFE
finetunes a student LM (given task input only) such that its hidden states are
aligned with the teacher's. Thus, the student is endowed with reasoning
knowledge but can be used for inference without direct FTR input. On two
question-answering datasets, KNIFE outperforms various finetuning and prompting
baselines in fully-supervised and low-resource settings. Also, we observe that
FTR quality is crucial to KNIFE's performance.
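To make the two-stage procedure described in the abstract concrete, below is a minimal PyTorch sketch of hidden-state distillation in the spirit of KNIFE. This is not the authors' implementation: the module names, mean-pooling choice, and loss composition are illustrative assumptions, and KNIFE itself operates on the hidden states of pretrained LMs rather than the toy encoders used here.

```python
# Illustrative sketch only; not the official KNIFE code. Module names,
# pooling, and the exact loss composition are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Stand-in for a pretrained LM that exposes its hidden states."""
    def __init__(self, vocab_size=32000, d_model=256, n_layers=2, n_classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, input_ids):
        hidden = self.encoder(self.embed(input_ids))   # (batch, seq, d_model)
        pooled = hidden.mean(dim=1)                    # mean-pool over tokens
        return self.classifier(pooled), pooled

teacher, student = TinyEncoder(), TinyEncoder()

def stage1_teacher_step(task_plus_ftr_ids, labels, opt):
    """Stage 1: finetune the teacher on the task input concatenated with the
    FTR, so reasoning knowledge from the rationale ends up in its hidden states."""
    logits, _ = teacher(task_plus_ftr_ids)
    loss = F.cross_entropy(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def stage2_student_step(task_only_ids, task_plus_ftr_ids, opt):
    """Stage 2: freeze the teacher and train the student (task input only)
    to align its hidden states with the teacher's FTR-informed states."""
    with torch.no_grad():
        _, teacher_state = teacher(task_plus_ftr_ids)
    _, student_state = student(task_only_ids)
    loss = F.mse_loss(student_state, teacher_state)    # hidden-state alignment
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

At inference time only the student is used, so, as the abstract notes, no FTR has to be provided for new inputs.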
Related papers
- Instruction Tuning With Loss Over Instructions [42.9106826952674]
Instruction Modelling (IM) trains LMs by applying a loss function to the instruction and prompt part rather than solely to the output part.
We show that, in many scenarios, IM can effectively improve the LM performance on both NLP tasks and open-ended generation benchmarks.
Remarkably, in the most advantageous case, IM boosts model performance on AlpacaEval 1.0 by over 100%.
arXiv Detail & Related papers (2024-05-23T10:12:03Z)
- Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought [51.240387516059535]
We introduce a novel framework, LM-Guided CoT, that leverages a lightweight (i.e., 1B) language model (LM) for guiding a black-box large (i.e., >10B) LM in reasoning tasks.
We optimize the model through 1) knowledge distillation and 2) reinforcement learning from rationale-oriented and task-oriented reward signals.
arXiv Detail & Related papers (2024-04-04T12:46:37Z)
- Democratizing Reasoning Ability: Tailored Learning from Large Language Model [97.4921006089966]
We propose a tailored learning approach to distill such reasoning ability to smaller LMs.
We exploit the potential of LLM as a reasoning teacher by building an interactive multi-round learning paradigm.
To exploit the reasoning potential of the smaller LM, we propose self-reflection learning to motivate the student to learn from self-made mistakes.
arXiv Detail & Related papers (2023-10-20T07:50:10Z)
- LeTI: Learning to Generate from Textual Interactions [60.425769582343506]
We explore LMs' potential to learn from textual interactions (LETI) that not only check their correctness with binary labels but also pinpoint and explain errors in their outputs through textual feedback.
Our focus is the code generation task, where the model produces code based on natural language instructions.
LETI iteratively fine-tunes the model, using the LM objective, on a concatenation of natural language instructions, LM-generated programs, and textual feedback.
arXiv Detail & Related papers (2023-05-17T15:53:31Z)
- SCOTT: Self-Consistent Chain-of-Thought Distillation [68.40232422158569]
Large language models (LMs) generate free-text rationales for their predictions via chain-of-thought prompting.
We propose a faithful knowledge distillation method to learn a small, self-consistent CoT model from a teacher model that is orders of magnitude larger.
To ensure faithful distillation, we use the teacher-generated rationales to learn a student LM with a counterfactual reasoning objective.
arXiv Detail & Related papers (2023-05-03T03:47:00Z)
- When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories [58.3421305091187]
This paper aims to understand LMs' strengths and limitations in memorizing factual knowledge.
We find that LMs struggle with less popular factual knowledge, and that scaling fails to appreciably improve memorization of factual knowledge in the long tail.
We devise a simple, yet effective, method for powerful and efficient retrieval-augmented LMs, which retrieves non-parametric memories only when necessary.
arXiv Detail & Related papers (2022-12-20T18:30:15Z)
- PINTO: Faithful Language Reasoning Using Prompt-Generated Rationales [42.98229290301891]
PINTO is a pipeline that rationalizes via prompt-based learning and learns to faithfully reason over rationales via counterfactual regularization.
We show that PINTO significantly improves the ability of the reasoning LM, yielding higher performance on both in-distribution and out-of-distribution test sets.
arXiv Detail & Related papers (2022-11-03T02:55:54Z)
- RuleBert: Teaching Soft Rules to Pre-trained Language Models [21.69870624809201]
We introduce a classification task where, given facts and soft rules, the PLM should return a prediction with a probability for a given hypothesis.
We propose a revised loss function that enables the PLM to learn how to predict precise probabilities for the task.
Our evaluation results show that the resulting fine-tuned models achieve very high performance, even on logical rules that were unseen at training.
arXiv Detail & Related papers (2021-09-24T16:19:25Z)
- oLMpics -- On what Language Model Pre-training Captures [84.60594612120173]
We propose eight reasoning tasks, which require operations such as comparison, conjunction, and composition.
A fundamental challenge is to understand whether the performance of an LM on a task should be attributed to the pre-trained representations or to the process of fine-tuning on the task data.
arXiv Detail & Related papers (2019-12-31T12:11:35Z)