SCOTT: Self-Consistent Chain-of-Thought Distillation
- URL: http://arxiv.org/abs/2305.01879v4
- Date: Wed, 30 Aug 2023 21:28:01 GMT
- Title: SCOTT: Self-Consistent Chain-of-Thought Distillation
- Authors: Peifeng Wang, Zhengyang Wang, Zheng Li, Yifan Gao, Bing Yin and Xiang Ren
- Abstract summary: Large language models (LMs) generate free-text rationales for their predictions via chain-of-thought prompting.
We propose a faithful knowledge distillation method to learn a small, self-consistent CoT model from a teacher model that is orders of magnitude larger.
To ensure faithful distillation, we use the teacher-generated rationales to learn a student LM with a counterfactual reasoning objective.
- Score: 68.40232422158569
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LMs) beyond a certain scale demonstrate the emergent
capability of generating free-text rationales for their predictions via
chain-of-thought (CoT) prompting. While CoT can yield dramatically improved
performance, such gains are only observed for sufficiently large LMs. Even more
concerning, there is little guarantee that the generated rationales are
consistent with the LM's predictions or faithfully justify its decisions. In this
work, we propose a faithful knowledge distillation method to learn a small,
self-consistent CoT model from a teacher model that is orders of magnitude
larger. To form better supervision, we elicit rationales supporting the gold
answers from a large LM (teacher) by contrastive decoding, which encourages the
teacher to generate tokens that become more plausible only when the answer is
considered. To ensure faithful distillation, we use the teacher-generated
rationales to learn a student LM with a counterfactual reasoning objective,
which prevents the student from ignoring the rationales to make inconsistent
predictions. Experiments show that, while yielding comparable end-task
performance, our method generates CoT rationales that are more faithful than
those produced by baselines. Further analysis suggests that such a model respects the
rationales more when making decisions; thus, we can improve its performance
more by refining its rationales.
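The two sketches below follow only the abstract's description; the model, prompt templates, and hyperparameters are illustrative assumptions, not the paper's actual setup. First, rationale elicitation by contrastive decoding: the teacher scores each candidate token with and without the gold answer in the prompt, and prefers tokens whose plausibility rises when the answer is present.

```python
# Minimal sketch of rationale elicitation via contrastive decoding,
# per the abstract: prefer tokens that become more plausible only when
# the gold answer is in the prompt. Model, prompts, alpha, and top-k
# are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")               # stand-in teacher
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def next_token_logprobs(ids):
    """Log-probabilities of the next token given a prefix."""
    return torch.log_softmax(lm(ids).logits[0, -1], dim=-1)

@torch.no_grad()
def contrastive_rationale(question, answer, max_new_tokens=40,
                          alpha=1.0, top_k=20):
    ids_w = tok(f"Q: {question}\nA: {answer}\nExplanation:",
                return_tensors="pt").input_ids            # answer-aware prompt
    ids_o = tok(f"Q: {question}\nExplanation:",
                return_tensors="pt").input_ids            # answer-free prompt
    out = []
    for _ in range(max_new_tokens):
        lp_w = next_token_logprobs(ids_w)                 # plausibility with answer
        lp_o = next_token_logprobs(ids_o)                 # plausibility without it
        # Restrict to the answer-aware model's top-k so the contrastive
        # term cannot promote otherwise-implausible tokens.
        cand = lp_w.topk(top_k).indices
        score = lp_w[cand] + alpha * (lp_w[cand] - lp_o[cand])
        nxt = cand[score.argmax()].view(1, 1)
        if nxt.item() == tok.eos_token_id:
            break
        ids_w = torch.cat([ids_w, nxt], dim=-1)           # grow both contexts
        ids_o = torch.cat([ids_o, nxt], dim=-1)           # with the same token
        out.append(nxt.item())
    return tok.decode(out)

print(contrastive_rationale("Can a penguin fly?", "no"))
```

Second, the data side of the counterfactual reasoning objective: the student is trained to predict the answer entailed by whichever rationale it is shown, so a perturbed rationale supporting a different answer is paired with that answer. The perturbation and pairing recipe here are assumptions based on the abstract.

```python
# Rough sketch of counterfactual-reasoning training pairs: the student
# must output the answer entailed by the rationale it is shown, so a
# perturbed rationale supporting a different answer carries that label.
# Pair construction is an assumption, not the paper's exact recipe.
def build_training_pairs(question, rationale, gold_answer,
                         cf_rationale, cf_answer):
    factual = {
        "input": f"Q: {question}\nRationale: {rationale}\nA:",
        "target": gold_answer,
    }
    counterfactual = {  # rationale and label flip together
        "input": f"Q: {question}\nRationale: {cf_rationale}\nA:",
        "target": cf_answer,
    }
    return [factual, counterfactual]

pairs = build_training_pairs(
    question="Can a penguin fly?",
    rationale="Penguins are birds whose wings are adapted for swimming.",
    gold_answer="no",
    cf_rationale="Penguins are birds, and all birds can fly.",  # perturbed
    cf_answer="yes",
)
for p in pairs:
    print(p["input"], "->", p["target"])
```

Training on both kinds of pairs with a standard language-modeling loss makes ignoring the rationale a losing strategy, since the same question maps to different labels depending on the rationale.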
Related papers
- Improve Vision Language Model Chain-of-thought Reasoning [86.83335752119741]
Chain-of-thought (CoT) reasoning in vision language models (VLMs) is crucial for improving interpretability and trustworthiness.
We show that training VLM on short answers does not generalize well to reasoning tasks that require more detailed responses.
arXiv Detail & Related papers (2024-10-21T17:00:06Z)
- Learning from Committee: Reasoning Distillation from a Mixture of Teachers with Peer-Review [11.756344944226495]
We introduce a novel Fault-Aware Distillation via Peer-Review (FAIR) approach.
Instead of merely obtaining gold rationales from teachers, our method asks teachers to identify and explain the student's mistakes.
arXiv Detail & Related papers (2024-10-04T17:59:41Z)
- Multi-Granularity Semantic Revision for Large Language Model Distillation [66.03746866578274]
We propose a multi-granularity semantic revision method for LLM distillation.
At the sequence level, we propose a sequence correction and re-generation strategy.
At the token level, we design a distribution adaptive clipping Kullback-Leibler loss as the distillation objective function (see the sketch below).
At the span level, we leverage the span priors of a sequence to compute the probability correlations within spans, and constrain the teacher and student's probability correlations to be consistent.
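The token-level objective is only named in this summary, so the following is a hedged sketch under an assumed reading: a KL distillation loss whose teacher/student log-ratio is clipped to a fixed band. The paper's actual "distribution adaptive" rule may differ.

```python
# Hedged sketch of a token-level clipped KL distillation loss. The
# "distribution adaptive clipping" rule is not specified in the summary
# above, so this clips the teacher/student log-ratio to a fixed band as
# one plausible interpretation.
import math
import torch
import torch.nn.functional as F

def clipped_kl_loss(student_logits, teacher_logits, clip=4.0):
    """Token-level KL(teacher || student) with a clipped log-ratio."""
    t = F.softmax(teacher_logits, dim=-1)            # teacher distribution
    log_ratio = (F.log_softmax(teacher_logits, dim=-1)
                 - F.log_softmax(student_logits, dim=-1))
    log_ratio = log_ratio.clamp(-math.log(clip), math.log(clip))
    return (t * log_ratio).sum(dim=-1).mean()        # mean over positions

# Random logits: batch of 2 sequences, 5 positions, vocabulary of 11.
student = torch.randn(2, 5, 11, requires_grad=True)
teacher = torch.randn(2, 5, 11)
loss = clipped_kl_loss(student, teacher)
loss.backward()                                      # gradients reach the student
print(float(loss))
```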
arXiv Detail & Related papers (2024-07-14T03:51:49Z)
- Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought [51.240387516059535]
We introduce a novel framework, LM-Guided CoT, that leverages a lightweight (i.e., 1B) language model (LM) to guide a black-box large (i.e., >10B) LM in reasoning tasks.
We optimize the model through 1) knowledge distillation and 2) reinforcement learning from rationale-oriented and task-oriented reward signals.
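As a rough illustration of that division of labor (the model name, prompts, and stubbed black-box call below are assumptions): the small LM drafts a rationale, and the large LM only reads it to produce the answer; the distillation and reward-based optimization are omitted.

```python
# Hedged sketch of the LM-Guided CoT division of labor: a small LM
# drafts the rationale; a separate, possibly black-box, large LM answers
# conditioned on it. Model names and prompts are illustrative, and the
# reward-based optimization from the summary is omitted.
from transformers import pipeline

small_lm = pipeline("text-generation", model="gpt2")  # stand-in guide LM

def answer_with_guided_cot(question, large_lm_call):
    prompt = f"Q: {question}\nLet's think step by step:"
    draft = small_lm(prompt, max_new_tokens=60,
                     do_sample=False)[0]["generated_text"]
    # Keep only the continuation (approximate; tokenization may alter
    # whitespace slightly).
    rationale = draft[len(prompt):]
    # The large model never generates the rationale; it only reads it.
    return large_lm_call(f"{prompt}{rationale}\nTherefore, the answer is")

# `large_lm_call` stands in for any black-box completion API.
print(answer_with_guided_cot("Can a penguin fly?", lambda p: " no"))
```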
arXiv Detail & Related papers (2024-04-04T12:46:37Z)
- Characterizing Large Language Models as Rationalizers of Knowledge-intensive Tasks [6.51301154858045]
Large language models (LLMs) are proficient at generating fluent text with minimal task-specific supervision.
We consider the task of generating knowledge-guided rationalization in natural language by using expert-written examples in a few-shot manner.
Surprisingly, crowd-workers preferred knowledge-grounded rationales over crowdsourced rationalizations, citing their factuality, sufficiency, and comprehensive refutations.
arXiv Detail & Related papers (2023-11-09T01:04:44Z)
- Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step [133.60124577507727]
Chain-of-thought prompting primes large language models to verbalize rationalization for their predictions.
We show that orders-of-magnitude smaller models can still benefit from chain-of-thought prompting.
We introduce Symbolic Chain-of-Thought Distillation (SCoTD), a method to train a smaller student model on rationalizations sampled from a significantly larger teacher model.
arXiv Detail & Related papers (2023-06-24T20:15:07Z)
- ZARA: Improving Few-Shot Self-Rationalization for Small Language Models [29.755148112827502]
We present a novel approach, Zero-shot Augmentation of Rationale-Answer pairs (ZARA), to automatically construct pseudo-parallel data for self-training.
ZARA achieves SOTA performance on the FEB benchmark, for both the task accuracy and the explanation metric.
arXiv Detail & Related papers (2023-05-12T10:07:12Z)
- Chaos is a Ladder: A New Theoretical Understanding of Contrastive Learning via Augmentation Overlap [64.60460828425502]
We propose a new guarantee on the downstream performance of contrastive learning.
Our new theory hinges on the insight that the support of different intra-class samples will become more overlapped under aggressive data augmentations.
We propose an unsupervised model selection metric ARC that aligns well with downstream accuracy.
arXiv Detail & Related papers (2022-03-25T05:36:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.