Learning from Committee: Reasoning Distillation from a Mixture of Teachers with Peer-Review
- URL: http://arxiv.org/abs/2410.03663v2
- Date: Wed, 16 Oct 2024 01:01:58 GMT
- Title: Learning from Committee: Reasoning Distillation from a Mixture of Teachers with Peer-Review
- Authors: Zhuochun Li, Yuelyu Ji, Rui Meng, Daqing He,
- Abstract summary: We introduce a novel Fault-Aware Distillation via Peer-Review (FAIR) approach.
Instead of merely obtaining gold rationales from teachers, our method asks teachers to identify and explain the student's mistakes.
- Score: 11.756344944226495
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While reasoning capabilities typically emerge in large language models (LLMs) with tens of billions of parameters, recent research focuses on improving smaller open-source models through knowledge distillation (KD) from commercial LLMs. However, many of these studies rely solely on responses from a single LLM as the gold rationale, unlike the natural human learning process, which involves understanding both the correct answers and the reasons behind mistakes. In this paper, we introduce a novel Fault-Aware Distillation via Peer-Review (FAIR) approach: 1) Instead of merely obtaining gold rationales from teachers, our method asks teachers to identify and explain the student's mistakes, providing customized instruction learning data. 2) We design a simulated peer-review process between teacher LLMs, which selects only the generated rationales above the acceptance threshold. This reduces the chance of teachers guessing correctly with flawed rationale, improving instructional data quality. Comprehensive experiments and analysis on mathematical, commonsense, and logical reasoning tasks demonstrate the effectiveness of our method.
Related papers
- Make LLMs better zero-shot reasoners: Structure-orientated autonomous reasoning [52.83539473110143]
We introduce a novel structure-oriented analysis method to help Large Language Models (LLMs) better understand a question.
To further improve the reliability in complex question-answering tasks, we propose a multi-agent reasoning system, Structure-oriented Autonomous Reasoning Agents (SARA)
Extensive experiments verify the effectiveness of the proposed reasoning system. Surprisingly, in some cases, the system even surpasses few-shot methods.
arXiv Detail & Related papers (2024-10-18T05:30:33Z) - LINKED: Eliciting, Filtering and Integrating Knowledge in Large Language Model for Commonsense Reasoning [21.12539851761666]
Large language models (LLMs) sometimes demonstrate poor performance on knowledge-intensive tasks.
We propose a novel method named eliciting, filtering and integrating knowledge in large language model (LINKED)
With our comprehensive experiments on two complex commonsense reasoning benchmarks, our method outperforms SOTA baselines (up to 9.0% improvement of accuracy)
arXiv Detail & Related papers (2024-10-12T14:12:22Z) - Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors [78.53699244846285]
Large language models (LLMs) present an opportunity to scale high-quality personalized education to all.
LLMs struggle to precisely detect student's errors and tailor their feedback to these errors.
Inspired by real-world teaching practice where teachers identify student errors and customize their response based on them, we focus on verifying student solutions.
arXiv Detail & Related papers (2024-07-12T10:11:40Z) - Evaluating Human Alignment and Model Faithfulness of LLM Rationale [66.75309523854476]
We study how well large language models (LLMs) explain their generations through rationales.
We show that prompting-based methods are less "faithful" than attribution-based explanations.
arXiv Detail & Related papers (2024-06-28T20:06:30Z) - QCRD: Quality-guided Contrastive Rationale Distillation for Large Language Models [13.54030164748731]
We propose a general approach called quality-guided contrastive rationale distillation for reasoning capacity learning.
For the learning of positive knowledge, we collect rationales through self-consistency to denoise the LLM rationales generated by temperature sampling.
For the negative knowledge distillation, we generate negative rationales using temperature sampling for the iteration-before smaller language models themselves.
arXiv Detail & Related papers (2024-05-14T13:07:10Z) - Democratizing Reasoning Ability: Tailored Learning from Large Language
Model [97.4921006089966]
We propose a tailored learning approach to distill such reasoning ability to smaller LMs.
We exploit the potential of LLM as a reasoning teacher by building an interactive multi-round learning paradigm.
To exploit the reasoning potential of the smaller LM, we propose self-reflection learning to motivate the student to learn from self-made mistakes.
arXiv Detail & Related papers (2023-10-20T07:50:10Z) - SCOTT: Self-Consistent Chain-of-Thought Distillation [68.40232422158569]
Large language models (LMs) generate free-text rationales for their predictions via chain-of-thought prompting.
We propose a faithful knowledge distillation method to learn a small, self-consistent CoT model from a teacher model that is orders of magnitude larger.
To ensure faithful distillation, we use the teacher-generated rationales to learn a student LM with a counterfactual reasoning objective.
arXiv Detail & Related papers (2023-05-03T03:47:00Z) - Unbiased Knowledge Distillation for Recommendation [66.82575287129728]
Knowledge distillation (KD) has been applied in recommender systems (RS) to reduce inference latency.
Traditional solutions first train a full teacher model from the training data, and then transfer its knowledge to supervise the learning of a compact student model.
We find such a standard distillation paradigm would incur serious bias issue -- popular items are more heavily recommended after the distillation.
arXiv Detail & Related papers (2022-11-27T05:14:03Z) - DE-RRD: A Knowledge Distillation Framework for Recommender System [16.62204445256007]
We propose a knowledge distillation framework for recommender system, called DE-RRD.
It enables the student model to learn from the latent knowledge encoded in the teacher model as well as from the teacher's predictions.
Our experiments show that DE-RRD outperforms the state-of-the-art competitors and achieves comparable or even better performance to that of the teacher model with faster inference time.
arXiv Detail & Related papers (2020-12-08T11:09:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.