TPD: Enhancing Student Language Model Reasoning via Principle Discovery and Guidance
- URL: http://arxiv.org/abs/2401.13849v1
- Date: Wed, 24 Jan 2024 23:11:33 GMT
- Title: TPD: Enhancing Student Language Model Reasoning via Principle Discovery and Guidance
- Authors: Haorui Wang (1), Rongzhi Zhang (1), Yinghao Li (1), Lingkai Kong (1),
Yuchen Zhuang (1), Xiusi Chen (2), Chao Zhang (1) ((1) College of Computing,
Georgia Institute of Technology, (2) Department of Computer Science,
University of California, Los Angeles)
- Abstract summary: We introduce a principle-based teacher-student framework called ``Teaching via Principle Discovery'' (TPD).
Inspired by human learning mechanisms, TPD mimics the interaction between a teacher and a student using a principle-based approach.
TPD significantly improves the student model's performance, achieving a $6.2\%$ improvement on average.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have recently showcased remarkable reasoning
abilities. However, larger models often surpass their smaller counterparts in
reasoning tasks, posing the challenge of effectively transferring these
capabilities to smaller models. Existing approaches heavily rely on extensive
fine-tuning data or continuous interactions with a superior teacher LLM during
inference. We introduce a principle-based teacher-student framework called
``Teaching via Principle Discovery'' (TPD) to address these limitations.
Inspired by human learning mechanisms, TPD mimics the interaction between a
teacher and a student using a principle-based approach. The teacher LLM
generates problem-solving instructions and corrective principles based on the
student LLM's errors. These principles guide the refinement of instructions and
the selection of instructive examples from a validation set. This enables the
student model to learn from both the teacher's guidance and its own mistakes.
Once the student model begins making inferences, TPD requires no further
intervention from the teacher LLM or humans. Through extensive experiments
across eight reasoning tasks, we demonstrate the effectiveness of TPD. Compared
to standard chain-of-thought prompting, TPD significantly improves the student
model's performance, achieving a $6.2\%$ improvement on average.
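As a rough illustration of the loop the abstract describes, here is a minimal Python sketch. The callables `teacher` and `student` (prompt-to-text LLM calls), the naive `is_correct` check, and the error-based example selection are all assumptions made for illustration, not the paper's actual interface.

def is_correct(prediction: str, answer: str) -> bool:
    # Naive containment check for illustration only; the paper evaluates
    # task-specific answers.
    return answer.strip() in prediction

def teach_via_principle_discovery(teacher, student, task, validation_set, k=4):
    # 1. Teacher drafts an initial problem-solving instruction for the task.
    instruction = teacher(f"Write step-by-step instructions for solving: {task}")

    # 2. Student attempts the validation problems; collect its mistakes.
    mistakes = []
    for question, answer in validation_set:
        prediction = student(f"{instruction}\n\nProblem: {question}")
        if not is_correct(prediction, answer):
            mistakes.append((question, prediction, answer))

    # 3. Teacher distills the student's errors into corrective principles.
    principles = teacher(
        "Summarize these errors as a short list of corrective principles:\n"
        + "\n".join(f"Q: {q}\nStudent: {p}\nGold: {a}" for q, p, a in mistakes)
    )

    # 4. Principles guide refinement of the instruction and the selection of
    #    instructive examples from the validation set (here: missed problems).
    instruction = teacher(
        "Revise the instructions using these principles.\n"
        f"Instructions: {instruction}\nPrinciples: {principles}"
    )
    examples = [(q, a) for q, _, a in mistakes[:k]]

    # 5. The assembled prompt is frozen; inference needs no further teacher
    #    or human intervention.
    return instruction, principles, examples

Once the returned instruction, principles, and examples are concatenated into a single prompt, the student answers test problems on its own, which is what lets TPD avoid per-query teacher calls.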
Related papers
- SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights [89.56181323849512]
We propose SuperCorrect, a framework that uses a large teacher model to supervise and correct both the reasoning and reflection processes of a smaller student model.
In the first stage, we extract hierarchical high-level and detailed thought templates from the teacher model to guide the student model in eliciting more fine-grained reasoning thoughts.
In the second stage, we introduce cross-model collaborative direct preference optimization (DPO) to enhance the self-correction abilities of the student model.
arXiv Detail & Related papers (2024-10-11T17:25:52Z)
- Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models [62.5501109475725]
Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them.
This paper introduces Online Knowledge Distillation (OKD), where the teacher network integrates small online modules to concurrently train with the student model.
OKD matches or exceeds the performance of leading methods across various model architectures and sizes while reducing training time by up to a factor of four.
arXiv Detail & Related papers (2024-09-19T07:05:26Z)
- Retrieved In-Context Principles from Previous Mistakes [55.109234526031884]
In-context learning (ICL) has been instrumental in adapting Large Language Models (LLMs) to downstream tasks using correct input-output examples.
Recent advances have attempted to improve model performance through principles derived from mistakes.
We propose Retrieved In-Context Principles (RICP), a novel teacher-student framework.
arXiv Detail & Related papers (2024-07-08T07:32:26Z)
- PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs [47.35598271306371]
Large Language Models (LLMs) have exhibited impressive capabilities in various tasks, yet their vast parameter sizes restrict their applicability in resource-constrained settings.
Knowledge distillation (KD) offers a viable solution by transferring expertise from large teacher models to compact student models.
We present PLaD, a novel preference-based LLM distillation framework.
arXiv Detail & Related papers (2024-06-05T03:08:25Z)
- Teacher-Student Training for Debiasing: General Permutation Debiasing for Large Language Models [39.82130327284791]
Large Language Models (LLMs) have demonstrated impressive zero-shot capabilities and versatility in NLP tasks.
However, they sometimes fail to maintain crucial invariances for specific tasks, and enforcing them can be costly.
This paper addresses this inefficiency at inference time.
arXiv Detail & Related papers (2024-03-20T13:38:07Z)
- YODA: Teacher-Student Progressive Learning for Language Models [82.0172215948963]
This paper introduces YODA, a teacher-student progressive learning framework.
It emulates the teacher-student education process to improve the efficacy of model fine-tuning.
Experiments show that training LLaMA2 with data from YODA improves supervised fine-tuning (SFT) with a significant performance gain.
arXiv Detail & Related papers (2024-01-28T14:32:15Z)
- Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble [56.705249154629264]
Self-training teacher-student frameworks are proposed to improve the robustness of NER models.
In this paper, we propose an adaptive teacher learning comprised of two teacher-student networks.
Fine-grained student ensemble updates each fragment of the teacher model with a temporal moving average of the corresponding fragment of the student, which enhances consistent predictions on each model fragment against noise.
arXiv Detail & Related papers (2022-12-13T12:14:09Z)
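The fragment-level moving average in the last entry reads like a mean-teacher-style EMA update applied per fragment. A minimal PyTorch sketch, assuming each fragment is an nn.Module and a smoothing factor alpha (both illustrative assumptions, not the paper's exact configuration):

import torch

def ema_update_fragment(teacher_fragment, student_fragment, alpha=0.999):
    # Temporal moving average: teacher_t = alpha * teacher_{t-1} + (1 - alpha) * student_t,
    # applied independently to each fragment so every piece of the teacher
    # tracks its student counterpart and stays robust to per-step noise.
    with torch.no_grad():
        for t_param, s_param in zip(teacher_fragment.parameters(),
                                    student_fragment.parameters()):
            t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)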