Explicit Knowledge Transfer for Weakly-Supervised Code Generation
- URL: http://arxiv.org/abs/2211.16740v3
- Date: Wed, 7 Jun 2023 18:01:11 GMT
- Title: Explicit Knowledge Transfer for Weakly-Supervised Code Generation
- Authors: Zhangir Azerbayev, Ansong Ni, Hailey Schoelkopf, Dragomir Radev
- Abstract summary: We propose explicit knowledge transfer (EKT) to transfer the code generation ability of an LLM to a smaller model.
EKT uses the few-shot capabilities of a teacher LLM to create NL-code pairs that we then filter for correctness and fine-tune the student on.
We find that EKT not only yields better performance than training with expert iteration, but also outperforms knowledge distillation.
- Score: 14.758396460685017
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) can acquire strong code-generation capabilities
through few-shot learning. In contrast, supervised fine-tuning is still needed
for smaller models to achieve good performance. Such fine-tuning demands a
large number of task-specific NL-code pairs, which are expensive to obtain. In
this paper, we attempt to transfer the code generation ability of an LLM to a
smaller model with the aid of weakly-supervised data. More specifically, we
propose explicit knowledge transfer (EKT), which uses the few-shot capabilities
of a teacher LLM to create NL-code pairs that we then filter for correctness
and fine-tune the student on. We evaluate EKT on the task of generating code
solutions to math word problems from the GSM8k dataset. We find that EKT not
only yields better performance than training with expert iteration, but also
outperforms knowledge distillation, another form of knowledge transfer. A
GPT-Neo 1.3B model trained using EKT with a GPT-J teacher achieves a 12.4%
pass@100 on GSM8k, while the same student and teacher trained with knowledge
distillation yield only a 3.7% pass@100. We also show that it is possible for a
student model to outperform the teacher using EKT.
Related papers
- Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data [54.934578742209716]
In real-world NLP applications, Large Language Models (LLMs) offer promising solutions due to their extensive training on vast datasets.
LLKD is an adaptive sample selection method that incorporates signals from both the teacher and student.
Our comprehensive experiments show that LLKD achieves superior performance across various datasets with higher data efficiency.
arXiv Detail & Related papers (2024-11-12T18:57:59Z) - SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z) - GKT: A Novel Guidance-Based Knowledge Transfer Framework For Efficient Cloud-edge Collaboration LLM Deployment [74.40196814292426]
We introduce a novel and intuitive Guidance-based Knowledge Transfer (GKT) framework.
GKT uses a larger Large Language Models as a ''teacher'' to create guidance prompts, paired with a smaller ''student'' model to finalize responses.
It achieves a maximum accuracy improvement of 14.18%, along with a 10.72 times speed-up on GSM8K and an accuracy improvement of 14.00 % along with a 7.73 times speed-up in CSQA.
arXiv Detail & Related papers (2024-05-30T02:37:35Z) - Large Language Model-Aware In-Context Learning for Code Generation [75.68709482932903]
Large language models (LLMs) have shown impressive in-context learning (ICL) ability in code generation.
We propose a novel learning-based selection approach named LAIL (LLM-Aware In-context Learning) for code generation.
arXiv Detail & Related papers (2023-10-15T06:12:58Z) - Talking Models: Distill Pre-trained Knowledge to Downstream Models via
Interactive Communication [25.653517213641575]
We develop an interactive communication process to help students of downstream tasks learn effectively from pre-trained foundation models.
Our design is inspired by the way humans learn from teachers who can explain knowledge in a way that meets the students' needs.
arXiv Detail & Related papers (2023-10-04T22:22:21Z) - Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners.
We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting.
Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
arXiv Detail & Related papers (2023-06-25T02:39:19Z) - Lion: Adversarial Distillation of Proprietary Large Language Models [16.245052771463044]
We propose a novel adversarial distillation framework for a more efficient knowledge transfer.
We successfully transfer knowledge from ChatGPT to a student model (named Lion) using a mere 70k training data.
arXiv Detail & Related papers (2023-05-22T09:49:16Z) - Can Bad Teaching Induce Forgetting? Unlearning in Deep Networks using an
Incompetent Teacher [6.884272840652062]
We propose a novel machine unlearning method by exploring the utility of competent and incompetent teachers in a student-teacher framework to induce forgetfulness.
The knowledge from the competent and incompetent teachers is selectively transferred to the student to obtain a model that doesn't contain any information about the forget data.
We introduce the zero forgetting (ZRF) metric to evaluate any unlearning method.
arXiv Detail & Related papers (2022-05-17T05:13:17Z) - Efficient Crowd Counting via Structured Knowledge Transfer [122.30417437707759]
Crowd counting is an application-oriented task and its inference efficiency is crucial for real-world applications.
We propose a novel Structured Knowledge Transfer framework to generate a lightweight but still highly effective student network.
Our models obtain at least 6.5$times$ speed-up on an Nvidia 1080 GPU and even achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-03-23T08:05:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.