TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
- URL: http://arxiv.org/abs/2310.19019v3
- Date: Tue, 16 Jul 2024 01:02:01 GMT
- Title: TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
- Authors: Nan He, Hanyu Lai, Chenyang Zhao, Zirui Cheng, Junting Pan, Ruoyu Qin, Ruofan Lu, Rui Lu, Yunchen Zhang, Gangming Zhao, Zhaohui Hou, Zhiyuan Huang, Shaoqing Lu, Ding Liang, Mingjie Zhan
- Abstract summary: We propose TeacherLM-7.1B, capable of annotating relevant fundamentals, chain of thought, and common mistakes for most NLP samples.
The model achieved a zero-shot score of 52.3 on MMLU, surpassing most models with over 100B parameters.
We will release the TeacherLM series of models and augmented datasets as open-source.
- Score: 27.90035459143466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) exhibit impressive reasoning and data augmentation capabilities in various NLP tasks. However, what about small models? In this work, we propose TeacherLM-7.1B, capable of annotating the relevant fundamentals, chain of thought, and common mistakes for most NLP samples, which makes each annotation more than just an answer and thus allows other models to learn "why" instead of just "what". The TeacherLM-7.1B model achieved a zero-shot score of 52.3 on MMLU, surpassing most models with over 100B parameters. Even more remarkable is its data augmentation ability. Based on TeacherLM-7.1B, we augmented 58 NLP datasets and used them to teach student models of various sizes from the OPT and BLOOM series in a multi-task setting. The experimental results indicate that the data augmentation provided by TeacherLM brings significant benefits. We will release the TeacherLM series of models and augmented datasets as open source.
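To make the annotation scheme concrete, below is a minimal sketch of how TeacherLM-style augmentation could be driven: the teacher is prompted separately for fundamentals, chain of thought, and common mistakes, and the three annotations are attached to each sample. The model identifier and prompt template are illustrative assumptions, not the authors' released code.

```python
from transformers import pipeline

# Placeholder model id; the released checkpoint name may differ.
teacher = pipeline("text-generation", model="TeacherLM-7.1B")

FIELDS = ["Fundamentals", "Chain of Thought", "Common Mistakes"]

def augment(question: str, answer: str) -> dict:
    """Attach teacher annotations to one (question, answer) sample."""
    sample = {"question": question, "answer": answer}
    for field in FIELDS:
        prompt = f"Question: {question}\nAnswer: {answer}\n{field}:"
        out = teacher(prompt, max_new_tokens=256)[0]["generated_text"]
        sample[field.lower().replace(" ", "_")] = out[len(prompt):].strip()
    return sample  # enriched example for training a student model
```

The enriched samples carry the "why" alongside the "what", which is what lets student models trained on them benefit beyond plain answer supervision.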
Related papers
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
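A minimal sketch of the self-synthetic loop the SELF-GUIDE summary describes, assuming hypothetical generate, passes_quality_checks, and finetune helpers (the paper's actual pipeline and filtering rules may differ):

```python
# SELF-GUIDE-style sketch: the student LLM synthesizes task-specific
# input-output pairs from a few seed examples, filters them, and is then
# finetuned on the surviving pairs. All three helpers are hypothetical.
def self_guide(student, task_instruction, seed_examples, n_rounds=2, budget=100):
    synthetic = list(seed_examples)
    for _ in range(n_rounds):
        new_pairs = []
        for _ in range(budget):  # synthesis budget per round
            x = generate(student, f"{task_instruction}\nWrite a new input:")
            y = generate(student, f"{task_instruction}\nInput: {x}\nOutput:")
            if passes_quality_checks(x, y, synthetic):  # e.g. dedup, length
                new_pairs.append((x, y))
        student = finetune(student, synthetic + new_pairs)
        synthetic += new_pairs
    return student
```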
- AgentInstruct: Toward Generative Teaching with Agentic Flows [12.192372792525726]
We focus on using synthetic data for post-training, specifically creating data with powerful models to teach a new skill or behavior to another model.
We introduce AgentInstruct, an agentic framework for automatically creating large amounts of diverse and high-quality synthetic data.
We demonstrate the utility of AgentInstruct by creating a post-training dataset of 25M pairs to teach language models different skills, such as text editing, creative writing, tool usage, coding, and reading comprehension.
arXiv Detail & Related papers (2024-07-03T21:01:12Z)
- LLM Augmented LLMs: Expanding Capabilities through Composition [56.40953749310957]
CALM -- Composition to Augment Language Models -- introduces cross-attention between models to compose their representations and enable new capabilities.
We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English.
When PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks.
arXiv Detail & Related papers (2024-01-04T18:53:01Z)
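As an illustration of the composition mechanism CALM describes, here is a sketch of a cross-attention block that lets an anchor model's hidden states attend to an augmenting model's hidden states; the dimensions, layer placement, and residual wiring are assumptions:

```python
# CALM-style sketch: learned cross-attention lets a large anchor model
# attend to a smaller augmenting model's hidden states. Only these small
# blocks are trained; both base models stay frozen.
import torch
import torch.nn as nn

class CrossModelAttention(nn.Module):
    def __init__(self, d_anchor=4096, d_aug=1024, n_heads=8):
        super().__init__()
        self.proj = nn.Linear(d_aug, d_anchor)   # map augmenting states up
        self.attn = nn.MultiheadAttention(d_anchor, n_heads, batch_first=True)

    def forward(self, h_anchor, h_aug):
        # Anchor hidden states query the (projected) augmenting states;
        # the result is added residually to compose the two models.
        k = v = self.proj(h_aug)
        out, _ = self.attn(h_anchor, k, v)
        return h_anchor + out
```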
- Can Instruction Fine-Tuned Language Models Identify Social Bias through Prompting? [0.0]
We present our work on evaluating instruction fine-tuned language models' ability to identify bias through zero-shot prompting.
Across LLaMA and its two instruction fine-tuned versions, Alpaca 7B performs best on the bias identification task with an accuracy of 56.7%.
arXiv Detail & Related papers (2023-07-19T22:03:40Z)
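For flavor, a plausible zero-shot prompt for the bias-identification task; the paper's exact template is not reproduced here, so this wording is an assumption:

```python
# Hypothetical zero-shot prompt for bias identification, in the spirit of
# the evaluation described above (the paper's actual template may differ).
def bias_prompt(sentence: str) -> str:
    return (
        "Does the following sentence contain social bias? "
        "Answer 'yes' or 'no'.\n"
        f"Sentence: {sentence}\nAnswer:"
    )

print(bias_prompt("Women are bad at math."))  # expected model answer: yes
```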
- MiniLLM: Knowledge Distillation of Large Language Models [112.93051247165089]
Knowledge Distillation (KD) is a promising technique for reducing the high computational demand of large language models (LLMs).
We propose a KD approach that distills LLMs into smaller language models.
Our method scales across model families from 120M to 13B parameters.
arXiv Detail & Related papers (2023-06-14T14:44:03Z)
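MiniLLM's key departure from standard distillation is optimizing reverse KL, KL(student || teacher), rather than the forward direction. A simplified token-level version of that objective (omitting the paper's masking and policy-gradient training) might look like:

```python
# Simplified token-level distillation loss in the spirit of MiniLLM, which
# minimizes reverse KL(student || teacher) over the vocabulary distribution.
import torch
import torch.nn.functional as F

def reverse_kl_loss(student_logits, teacher_logits):
    # logits: (batch, seq_len, vocab)
    log_p_s = F.log_softmax(student_logits, dim=-1)
    log_p_t = F.log_softmax(teacher_logits, dim=-1)
    p_s = log_p_s.exp()
    # KL(student || teacher) = sum_v p_s * (log p_s - log p_t)
    return (p_s * (log_p_s - log_p_t)).sum(-1).mean()
```

Reverse KL is mode-seeking: it discourages the student from placing probability mass where the teacher assigns little, which suits generative students that cannot match the teacher's full distribution.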
- Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes [91.58845026796149]
We introduce Distilling step-by-step, a new mechanism that trains small models that outperform large language models.
We present three findings across four NLP benchmarks.
arXiv Detail & Related papers (2023-05-03T17:50:56Z)
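A sketch of the multi-task signal behind Distilling step-by-step: the small model is trained to predict both the answer and the LLM-extracted rationale, distinguished by task prefixes. The prefixes and loss weighting shown are assumptions:

```python
# Distilling step-by-step sketch: each example yields two training pairs,
# one for the label and one for the rationale, distinguished by a prefix.
def training_examples(question, answer, rationale):
    return [
        {"input": f"[label] {question}", "target": answer},
        {"input": f"[rationale] {question}", "target": rationale},
    ]

# The two losses are then combined, e.g. L = L_label + lam * L_rationale.
```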
- Learn What Is Possible, Then Choose What Is Best: Disentangling One-To-Many Relations in Language Through Text-based Games [3.615981646205045]
We present an approach to train language models that can emulate the desirable behaviours, but not the undesirable ones.
Using text-based games as a testbed, our approach, PASA, uses discrete latent variables to capture the range of different behaviours.
Results show up to 49% empirical improvement over the previous state-of-the-art model.
arXiv Detail & Related papers (2023-04-14T17:11:26Z)
- Large Language Models Are Reasoning Teachers [9.290757451344673]
Fine-tune-CoT is a method that generates reasoning samples from very large teacher models to fine-tune smaller models.
We find that Fine-tune-CoT enables substantial reasoning capability in small models, far outperforming prompt-based baselines and even the teacher model in many tasks.
arXiv Detail & Related papers (2022-12-20T08:24:45Z)
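A minimal sketch of Fine-tune-CoT data generation, assuming hypothetical teacher_generate and extract_answer helpers: the teacher is prompted with zero-shot chain of thought, and only samples whose final answer matches the gold label are kept for finetuning the student:

```python
# Fine-tune-CoT sketch: harvest correct reasoning traces from a large
# teacher and use them as finetuning data for a small student.
# teacher_generate() and extract_answer() are hypothetical helpers.
def build_cot_dataset(problems, teacher_generate, extract_answer):
    dataset = []
    for question, gold in problems:
        cot = teacher_generate(f"Q: {question}\nA: Let's think step by step.")
        if extract_answer(cot) == gold:      # keep only correct reasoning
            dataset.append({"input": question, "target": cot})
    return dataset  # used to finetune the smaller student model
```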
- Teaching Small Language Models to Reason [19.625523231233128]
Chain of thought prompting successfully improves the reasoning capabilities of large language models.
We explore the transfer of such reasoning capabilities to models with less than 100 billion parameters via knowledge distillation.
Our experiments show that the proposed method improves task performance across arithmetic, commonsense and symbolic reasoning datasets.
arXiv Detail & Related papers (2022-12-16T11:24:42Z)
- Augmenting Interpretable Models with LLMs during Training [73.40079895413861]
We propose Augmented Interpretable Models (Aug-imodels) to build efficient and interpretable models.
Aug-imodels use LLMs during fitting but not during inference, allowing complete transparency.
We explore two instantiations of Aug-imodels in natural-language processing: (i) Aug-GAM, which augments a generalized additive model with decoupled embeddings from an LLM and (ii) Aug-Tree, which augments a decision tree with LLM feature expansions.
arXiv Detail & Related papers (2022-09-23T18:36:01Z)
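A rough sketch of the Aug-GAM instantiation: during fitting, each ngram is mapped to a fixed LLM embedding and a linear model is trained over their sum, so deployment reduces to a transparent per-ngram score table with no LLM calls. embed_ngram is a hypothetical stand-in for the LLM embedding call:

```python
# Aug-GAM-style sketch: LLM embeddings are used only at fitting time; the
# deployed model is a transparent linear function of ngram features.
import numpy as np
from sklearn.linear_model import LogisticRegression

def featurize(sentence, embed_ngram, n=2):
    """Sum LLM embeddings of the sentence's ngrams (fitting time only)."""
    toks = sentence.split()
    ngrams = [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    return np.sum([embed_ngram(g) for g in ngrams], axis=0)

def fit_aug_gam(train_sents, labels, embed_ngram):
    X = np.stack([featurize(s, embed_ngram) for s in train_sents])
    clf = LogisticRegression().fit(X, labels)
    # At deployment, cache w . embed_ngram(g) per ngram: predictions become
    # sums of per-ngram scores, fully interpretable with no LLM call.
    return clf
```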
- One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers [54.146208195806636]
We propose a multi-teacher knowledge distillation framework named MT-BERT for pre-trained language model compression.
We show that MT-BERT can train a high-quality student model from multiple teacher PLMs.
Experiments on three benchmark datasets validate the effectiveness of MT-BERT in compressing PLMs.
arXiv Detail & Related papers (2021-06-02T08:42:33Z)
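A minimal sketch of multi-teacher distillation in the spirit of MT-BERT, matching the student to the averaged soft predictions of several teachers; MT-BERT's co-finetuning and hidden-state losses are omitted:

```python
# Multi-teacher distillation sketch: the student matches a temperature-
# softened average of several teachers' predictions (soft cross-entropy).
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, T=2.0):
    soft_targets = torch.stack(
        [F.softmax(t / T, dim=-1) for t in teacher_logits_list]
    ).mean(0)                                  # ensemble of teacher beliefs
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    return -(soft_targets * log_p_s).sum(-1).mean() * (T * T)
```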