CITING: Large Language Models Create Curriculum for Instruction Tuning
- URL: http://arxiv.org/abs/2310.02527v1
- Date: Wed, 4 Oct 2023 01:58:34 GMT
- Title: CITING: Large Language Models Create Curriculum for Instruction Tuning
- Authors: Tao Feng, Zifeng Wang, Jimeng Sun
- Abstract summary: We exploit the idea of leveraging AI models in lieu of humans as the teacher to train student LLMs.
Our method is inspired by how human students refine their writing skills by following the rubrics and learning from the revisions offered by their tutors.
- Score: 35.66902011221179
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent advancement of large language models (LLMs) has been achieved
through a combination of instruction tuning and human alignment. However, building
manually crafted instruction datasets and performing human alignment become the
bottleneck for scaling the development of LLMs. In this paper, we exploit the
idea of leveraging AI models in lieu of humans as the teacher to train student
LLMs. Our method is inspired by how human students refine their writing skills
by following the rubrics and learning from the revisions offered by their
tutors. Specifically, we employ a teacher LLM to create a curriculum for
instruction tuning of the student LLM, namely Curriculum Instruction TunING
(CITING). It encompasses two main steps: (1) the teacher LLM crafts the rubrics
for evaluating the answers corresponding to various types of questions, and (2)
the student LLM learns to follow the rubrics and perform self-correction from
the revisions made by the teacher. We carry out this process iteratively to embody
the procedure of CITING. We compare CITING against a series of state-of-the-art
baselines on four datasets. Our method demonstrates strong improvements in
articulateness, depth, and comprehensiveness under GPT-4 evaluation. Specifically,
it achieves an average winning rate of 79.4% over SFT, 73.4% over RLHF, 78.1%
over RRHF, and 76.3% over RAFT, respectively.
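The two-step loop described in the abstract can be sketched as follows. This is a minimal illustration with hypothetical stub functions standing in for the actual teacher and student LLM calls; the function names and return values are illustrative, not from the paper.

```python
# Minimal sketch of one CITING round: (1) the teacher crafts a rubric,
# (2) the student drafts answers and the teacher revises them against the
# rubric. The (question, revision) pairs would then fine-tune the student.

def teacher_craft_rubric(question_type):
    # Step (1): the teacher LLM writes a rubric for this question type.
    return f"rubric for {question_type}: be articulate, in-depth, comprehensive"

def student_answer(question):
    # The student LLM produces an initial draft answer.
    return f"draft answer to {question}"

def teacher_revise(answer, rubric):
    # Step (2): the teacher revises the student's draft against the rubric.
    return answer + " [revised per: " + rubric + "]"

def citing_iteration(questions, question_type):
    """One CITING round; run repeatedly to embody the iterative procedure."""
    rubric = teacher_craft_rubric(question_type)
    training_pairs = []
    for q in questions:
        draft = student_answer(q)
        revision = teacher_revise(draft, rubric)
        training_pairs.append((q, revision))
    return training_pairs

pairs = citing_iteration(["What is instruction tuning?"], "open-ended QA")
```

In the real method each stub would be a call to an LLM, and the collected pairs would be used for supervised fine-tuning of the student before the next round.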
Related papers
- Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning [12.651588927599441]
Instruction tuning aims to align large language models with open-domain instructions and human-preferred responses.
We introduce Task-Aware Curriculum Planning for Instruction Refinement (TAPIR) to select instructions that are difficult for a student LLM to follow.
To balance the student's capabilities, the task distribution of the training set is adjusted, and responses are automatically refined according to their corresponding tasks.
arXiv Detail & Related papers (2024-05-22T08:38:26Z)
- CoachLM: Automatic Instruction Revisions Improve the Data Quality in LLM Instruction Tuning [32.54921739100195]
We propose CoachLM, a novel approach to enhance the quality of instruction datasets through automatic revisions on samples in the dataset.
CoachLM is trained from the samples revised by human experts and significantly increases the proportion of high-quality samples in the dataset from 17.7% to 78.9%.
Results show that CoachLM improves the instruction-following capabilities of the instruction-tuned LLM by an average of 29.9%.
arXiv Detail & Related papers (2023-11-22T09:04:57Z)
- Tuna: Instruction Tuning using Feedback from Large Language Models [74.04950416204551]
We propose fine-tuning an instruction-tuned large language model using our novel probabilistic ranking and contextual ranking approaches.
Probabilistic ranking enables the instruction-tuned model to inherit the relative rankings of high-quality and low-quality responses from the teacher LLM.
On the other hand, learning with contextual ranking allows the model to refine its own response distribution using the contextual understanding ability of stronger LLMs.
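A standard pairwise objective in the spirit of the probabilistic ranking described above can be sketched as follows. This is a hedged illustration of the general technique, not Tuna's exact formulation; the function name and scores are hypothetical.

```python
import math

def pairwise_ranking_loss(score_good, score_bad):
    # Encourage the model to score the teacher-preferred response higher
    # than the dispreferred one: -log sigmoid(s_good - s_bad), the usual
    # pairwise ranking loss. Loss shrinks as the margin grows.
    margin = score_good - score_bad
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Minimizing this loss over teacher-ranked response pairs lets the student inherit the teacher's relative preferences over high- and low-quality responses.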
arXiv Detail & Related papers (2023-10-20T09:55:06Z)
- Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models [91.02730155418699]
Large language models (LLMs) can perform a wide range of tasks by following natural language instructions.
We introduce Auto-Instruct, a novel method to automatically improve the quality of instructions provided to LLMs.
In experiments on 118 out-of-domain tasks, Auto-Instruct surpasses both human-written instructions and existing baselines of LLM-generated instructions.
arXiv Detail & Related papers (2023-10-19T19:52:55Z)
- Instruction Tuning with Human Curriculum [15.025867460765559]
We (1) introduce Curriculum Instruction Tuning, (2) explore the potential advantages of employing diverse curriculum strategies, and (3) delineate a synthetic instruction-response generation framework.
Our generation pipeline is systematically structured to emulate the sequential and orderly characteristic of human learning.
We describe a methodology for generating instruction-response datasets that extensively span the various stages of human education.
arXiv Detail & Related papers (2023-10-14T07:16:08Z)
- Evaluating Large Language Models at Evaluating Instruction Following [54.49567482594617]
We introduce a challenging meta-evaluation benchmark, LLMBar, designed to test the ability of an LLM evaluator in discerning instruction-following outputs.
We discover that different evaluators exhibit distinct performance on LLMBar and even the highest-scoring ones have substantial room for improvement.
arXiv Detail & Related papers (2023-10-11T16:38:11Z)
- Efficient Finetuning Large Language Models For Vietnamese Chatbot [1.2075778142867704]
Large language models (LLMs) have been shown to achieve remarkable performance across a variety of natural language tasks.
We leverage large-scale instruction-following datasets from open-source projects, namely Alpaca, GPT4All, and Chat-Doctor.
We utilize parameter-efficient tuning through Low-Rank Adaptation (LoRA) on two open LLMs, resulting in four models: Bloomz-Chat, Bloomz-Doctor, GPTJ-Chat, and GPTJ-Doctor.
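The core LoRA idea mentioned above can be sketched in a few lines of NumPy: freeze the pretrained weight W and train only a low-rank update B @ A, so just r*(d+k) parameters are learned instead of d*k. The dimensions and initialization below are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 64, 64, 4                  # base weight is d x k, adapter rank r
W = rng.standard_normal((d, k))      # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-init,
                                     # so the effective weight starts at W

def lora_forward(x):
    # Effective weight is W + B @ A, computed without materializing it:
    # the low-rank path adds only small matrix products per forward pass.
    return x @ W.T + (x @ A.T) @ B.T

x = rng.standard_normal((2, k))
```

Because B is initialized to zero, the adapted model reproduces the frozen base model exactly until training updates B and A.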
arXiv Detail & Related papers (2023-09-09T00:11:53Z)
- Aligning Large Language Models with Human: A Survey [53.6014921995006]
Large Language Models (LLMs) trained on extensive textual corpora have emerged as leading solutions for a broad array of Natural Language Processing (NLP) tasks.
Despite their notable performance, these models are prone to certain limitations such as misunderstanding human instructions, generating potentially biased content, or factually incorrect information.
This survey presents a comprehensive overview of these alignment technologies, including the following aspects.
arXiv Detail & Related papers (2023-07-24T17:44:58Z)
- Language Model Self-improvement by Reinforcement Learning Contemplation [13.152789365858812]
This paper introduces a novel unsupervised method called Language Model Self-Improvement by Reinforcement Learning Contemplation (SIRLC).
As a student, the model generates answers to unlabeled questions, while as a teacher, it evaluates the generated text and assigns scores accordingly.
We demonstrate that SIRLC can be applied to various NLP tasks, such as reasoning problems, text generation, and machine translation.
arXiv Detail & Related papers (2023-05-23T19:25:52Z)
- Self-Refine: Iterative Refinement with Self-Feedback [62.78755306241981]
Self-Refine is an approach for improving initial outputs from large language models (LLMs) through iterative feedback and refinement.
We evaluate Self-Refine across 7 diverse tasks, ranging from dialog response generation to mathematical reasoning, using state-of-the-art (GPT-3.5, ChatGPT, and GPT-4) LLMs.
Our work demonstrates that even state-of-the-art LLMs like GPT-4 can be further improved at test time using our simple, standalone approach.
arXiv Detail & Related papers (2023-03-30T18:30:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.