WizardLM: Empowering Large Language Models to Follow Complex Instructions
- URL: http://arxiv.org/abs/2304.12244v2
- Date: Sat, 10 Jun 2023 13:18:25 GMT
- Title: WizardLM: Empowering Large Language Models to Follow Complex Instructions
- Authors: Can Xu, Qingfeng Sun, Kai Zheng, Xiubo Geng, Pu Zhao, Jiazhan Feng,
Chongyang Tao, Daxin Jiang
- Abstract summary: We show an avenue for creating large amounts of instruction data with varying levels of complexity using LLM instead of humans.
We use our proposed Evol-Instruct to rewrite instructions step by step into more complex instructions.
Then, we mix all generated instruction data to fine-tune LLaMA.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training large language models (LLMs) with open-domain instruction following
data brings colossal success. However, manually creating such instruction data
is very time-consuming and labor-intensive. Moreover, humans may struggle to
produce high-complexity instructions. In this paper, we show an avenue for
creating large amounts of instruction data with varying levels of complexity
using LLM instead of humans. Starting with an initial set of instructions, we
use our proposed Evol-Instruct to rewrite them step by step into more complex
instructions. Then, we mix all generated instruction data to fine-tune LLaMA.
We call the resulting model WizardLM. Human evaluations on a
complexity-balanced test bed and Vicuna's test set show that instructions from
Evol-Instruct are superior to human-created ones. By analyzing the human
evaluation results of the high complexity part, we demonstrate that outputs
from our WizardLM are preferred to outputs from OpenAI ChatGPT. In GPT-4
automatic evaluation, WizardLM achieves more than 90% capacity of ChatGPT on
17 out of 29 skills. Even though WizardLM still lags behind ChatGPT in some
aspects, our findings suggest that fine-tuning with AI-evolved instructions is
a promising direction for enhancing LLMs. Our code and data are public at
https://github.com/nlpxucan/WizardLM
Related papers
- Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning
We introduce Task-Aware Curriculum Planning for Instruction Refinement (TAPIR)
TAPIR is a multi-round distillation framework with balanced task distributions and dynamic difficulty adjustment.
We rigorously evaluate TAPIR using two widely recognized benchmarks, including AlpacaEval 2.0 and MT-Bench.
arXiv Detail & Related papers (2024-05-22T08:38:26Z)
- CodecLM: Aligning Language Models with Tailored Synthetic Data
We introduce CodecLM, a framework for adaptively generating high-quality synthetic data for instruction-following abilities.
We first encode seed instructions into metadata, which are concise keywords generated on-the-fly to capture the target instruction distribution.
We also introduce Self-Rubrics and Contrastive Filtering during decoding to tailor data-efficient samples.
arXiv Detail & Related papers (2024-04-08T21:15:36Z)
- Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs
In-context learning (ICL) techniques can train strong conversational agents with only a small amount of human supervision.
Here we explore the application of such techniques to language models that are much smaller (around 10B--40B parameters) and have permissive licenses.
We find the Self-Instruct approach to be less effective at these sizes and propose new ICL methods that draw on two main ideas.
arXiv Detail & Related papers (2023-10-21T10:21:17Z)
- Tuna: Instruction Tuning using Feedback from Large Language Models
We propose finetuning an instruction-tuned large language model using our novel probabilistic ranking and contextual ranking approaches.
Probabilistic ranking enables the instruction-tuned model to inherit the relative rankings of high-quality and low-quality responses from the teacher LLM.
On the other hand, learning with contextual ranking allows the model to refine its own response distribution using the contextual understanding ability of stronger LLMs.
arXiv Detail & Related papers (2023-10-20T09:55:06Z)
- Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models
Large language models (LLMs) can perform a wide range of tasks by following natural language instructions.
We introduce Auto-Instruct, a novel method to automatically improve the quality of instructions provided to LLMs.
In experiments on 118 out-of-domain tasks, Auto-Instruct surpasses both human-written instructions and existing baselines of LLM-generated instructions.
arXiv Detail & Related papers (2023-10-19T19:52:55Z)
- Ada-Instruct: Adapting Instruction Generators for Complex Reasoning
We introduce Ada-Instruct, an adaptive instruction generator developed by fine-tuning open-source LLMs.
We empirically validated Ada-Instruct's efficacy across different applications, including code completion, mathematical reasoning, and commonsense reasoning.
arXiv Detail & Related papers (2023-10-06T13:28:04Z)
- WizardCoder: Empowering Code Large Language Models with Evol-Instruct
We introduce WizardCoder, which empowers Code LLMs with complex instruction fine-tuning.
Our model surpasses all other open-source Code LLMs by a substantial margin.
arXiv Detail & Related papers (2023-06-14T15:18:48Z)
- InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models
Large language models (LLMs) are instruction followers, but it can be challenging to find the best instruction for different situations.
We optimize a low-dimensional soft prompt applied to an open-source LLM to generate the instruction for the black-box LLM.
Our results show that InstructZero outperforms SOTA auto-instruction methods across a variety of downstream tasks.
arXiv Detail & Related papers (2023-06-05T17:55:22Z)
- LongForm: Effective Instruction Tuning with Reverse Instructions
We introduce the LongForm-C dataset, which is created by reverse instructions.
First we select a diverse set of human-written documents from corpora such as C4 and Wikipedia.
We generate instructions via LLMs for human-written corpus examples using reverse instructions.
arXiv Detail & Related papers (2023-04-17T17:36:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.