Instruction Tuning with GPT-4
- URL: http://arxiv.org/abs/2304.03277v1
- Date: Thu, 6 Apr 2023 17:58:09 GMT
- Title: Instruction Tuning with GPT-4
- Authors: Baolin Peng and Chunyuan Li and Pengcheng He and Michel Galley and
Jianfeng Gao
- Abstract summary: We present the first attempt to use GPT-4 to generate instruction-following data for finetuning large language models.
Our early experiments on instruction-tuned LLaMA models show that the 52K English and Chinese instruction-following data generated by GPT-4 leads to superior zero-shot performance on new tasks.
- Score: 107.55078894215798
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Prior work has shown that finetuning large language models (LLMs) using
machine-generated instruction-following data enables such models to achieve
remarkable zero-shot capabilities on new tasks, and no human-written
instructions are needed. In this paper, we present the first attempt to use
GPT-4 to generate instruction-following data for LLM finetuning. Our early
experiments on instruction-tuned LLaMA models show that the 52K English and
Chinese instruction-following data generated by GPT-4 leads to superior
zero-shot performance on new tasks compared to the instruction-following data
generated by previous state-of-the-art models. We also collect feedback and comparison
data from GPT-4 to enable a comprehensive evaluation and reward model training.
We make our data generated using GPT-4 as well as our codebase publicly
available.
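The released 52K data follows the Alpaca-style instruction/input/response record format. A minimal sketch of assembling one supervised finetuning prompt from such a record (the helper function itself is illustrative, not the paper's code):

```python
# Assemble a finetuning prompt from one instruction-following record,
# following the Alpaca-style template used for the released 52K data.
def build_prompt(record: dict) -> str:
    header = (
        "Below is an instruction that describes a task"
        + (", paired with an input that provides further context" if record.get("input") else "")
        + ". Write a response that appropriately completes the request.\n\n"
    )
    prompt = header + f"### Instruction:\n{record['instruction']}\n\n"
    if record.get("input"):  # the input field is optional in this format
        prompt += f"### Input:\n{record['input']}\n\n"
    prompt += "### Response:"
    return prompt

example = {"instruction": "Translate to French.", "input": "Good morning."}
print(build_prompt(example))
```

The model's generation after "### Response:" becomes the training target during finetuning.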
Related papers
- Non-instructional Fine-tuning: Enabling Instruction-Following Capabilities in Pre-trained Language Models without Instruction-Following Data [51.34222224728979]
We propose a novel approach that uses the first half of a random text from OpenWebText as the instruction, and uses GPT-3.5-turbo or GPT-4-turbo to complete the text as the response.
Despite the data being "non-instructional", we found that pre-trained LLMs fine-tuned on this data can gain instruction-following capabilities.
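A minimal sketch of this pairing scheme, with the completion call stubbed out (in the paper it would be a GPT-3.5-turbo or GPT-4-turbo request):

```python
def make_non_instructional_pair(text: str, complete) -> dict:
    # Split a raw passage at its midpoint: the first half plays the role
    # of the "instruction", and a model continuation of it is the "response".
    mid = len(text) // 2
    first_half = text[:mid]
    return {"instruction": first_half, "response": complete(first_half)}

# Hypothetical stand-in for a GPT-3.5/GPT-4 completion call.
def fake_complete(prefix: str) -> str:
    return prefix.upper()  # placeholder continuation

pair = make_non_instructional_pair("The quick brown fox jumps.", fake_complete)
```

The resulting pairs contain no explicit instructions, yet finetuning on them is reported to induce instruction-following behavior.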
arXiv Detail & Related papers (2024-08-27T01:21:53Z)
- RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs [60.38044044203333]
Large language models (LLMs) typically utilize the top-k contexts from a retriever in retrieval-augmented generation (RAG).
We propose a novel instruction fine-tuning framework RankRAG, which instruction-tunes a single LLM for the dual purpose of context ranking and answer generation in RAG.
For generation, we compare our model with many strong baselines, including GPT-4-0613, GPT-4-turbo-2024-0409, and ChatQA-1.5, an open-source model with state-of-the-art performance on RAG benchmarks.
arXiv Detail & Related papers (2024-07-02T17:59:17Z)
- Using GPT-4 to Augment Unbalanced Data for Automatic Scoring [0.5586073503694489]
We introduce a novel text data augmentation framework leveraging GPT-4, a generative large language model.
We crafted prompts for GPT-4 to generate responses, especially for minority scoring classes.
We finetuned DistilBERT for automatic scoring on the augmented and original datasets.
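A sketch of the class-balancing step, assuming labeled responses and a target of matching the majority-class count (the actual prompts sent to GPT-4 are elided):

```python
from collections import Counter

def augmentation_budget(labels):
    # For each minority score class, compute how many synthetic responses
    # GPT-4 would need to generate to match the majority-class count.
    counts = Counter(labels)
    target = max(counts.values())
    return {label: target - n for label, n in counts.items() if n < target}

labels = ["correct"] * 50 + ["partially_correct"] * 20 + ["incorrect"] * 5
budget = augmentation_budget(labels)
# budget → {"partially_correct": 30, "incorrect": 45}
```

The budget then drives how many GPT-4 generations are requested per minority class before finetuning the scorer on the combined data.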
arXiv Detail & Related papers (2023-10-25T01:07:50Z)
- Automatic Pair Construction for Contrastive Post-training [57.57149781848383]
In this paper, we propose an automatic way to construct contrastive data for large language models (LLMs).
We compare the contrastive techniques of SLiC and DPO to SFT baselines and find that DPO provides a step-function improvement even after continuing SFT saturates.
We also explore a data curriculum learning scheme for contrastive post-training, which starts by learning from "easier" pairs and transitions to "harder" ones.
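The curriculum idea can be sketched as ordering contrastive pairs by an ease score, for example the margin between the chosen and rejected responses' scores (the scoring fields here are a hypothetical stand-in, not the paper's exact criterion):

```python
def curriculum_order(pairs):
    # Sort contrastive (chosen, rejected) pairs from "easier" (large score
    # margin) to "harder" (small margin), as in the data curriculum scheme.
    return sorted(
        pairs,
        key=lambda p: p["chosen_score"] - p["rejected_score"],
        reverse=True,
    )

pairs = [
    {"id": "a", "chosen_score": 0.9, "rejected_score": 0.8},  # hard: margin 0.1
    {"id": "b", "chosen_score": 0.9, "rejected_score": 0.2},  # easy: margin 0.7
    {"id": "c", "chosen_score": 0.7, "rejected_score": 0.4},  # medium: margin 0.3
]
ordered = [p["id"] for p in curriculum_order(pairs)]
# ordered → ["b", "c", "a"]
```

Training then consumes the pairs in this order, so clearly separable comparisons are seen before ambiguous ones.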
arXiv Detail & Related papers (2023-10-03T17:59:46Z)
- InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4 [14.248735997950446]
We introduce InstructionGPT-4, which is fine-tuned on a small dataset comprising only 200 examples.
Based on these metrics, we present an effective and trainable data selector to automatically identify and filter low-quality vision-language data.
Our findings demonstrate that a small amount of high-quality instruction-tuning data suffices to enable multimodal large language models to generate better output.
arXiv Detail & Related papers (2023-08-23T11:27:30Z)
- Visual Instruction Tuning [79.70923292053097]
We present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data.
By instruction tuning on such generated data, we introduce LLaVA: Large Language and Vision Assistant.
When fine-tuned on Science QA, the synergy of LLaVA and GPT-4 achieves a new state-of-the-art accuracy of 92.53%.
arXiv Detail & Related papers (2023-04-17T17:59:25Z)
- GPT-4 Technical Report [116.90398195245983]
GPT-4 is a large-scale, multimodal model which can accept image and text inputs and produce text outputs.
It exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers.
arXiv Detail & Related papers (2023-03-15T17:15:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.