Related papers: Instruction Pre-Training: Language Models are Supervised Multitask Learners

Instruction Pre-Training: Language Models are Supervised Multitask Learners

URL: http://arxiv.org/abs/2406.14491v1
Date: Thu, 20 Jun 2024 16:55:33 GMT
Title: Instruction Pre-Training: Language Models are Supervised Multitask Learners
Authors: Daixuan Cheng, Yuxian Gu, Shaohan Huang, Junyu Bi, Minlie Huang, Furu Wei,
Abstract summary: In this paper, we propose a framework that augments massive raw corpora with instruction-response pairs to pre-train language models (LMs) In our experiments, we synthesize 200M instruction-response pairs covering 40+ task categories to verify the effectiveness of Instruction Pre-Training.
Score: 115.95022434390181
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Unsupervised multitask pre-training has been the critical method behind the recent success of language models (LMs). However, supervised multitask learning still holds significant promise, as scaling it in the post-training stage trends towards better generalization. In this paper, we explore supervised multitask pre-training by proposing Instruction Pre-Training, a framework that scalably augments massive raw corpora with instruction-response pairs to pre-train LMs. The instruction-response pairs are generated by an efficient instruction synthesizer built on open-source models. In our experiments, we synthesize 200M instruction-response pairs covering 40+ task categories to verify the effectiveness of Instruction Pre-Training. In pre-training from scratch, Instruction Pre-Training not only consistently enhances pre-trained base models but also benefits more from further instruction tuning. In continual pre-training, Instruction Pre-Training enables Llama3-8B to be comparable to or even outperform Llama3-70B. Our model, code, and data are available at https://github.com/microsoft/LMOps.

Related papers

FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale [56.03294218908334]
Large language models (LLMs) are typically pre-trained via a self-supervised "predict the next word" objective.<n>To make the model useful to users, it is further trained on a far smaller amount of "instruction-tuning" data comprised of supervised training examples of instructions and responses.<n>We propose a procedure that can transform the knowledge in internet-scale pre-training documents into billions of synthetic instruction and answer training pairs.
arXiv Detail & Related papers (2026-01-29T18:58:47Z)
Balancing Continuous Pre-Training and Instruction Fine-Tuning: Optimizing Instruction-Following in LLMs [4.096028601599825]
Large Language Models (LLMs) for public use require continuous pre-training to remain up-to-date with the latest data. This study aims to find the most compute-efficient strategy to gain up-to-date knowledge and instruction-following capabilities without requiring any instruction data and fine-tuning.
arXiv Detail & Related papers (2024-10-14T17:20:30Z)
Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition [10.36399200974439]
We introduce a novel method combining multi-modal and multi-task unsupervised pre-training with a translation-based supervised mid-training approach. We empirically demonstrate that such a multi-stage approach leads to relative word error rate (WER) improvements of up to 38.45% over baselines on both Librispeech and SUPERB.
arXiv Detail & Related papers (2024-03-28T20:23:39Z)
VILA: On Pre-training for Visual Language Models [74.08039416548209]
We study the design options for VLM pre-training through step-by-step controllable comparisons. We build VILA, a Visual Language model family that consistently outperforms the state-of-the-art models.
arXiv Detail & Related papers (2023-12-12T18:58:18Z)
SPRINT: Scalable Policy Pre-Training via Language Instruction Relabeling [28.380226726781082]
We propose SPRINT, a scalable offline policy pre-training approach. Our method uses two core ideas to automatically expand a base set of pre-training tasks. Experimental results in a household simulator and on a real robot kitchen manipulation task show that SPRINT leads to substantially faster learning of new long-horizon tasks.
arXiv Detail & Related papers (2023-06-20T20:59:10Z)
Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks. We find that their performances are sub-optimal or even lag far behind the single-task baseline. We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z)
Reinforcement Learning with Action-Free Pre-Training from Videos [95.25074614579646]
We introduce a framework that learns representations useful for understanding the dynamics via generative pre-training on videos. Our framework significantly improves both final performances and sample-efficiency of vision-based reinforcement learning.
arXiv Detail & Related papers (2022-03-25T19:44:09Z)
Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting [66.45372974713189]
We propose a recall and learn mechanism, which adopts the idea of multi-task learning and jointly learns pretraining tasks and downstream tasks. Experiments show that our method achieves state-of-the-art performance on the GLUE benchmark. We provide open-source RecAdam, which integrates the proposed mechanisms into Adam to facility the NLP community.
arXiv Detail & Related papers (2020-04-27T08:59:57Z)
Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm which directly optimize model's ability to learn text representations for effective learning of downstream tasks. We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.