Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning
- URL: http://arxiv.org/abs/2408.14774v3
- Date: Tue, 03 Dec 2024 20:01:23 GMT
- Title: Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning
- Authors: Simran Kaur, Simon Park, Anirudh Goyal, Sanjeev Arora
- Abstract summary: Instruct-SkillMix is an automated approach for creating diverse, high-quality SFT data. With it, LLaMA-3-8B-Base achieves a 42.76% length-controlled win rate on AlpacaEval 2.0. The Instruct-SkillMix pipeline is flexible and adaptable to other settings.
- Score: 40.61149906522525
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Instruct-SkillMix, an automated approach for creating diverse, high-quality SFT data. The Instruct-SkillMix pipeline involves two stages, each leveraging an existing powerful LLM: (1) Skill extraction: uses the LLM to extract core "skills" for instruction-following, either from existing datasets, or by directly prompting the model; (2) Data generation: uses the powerful LLM to generate (instruction, response) data that exhibit a randomly chosen pair of these skills. Here, the use of random skill combinations promotes diversity and difficulty. Vanilla SFT (i.e., no PPO, DPO, or RL methods) on data generated from Instruct-SkillMix leads to strong gains on instruction-following benchmarks such as AlpacaEval 2.0, MT-Bench, and WildBench. With just $4$K examples, LLaMA-3-8B-Base achieves a 42.76% length-controlled win rate on AlpacaEval 2.0. To our knowledge, this is state-of-the-art performance among all models that have only undergone SFT (no RL methods), and it competes with proprietary models such as Claude 3 Opus and LLaMA-3.1-405B-Instruct. Ablation studies also suggest plausible reasons for why creating open instruction-tuning datasets via naive crowd-sourcing has proved difficult. Introducing low-quality answers ("shirkers") in $20\%$ of Instruct-SkillMix examples causes performance to plummet, sometimes catastrophically. The Instruct-SkillMix pipeline is flexible and adaptable to other settings.
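As a rough illustration, here is a minimal Python sketch of the two-stage pipeline described in the abstract. The `chat()` helper, the prompt wording, and the JSON output format are assumptions for illustration, not the paper's exact prompts or model setup.

```python
import json
import random

def chat(prompt: str) -> str:
    # Placeholder for any strong chat LLM client; plug in your own API call.
    raise NotImplementedError

# Stage 1: skill extraction. Ask the LLM to name core instruction-following skills.
def extract_skills(n: int = 50) -> list[str]:
    reply = chat(
        f"List {n} core skills an AI assistant needs to follow user instructions well. "
        "Return a JSON array of short skill names."
    )
    return json.loads(reply)

# Stage 2: data generation. Sample a random *pair* of skills and ask the LLM for an
# (instruction, response) example exhibiting both; random combinations promote
# diversity and difficulty.
def generate_example(skills: list[str]) -> dict:
    s1, s2 = random.sample(skills, 2)
    reply = chat(
        f"Write one challenging user instruction that requires both '{s1}' and '{s2}', "
        "then write an excellent response. "
        'Return a JSON object with keys "instruction" and "response".'
    )
    return json.loads(reply)

# skills = extract_skills()
# sft_data = [generate_example(skills) for _ in range(4000)]  # ~4K examples, as in the paper
```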
Related papers
- Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources [36.525767435183845]
We introduce Open-Qwen2VL, a fully open-source 2B-parameter Multimodal Large Language Model pre-trained efficiently on 29M image-text pairs.
Training was conducted on academic-level 8xA100-40G GPUs using 5B packed multimodal tokens, only 0.36% of the 1.4T multimodal pre-training tokens of Qwen2-VL.
The final instruction-tuned Open-Qwen2VL outperforms partially-open state-of-the-art MLLM Qwen2-VL-2B on various multimodal benchmarks.
arXiv Detail & Related papers (2025-04-01T09:54:00Z) - S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning [51.84977135926156]
We introduce S$^2$R, an efficient framework that enhances LLM reasoning by teaching models to self-verify and self-correct during inference. Our results demonstrate that Qwen2.5-math-7B achieves an accuracy improvement from 51.0% to 81.6%, outperforming models trained on an equivalent amount of long-CoT distilled data.
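The entry describes training a model to verify and correct itself at inference time. A minimal sketch of that inference-time loop, assuming placeholder `generate` and `verify` helpers (S$^2$R instills these behaviors via RL; real calls would hit the fine-tuned model):

```python
def generate(question: str, feedback: str = "") -> str:
    # Placeholder: draft (or redraft) an answer with the policy model.
    raise NotImplementedError

def verify(question: str, answer: str) -> tuple[bool, str]:
    # Placeholder: ask the same model to check its own answer; returns (ok, critique).
    raise NotImplementedError

def solve_with_self_correction(question: str, max_rounds: int = 3) -> str:
    answer = generate(question)
    for _ in range(max_rounds):
        ok, critique = verify(question, answer)          # self-verification
        if ok:
            break
        answer = generate(question, feedback=critique)   # self-correction
    return answer
```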
arXiv Detail & Related papers (2025-02-18T13:40:22Z) - Entropy Law: The Story Behind Data Compression and LLM Performance [115.70395740286422]
We find that model performance is negatively correlated with the compression ratio of the training data: a lower compression ratio usually yields a lower training loss.
Based on this entropy law, we propose an efficient and universal data selection method.
We also present an interesting application of entropy law that can detect potential performance risks at the beginning of model training.
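A minimal sketch of the compression-ratio signal behind the entropy law, using zlib. The ratio definition (compressed bytes over raw bytes) and the whole-subset comparison are illustrative assumptions; the paper's actual selection algorithm is more involved.

```python
import zlib

def compression_ratio(texts: list[str]) -> float:
    """Compressed size / raw size (assumed definition); lower means more redundant data."""
    raw = "\n".join(texts).encode("utf-8")
    return len(zlib.compress(raw, 9)) / len(raw)

# Per the negative correlation summarized above, the lower-ratio subset would be
# expected to yield a lower training loss; real selection operates per sample.
subsets = {
    "a": ["sort a list in python", "explain binary search"],
    "b": ["sort a list in python", "sort a list in python"],
}
preferred = min(subsets, key=lambda k: compression_ratio(subsets[k]))
```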
arXiv Detail & Related papers (2024-07-09T08:14:29Z) - InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct [43.7550233177368]
This paper explores whether a fine-tuned open-source model can generate additional data to augment its own instruction-tuning dataset. We propose Inverse-Instruct, a data augmentation technique that uses a fine-tuned LLM to generate additional instructions from the code responses in its own training dataset.
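A minimal sketch of the inverse direction described here: the fine-tuned model writes a new instruction for each code response it was trained on, and the new (instruction, response) pairs augment the original set. The `chat` stub and prompt wording are illustrative.

```python
def chat(prompt: str) -> str:
    # Placeholder for the fine-tuned open-source code LLM.
    raise NotImplementedError

def inverse_instruct(dataset: list[dict]) -> list[dict]:
    augmented = []
    for ex in dataset:
        new_instruction = chat(
            "Write a concise programming task that the following code solves:\n"
            + ex["response"]
        )
        # Pair the generated instruction with the original code response.
        augmented.append({"instruction": new_instruction, "response": ex["response"]})
    return dataset + augmented
```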
arXiv Detail & Related papers (2024-07-08T08:00:05Z) - PAFT: A Parallel Training Paradigm for Effective LLM Fine-Tuning [17.73193523921637]
Large language models (LLMs) have shown remarkable abilities in diverse natural language processing (NLP) tasks.
LLMs generally undergo supervised fine-tuning (SFT) followed by preference alignment to be usable in downstream applications.
This paper introduces PAFT, a new PArallel training paradigm for effective LLM Fine-Tuning.
arXiv Detail & Related papers (2024-06-25T20:11:37Z) - MAmmoTH2: Scaling Instructions from the Web [39.786198452175505]
We propose a paradigm to efficiently harvest 10 million naturally occurring instruction examples from the pre-training web corpus.
We build MAmmoTH2 models, which significantly boost performance on reasoning benchmarks.
Further training MAmmoTH2 on public instruction tuning datasets yields MAmmoTH2-Plus, achieving state-of-the-art performance.
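A hedged sketch of the harvesting paradigm as summarized here (recall candidate documents, extract Q&A pairs, refine them); the component models, threshold, and interfaces are placeholders, not the paper's exact setup.

```python
def harvest_instructions(corpus, recall_score, extract_pairs, refine_pair):
    # 1. Recall: keep web documents likely to contain instruction-like content.
    docs = [doc for doc in corpus if recall_score(doc) > 0.5]  # threshold is illustrative
    # 2. Extract: pull naturally occurring (question, answer) pairs from each document.
    pairs = [qa for doc in docs for qa in extract_pairs(doc)]
    # 3. Refine: clean and reformat each pair with an LLM before training.
    return [refine_pair(q, a) for q, a in pairs]
```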
arXiv Detail & Related papers (2024-05-06T15:11:38Z) - Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs [61.04246774006429]
We introduce a black-box prompt optimization method that uses an attacker LLM agent to uncover higher levels of memorization in a victim agent.
We observe that our instruction-based prompts generate outputs with 23.7% higher overlap with training data compared to the baseline prefix-suffix measurements.
Our findings show that instruction-tuned models can expose pre-training data as much as their base models, if not more, and that using instructions proposed by other LLMs opens a new avenue for automated attacks.
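As a companion to the overlap numbers quoted above, here is one simple way to measure overlap between a model's output and training data; the word n-gram proxy below is an assumption, not necessarily the paper's metric.

```python
def ngram_overlap(output: str, reference: str, n: int = 8) -> float:
    """Fraction of the output's word n-grams that also occur in the reference text."""
    def ngrams(text: str) -> set:
        toks = text.split()
        return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
    out = ngrams(output)
    return len(out & ngrams(reference)) / max(len(out), 1)

# A score near 1.0 suggests the model is regurgitating training data verbatim.
```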
arXiv Detail & Related papers (2024-03-05T19:32:01Z) - Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning [38.29072578390376]
We show that fine-tuning on the longest responses should be the default baseline for any work on instruction fine-tuning.
We demonstrate this for several LLMs (Llama-2-7B, Llama-2-13B, Mistral-7B-v0.1) and datasets (Alpaca-52k, Evol-Instruct-70k).
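The baseline itself fits in a few lines; a sketch, with length measured in whitespace tokens (the paper's exact length measure and subset size may differ):

```python
def longest_response_subset(dataset: list[dict], k: int = 1000) -> list[dict]:
    # Keep the k examples with the longest responses and fine-tune on those alone.
    return sorted(
        dataset,
        key=lambda ex: len(ex["response"].split()),
        reverse=True,
    )[:k]
```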
arXiv Detail & Related papers (2024-02-07T13:32:11Z) - Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models [52.98743860365194]
We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN).
At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself.
This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents.
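A minimal sketch of the self-play loop, assuming a DPO-style `train_preference` helper: the human response is treated as preferred and the previous iterate's own generation as the opponent to beat. Interfaces here are placeholders, not the paper's code.

```python
def spin(model, sft_data: list[dict], train_preference, n_rounds: int = 3):
    for _ in range(n_rounds):
        pairs = [
            {
                "prompt": ex["instruction"],
                "chosen": ex["response"],                       # human-written target
                "rejected": model.generate(ex["instruction"]),  # self-generated opponent
            }
            for ex in sft_data
        ]
        model = train_preference(model, pairs)  # next iteration of the "player"
    return model
```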
arXiv Detail & Related papers (2024-01-02T18:53:13Z) - From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets.
Our key innovation is the Instruction-Following Difficulty (IFD) metric, which identifies discrepancies between a model's expected responses and its intrinsic generation capability.
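The IFD ratio can be sketched directly: the model's loss on the answer given the instruction, divided by its loss on the answer alone. Values near 1 mean the instruction barely helps, marking a harder sample. The snippet below uses gpt2 as a stand-in model and simplifies tokenization.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def answer_nll(prefix: str, answer: str) -> float:
    """Average negative log-likelihood of the answer tokens, given an optional prefix."""
    ans_ids = tok(answer, return_tensors="pt").input_ids
    if prefix:
        prefix_ids = tok(prefix, return_tensors="pt").input_ids
        ids = torch.cat([prefix_ids, ans_ids], dim=1)
        labels = ids.clone()
        labels[:, : prefix_ids.shape[1]] = -100  # score only the answer tokens
    else:
        ids, labels = ans_ids, ans_ids.clone()
    return model(ids, labels=labels).loss.item()

def ifd(instruction: str, answer: str) -> float:
    # Higher IFD = the instruction helps less, i.e. a harder ("cherry") sample.
    return answer_nll(instruction, answer) / answer_nll("", answer)
```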
arXiv Detail & Related papers (2023-08-23T09:45:29Z) - AlpaGasus: Training A Better Alpaca with Fewer Data [93.6949102689243]
We propose a simple and effective data selection strategy that automatically identifies and filters out low-quality data.
We introduce AlpaGasus, which is fine-tuned on only 9k high-quality examples filtered from the 52k Alpaca dataset.
AlpaGasus significantly outperforms the original Alpaca on multiple test sets and the controlled human evaluation.
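A hedged sketch of the filtering step: a strong grader LLM scores each (instruction, response) pair, and only high-scoring examples are kept. The prompt wording and response parsing are illustrative; the paper grades with ChatGPT and a cutoff around 4.5/5.

```python
def chat(prompt: str) -> str:
    # Placeholder for the grader LLM (the paper uses ChatGPT).
    raise NotImplementedError

def filter_dataset(dataset: list[dict], threshold: float = 4.5) -> list[dict]:
    kept = []
    for ex in dataset:
        reply = chat(
            "Rate the quality of this response to the instruction on a 1-5 scale. "
            "Reply with a number only.\n"
            f"Instruction: {ex['instruction']}\nResponse: {ex['response']}"
        )
        if float(reply.strip()) >= threshold:  # assumes the grader complies with the format
            kept.append(ex)
    return kept
```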
arXiv Detail & Related papers (2023-07-17T17:59:40Z) - CodeGen2: Lessons for Training LLMs on Programming and Natural Languages [116.74407069443895]
We unify encoder- and decoder-based models into a single prefix-LM.
For learning methods, we explore the claim of a "free lunch" hypothesis.
For data distributions, we explore the effect of mixture distributions and multi-epoch training over programming and natural languages on model performance.
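The prefix-LM unification mentioned above comes down to the attention mask: bidirectional attention over the prefix, causal attention over the suffix. A small sketch (the boolean mask convention, True = attention allowed, is an assumption):

```python
import torch

def prefix_lm_mask(seq_len: int, prefix_len: int) -> torch.Tensor:
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))  # causal base
    mask[:, :prefix_len] = True  # every position may attend to the full prefix
    return mask

# prefix_lm_mask(5, 2): rows 0-1 (prefix) see each other bidirectionally but not
# the suffix; rows 2-4 (suffix) attend causally.
```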
arXiv Detail & Related papers (2023-05-03T17:55:25Z)