Related papers: AlpaGasus: Training A Better Alpaca with Fewer Data

URL: http://arxiv.org/abs/2307.08701v5
Date: Tue, 13 Feb 2024 18:37:25 GMT
Title: AlpaGasus: Training A Better Alpaca with Fewer Data
Authors: Lichang Chen, Shiyang Li, Jun Yan, Hai Wang, Kalpa Gunaratna, Vikas Yadav, Zheng Tang, Vijay Srinivasan, Tianyi Zhou, Heng Huang, Hongxia Jin
Abstract summary: We propose a simple and effective data selection strategy that automatically identifies and filters out low-quality data. We introduce AlpaGasus, which is finetuned on only 9k high-quality data filtered from the 52k Alpaca data. AlpaGasus significantly outperforms the original Alpaca on multiple test sets and the controlled human evaluation.
Score: 93.6949102689243
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) strengthen instruction-following capability through instruction-finetuning (IFT) on supervised instruction/response data. However, widely used IFT datasets (e.g., Alpaca's 52k data) surprisingly contain many low-quality instances with incorrect or irrelevant responses, which are misleading and detrimental to IFT. In this paper, we propose a simple and effective data selection strategy that automatically identifies and filters out low-quality data using a strong LLM (e.g., ChatGPT). To this end, we introduce AlpaGasus, which is finetuned on only 9k high-quality data filtered from the 52k Alpaca data. AlpaGasus significantly outperforms the original Alpaca as evaluated by GPT-4 on multiple test sets and the controlled human evaluation. Its 13B variant matches $>90\%$ performance of its teacher LLM (i.e., Text-Davinci-003 generating the 52k data) on test tasks. It also provides 5.7x faster training, reducing the training time for a 7B variant from 80 minutes (for Alpaca) to 14 minutes. Moreover, the experiments prove the efficacy of our method across diverse datasets, base models, and LLM filters. Overall, AlpaGasus demonstrates a novel data-centric IFT paradigm that can be generally applied to instruction-tuning data, leading to faster training and better instruction-following models. Our project page is available at: https://lichang-chen.github.io/AlpaGasus/
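The selection strategy described in the abstract (score each instruction/response pair with a strong LLM, then keep only the high-scoring pairs) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: `Example`, `filter_by_score`, `toy_score`, and the threshold value are hypothetical names and numbers introduced here, and `toy_score` is a trivial stand-in for a real LLM grader such as ChatGPT.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Example:
    """One instruction/response pair from an IFT dataset."""
    instruction: str
    response: str


def filter_by_score(
    examples: Iterable[Example],
    score_fn: Callable[[Example], float],
    threshold: float,
) -> List[Example]:
    """Keep only the examples whose quality score is at least `threshold`."""
    return [ex for ex in examples if score_fn(ex) >= threshold]


def toy_score(ex: Example) -> float:
    """Hypothetical stand-in for an LLM grader (assumption, not the paper's scorer).

    Returns 0.0 for an empty response and 5.0 otherwise.
    """
    return 5.0 if ex.response.strip() else 0.0


data = [
    Example("Name a prime number.", "7"),
    Example("Translate 'hello' to French.", ""),  # empty, low-quality response
]

# The threshold of 4.0 is an illustrative assumption.
kept = filter_by_score(data, toy_score, threshold=4.0)
print(len(kept), "of", len(data), "examples kept")
```

In practice `score_fn` would wrap a call to the grading LLM, and the filtered subset would replace the full dataset for finetuning.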

Related papers

Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs [56.74916151916208]
Large language models (LLMs) exhibit hallucinations (i.e., unfaithful or nonsensical information) when serving as AI assistants in various domains. Previous factuality alignment methods, which perform response-level preference learning, inevitably introduce noise during training. This paper proposes a fine-grained factuality alignment method based on Direct Preference Optimization (DPO), called Mask-DPO.
arXiv Detail & Related papers (2025-03-04T18:20:24Z)
Reward-Augmented Data Enhances Direct Preference Alignment of LLMs [56.24431208419858]
We introduce reward-conditioned Large Language Models (LLMs) that learn from the entire spectrum of response quality within the dataset. We propose an effective yet simple data relabeling method that conditions the preference pairs on quality scores to construct a reward-augmented dataset.
arXiv Detail & Related papers (2024-10-10T16:01:51Z)
Improving Pretraining Data Using Perplexity Correlations [56.41097718862742]
We present a framework that selects high-quality pretraining data without any LLM training of our own. We build a new statistical framework for data selection centered around estimates of perplexity-benchmark correlations. Our approach outperforms DSIR on every benchmark, while matching the best data selector found in DataComp-LM.
arXiv Detail & Related papers (2024-09-09T17:23:29Z)
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing [48.07915731998946]
We present a self-synthesis method for generating large-scale alignment data named Magpie. We use this method to prompt Llama-3-Instruct and generate 4 million instructions along with their corresponding responses. Our results indicate that in some tasks, models fine-tuned with Magpie perform comparably to the official Llama-3-8B-Instruct.
arXiv Detail & Related papers (2024-06-12T17:52:30Z)
Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios. We implement UAL simply, by adaptively setting the label-smoothing value during training according to the uncertainty of each sample. Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
Text Quality-Based Pruning for Efficient Training of Language Models [66.66259229732121]
We propose a novel method for numerically evaluating text quality in large unlabelled NLP datasets. By proposing the text quality metric, the paper establishes a framework to identify and eliminate low-quality text instances. Experimental results over multiple models and datasets demonstrate the efficacy of this approach.
arXiv Detail & Related papers (2024-04-26T18:01:25Z)
Automated Data Curation for Robust Language Model Fine-Tuning [13.8454385440986]
We introduce an automated data curation pipeline CLEAR for instruction tuning datasets. CLEAR estimates which training data is low-quality and either filters or corrects it. Experiments reveal that CLEAR consistently improves the performance of fine-tuned models across many datasets and models.
arXiv Detail & Related papers (2024-03-19T14:44:45Z)
Reformatted Alignment [27.79684742862816]
Current methods to improve data quality are either labor-intensive or prone to factual errors caused by hallucinations. This paper introduces a simple and effective approach named ReAlign, which reformats the responses of instruction data into a format that better aligns with pre-established criteria and the collated evidence. Experimentally, ReAlign significantly boosts the general alignment ability, math reasoning, factuality, and readability of the LLMs.
arXiv Detail & Related papers (2024-02-19T15:21:58Z)
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning [38.29072578390376]
We show that fine-tuning on the longest responses should be the default baseline for any work on instruction fine-tuning. We demonstrate this for several LLMs (Llama-2-7B, Llama-2-13B, Mistral-7B-v0.1) and datasets (Alpaca-52k, Evol-Instruct-70k).
arXiv Detail & Related papers (2024-02-07T13:32:11Z)
Aligner: Efficient Alignment by Learning to Correct [10.056049435141645]
We introduce Aligner, a model-agnostic, plug-and-play module that learns the correctional residuals between preferred and dispreferred answers. It can be applied to various open-source and API-based models with only one-off training, making it suitable for rapid iteration. Our experiments demonstrate performance improvements by deploying the same Aligner model across 11 different language models.
arXiv Detail & Related papers (2024-02-04T09:24:51Z)
Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule [55.08778142798106]
We propose auxiliary tasks that exploit the alignment between the original and corrected sentences. We formulate each task as a sequence-to-sequence problem and perform multi-task training. We find that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance.
arXiv Detail & Related papers (2023-11-20T14:50:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.