Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
- URL: http://arxiv.org/abs/2402.10110v2
- Date: Fri, 7 Jun 2024 20:23:21 GMT
- Title: Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
- Authors: Ming Li, Lichang Chen, Jiuhai Chen, Shwai He, Jiuxiang Gu, Tianyi Zhou
- Abstract summary: Many recent methods focus on improving the data quality but often overlook the compatibility of the data with the student model being finetuned.
This paper introduces Selective Reflection-Tuning, a novel paradigm that synergizes a teacher LLM's reflection and introspection for improving existing data quality with the student LLM's capability to select the data it is tuned on.
This teacher-student collaboration produces high-quality and student-compatible instruction-response pairs, resulting in sample-efficient instruction tuning.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instruction tuning is critical for large language models (LLMs) to achieve better instruction-following and task-adaptation capabilities, but its success heavily relies on the quality of the training data. Many recent methods focus on improving data quality but often overlook the compatibility of the data with the student model being finetuned. This paper introduces Selective Reflection-Tuning, a novel paradigm that synergizes a teacher LLM's reflection and introspection for improving existing data quality with the data selection capability of the student LLM, to automatically refine existing instruction-tuning data. This teacher-student collaboration produces high-quality and student-compatible instruction-response pairs, resulting in sample-efficient instruction tuning and LLMs of superior performance. Selective Reflection-Tuning is a data augmentation and synthesis technique that generally improves LLM finetuning and self-improvement without collecting brand-new data. We apply our method to Alpaca and WizardLM data and achieve much stronger, top-tier 7B and 13B LLMs.
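As a rough illustration of the paradigm in the abstract, the sketch below renders the teacher-student recycling loop in code. The chat(), parse_pair(), ifd_score(), and finetune() helpers are hypothetical stand-ins for real LLM calls and training, and the single-statistic selection rule is a simplification of the paper's IFD-based selection, not the authors' implementation:

```python
# Minimal sketch of Selective Reflection-Tuning (simplified; not the
# authors' code). chat(), parse_pair(), ifd_score(), and finetune() are
# hypothetical helpers standing in for real LLM calls and training.

def reflect(teacher, instruction, response):
    """Teacher introspects on a training pair and proposes a revision."""
    prompt = (
        "Critique the following instruction-response pair, then rewrite "
        "both to be clearer and more helpful.\n"
        f"Instruction: {instruction}\nResponse: {response}"
    )
    return parse_pair(chat(teacher, prompt))  # (new_instruction, new_response)

def student_selects(student, original, revised):
    """Student keeps whichever version is more informative for it to learn
    from, judged by an IFD-style difficulty statistic (higher = harder)."""
    if ifd_score(student, *revised) > ifd_score(student, *original):
        return revised
    return original

def selective_reflection_tuning(teacher, student, dataset):
    recycled = [
        student_selects(student, pair, reflect(teacher, *pair))
        for pair in dataset
    ]
    return finetune(student, recycled)  # standard instruction tuning
```

The design point worth noting is that the student, not the teacher, decides which version enters the training set; that is what makes the recycled data student-compatible.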
Related papers
- Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning [71.2981957820888]
We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets.
The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method.
The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
arXiv Detail & Related papers (2024-11-21T02:30:53Z)
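A minimal sketch of the dual-model evaluation from the Star-Agents entry above: one signal for difficulty (how hard the sample is for a reference model) and one for quality (a judge LLM's 1-10 rating). The thresholds, judge prompt, and helper signatures are illustrative assumptions, not the framework's API:

```python
# Dual-model keep/drop filter (illustrative assumptions, not the paper's API).
def keep_sample(sample, ref_nll, judge, min_difficulty=1.0, min_quality=7):
    difficulty = ref_nll(sample)  # e.g. mean token loss on the response
    quality = int(judge(f"Rate 1-10 the quality of this pair:\n{sample}"))
    return difficulty >= min_difficulty and quality >= min_quality

def dual_model_filter(generated, ref_nll, judge):
    # Keep only generated samples that are both hard and well-formed.
    return [s for s in generated if keep_sample(s, ref_nll, judge)]
```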
- IterSelectTune: An Iterative Training Framework for Efficient Instruction-Tuning Data Selection [28.581257601441045]
We introduce IterSelectTune, an efficient, cost-effective iterative training policy for selecting high-quality instruction data.
By fine-tuning on approximately 20% of the source data, our method consistently outperforms models fine-tuned on the full dataset.
arXiv Detail & Related papers (2024-10-17T11:48:57Z)
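A heavily simplified sketch in the spirit of the IterSelectTune entry above: a selection policy scores the pool, keeps a slice each round, and is refined on what it kept. The scoring policy, the round structure, and the 20% budget split are illustrative; the paper's actual policy training differs:

```python
# Iterative selection loop (illustrative; policy.score/update are assumed).
def iterative_select(policy, pool, rounds=3, budget=0.2):
    per_round = int(len(pool) * budget / rounds)  # ~20% of the data in total
    selected = []
    for _ in range(rounds):
        scored = sorted(pool, key=policy.score, reverse=True)
        batch, pool = scored[:per_round], scored[per_round:]
        selected += batch
        policy.update(batch)  # refine the selection policy each round
    return selected
```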
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
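A sketch of the SELF-GUIDE mechanism above: the student LLM synthesizes its own task-specific input-output pairs from a few seed examples, a crude filter drops degenerate generations, and the same model is finetuned on the result. The prompt wording, the filter, and the generate()/finetune() helpers are assumptions:

```python
# Self-synthetic finetuning loop (illustrative; not the paper's prompts).
def self_guide(student, task_description, seeds, n=256):
    synthetic = []
    for _ in range(n):
        x = student.generate(
            f"Task: {task_description}\nExamples: {seeds}\nNew input:"
        )
        y = student.generate(
            f"Task: {task_description}\nInput: {x}\nOutput:"
        )
        if y.strip() and len(y) < 2048:      # crude noise filter (assumed)
            synthetic.append({"input": x, "output": y})
    return finetune(student, synthetic)      # hypothetical training helper
```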
- Rethinking the Instruction Quality: LIFT is What You Need [20.829372251475476]
Existing quality improvement methods alter instruction data through dataset expansion or curation.
We propose LIFT (LLM Instruction Fusion Transfer), a novel and versatile paradigm designed to elevate the instruction quality to new heights.
Experimental results demonstrate that, even with a limited quantity of high-quality instruction data selected by our paradigm, LLMs consistently uphold robust performance across various tasks.
arXiv Detail & Related papers (2023-12-12T03:30:21Z)
- CoachLM: Automatic Instruction Revisions Improve the Data Quality in LLM Instruction Tuning [32.54921739100195]
We propose CoachLM, a novel approach to enhance the quality of instruction datasets through automatic revisions on samples in the dataset.
CoachLM is trained on samples revised by human experts and significantly increases the proportion of high-quality samples in the dataset from 17.7% to 78.9%.
Results show that CoachLM improves the instruction-following capabilities of the instruction-tuned LLM by an average of 29.9%.
arXiv Detail & Related papers (2023-11-22T09:04:57Z)
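A sketch of the CoachLM idea above: pairs revised by human experts become (original, revised) training examples for a "coach" model, which is then used to revise the rest of the dataset automatically. The prompt template and the generate() helper are illustrative assumptions; the paper's training details differ:

```python
# Turning expert revisions into coach training data, then applying the coach.
REVISE_PROMPT = "Revise this instruction-tuning sample to improve it:\n{sample}"

def build_coach_data(expert_revisions):
    # expert_revisions: list of (original_sample, expert_revised_sample)
    return [
        {"instruction": REVISE_PROMPT.format(sample=orig), "response": revised}
        for orig, revised in expert_revisions
    ]

def coach_revise(coach, dataset):
    # Apply the trained coach model to every sample in the dataset.
    return [coach.generate(REVISE_PROMPT.format(sample=s)) for s in dataset]
```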
- Unlearn What You Want to Forget: Efficient Unlearning for LLMs [92.51670143929056]
Large language models (LLMs) have achieved significant progress by pre-training on and memorizing a wide range of textual data.
This process might suffer from privacy issues and violations of data protection regulations.
We propose an unlearning framework that can efficiently update LLMs without retraining the whole model after data removals.
arXiv Detail & Related papers (2023-10-31T03:35:59Z)
- Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning [79.32236399694077]
Low-quality data in the training set are usually detrimental to instruction tuning.
We propose a novel method, termed "reflection-tuning".
This approach uses an oracle LLM to recycle the original training data by introspecting on and enhancing the quality of the instructions and responses it contains.
arXiv Detail & Related papers (2023-10-18T05:13:47Z)
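A condensed sketch of the recycling step from the Reflection-Tuning entry above: an oracle LLM critiques a training pair against a few quality criteria and then rewrites it. The criteria list, both prompts, and the ask_oracle()/parse_pair() helpers are illustrative, not the paper's exact ones:

```python
# Oracle-driven critique-then-rewrite recycling (illustrative prompts).
CRITERIA = ["clarity", "depth", "factuality", "helpfulness"]

def recycle(instruction, response, ask_oracle):
    critique = ask_oracle(
        f"Evaluate this pair on {', '.join(CRITERIA)}:\n"
        f"Instruction: {instruction}\nResponse: {response}"
    )
    rewritten = ask_oracle(
        "Using the critique below, rewrite the pair to fix every issue.\n"
        f"Critique: {critique}\n"
        f"Instruction: {instruction}\nResponse: {response}"
    )
    return parse_pair(rewritten)  # hypothetical parser for the oracle output
```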
- From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets.
Our key innovation, the Instruction-Following Difficulty (IFD) metric, identifies discrepancies between a model's expected responses and its intrinsic generation capability.
arXiv Detail & Related papers (2023-08-23T09:45:29Z)
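The IFD metric above compares how well a model predicts a response with and without its instruction: roughly, the loss of the response conditioned on the instruction divided by the loss of the response alone. A runnable sketch with Hugging Face transformers, where the gpt2 model and the newline prompt join are illustrative choices:

```python
# IFD-style scoring (sketch). Token-boundary effects at the prefix/response
# join are ignored for brevity.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def avg_nll(text: str, prefix: str = "") -> float:
    """Mean cross-entropy of `text`, optionally conditioned on `prefix`."""
    ids = tok(prefix + text, return_tensors="pt").input_ids
    labels = ids.clone()
    if prefix:
        n_prefix = len(tok(prefix).input_ids)
        labels[:, :n_prefix] = -100  # score only the continuation tokens
    with torch.no_grad():
        return model(ids, labels=labels).loss.item()

def ifd(instruction: str, response: str) -> float:
    # High IFD: the instruction barely helps the model predict the response,
    # so the pair is a difficult (and informative) training sample.
    return avg_nll(response, prefix=instruction + "\n") / avg_nll(response)

pairs = [("Name a prime number.", "2 is a prime number."),
         ("Write a haiku about rain.", "Soft rain taps the roof.")]
cherries = sorted(pairs, key=lambda p: ifd(*p), reverse=True)  # hardest first
```

Selecting the top-IFD "cherry" samples is what lets a small fraction of the data match or beat full-dataset finetuning in that paper's setup.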