Instruction Tuning with Human Curriculum
- URL: http://arxiv.org/abs/2310.09518v4
- Date: Sun, 16 Jun 2024 22:46:38 GMT
- Title: Instruction Tuning with Human Curriculum
- Authors: Bruce W. Lee, Hyunsoo Cho, Kang Min Yoo
- Abstract summary: We (1) introduce Curriculum Instruction Tuning, (2) explore the potential advantages of employing diverse curriculum strategies, and (3) delineate a synthetic instruction-response generation framework.
Our generation pipeline is systematically structured to emulate the sequential and orderly nature of human learning.
We describe a methodology for generating instruction-response datasets that extensively span the various stages of human education.
- Score: 15.025867460765559
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we (1) introduce Curriculum Instruction Tuning, (2) explore the potential advantages of employing diverse curriculum strategies, and (3) delineate a synthetic instruction-response generation framework that complements our theoretical approach. Distinct from existing instruction tuning datasets, our generation pipeline is systematically structured to emulate the sequential and orderly nature of human learning. Additionally, we describe a methodology for generating instruction-response datasets that extensively span the various stages of human education, from middle school through the graduate level, using educational subject catalogs. Before training, we meticulously organize the instruction data so that questions escalate in difficulty regarding (A) the subject matter and (B) the intricacy of the instructions. Our findings reveal that substantial performance improvements can be achieved merely by applying curriculum ordering to instruction data (gains of +4.76 on TruthfulQA, +2.98 on MMLU, +2.8 on OpenBookQA, and +1.28 on ARC-hard) compared to random shuffling, and this enhancement incurs no additional computational expense. Through comprehensive experimentation, we observe that the advantages of our proposed method are consistently evident across nine benchmarks.
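The core recipe is just a sort applied before training. Below is a minimal sketch of that ordering step, assuming each example is annotated with an education stage and a difficulty score; the `stage` and `difficulty` field names are assumptions for illustration, not the paper's actual schema.

```python
# Curriculum ordering sketch: sort instruction data easy-to-hard,
# first by subject stage (A), then by instruction intricacy (B).
# The "stage"/"difficulty" annotations are assumed, not the paper's schema.
from dataclasses import dataclass

STAGE_ORDER = {"middle_school": 0, "high_school": 1,
               "undergraduate": 2, "graduate": 3}

@dataclass
class Example:
    instruction: str
    response: str
    stage: str         # educational stage of the subject matter
    difficulty: float  # intricacy of the instruction itself

def curriculum_sort(data: list[Example]) -> list[Example]:
    return sorted(data, key=lambda ex: (STAGE_ORDER[ex.stage], ex.difficulty))

examples = [
    Example("Prove the spectral theorem for normal operators.", "...", "graduate", 0.9),
    Example("What is 7 x 8?", "56", "middle_school", 0.1),
]
for ex in curriculum_sort(examples):
    print(ex.stage, "|", ex.instruction)
```

Because the reported gain comes purely from ordering, this step costs nothing beyond the sort itself, consistent with the abstract's claim of no extra computational expense.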
Related papers
- Balancing Continuous Pre-Training and Instruction Fine-Tuning: Optimizing Instruction-Following in LLMs [4.096028601599825]
Large Language Models (LLMs) for public use require continuous pre-training to remain up-to-date with the latest data.
This study aims to find the most compute-efficient strategy to gain up-to-date knowledge and instruction-following capabilities without requiring any instruction data or fine-tuning.
arXiv Detail & Related papers (2024-10-14T17:20:30Z)
- Optimizing Instruction Synthesis: Effective Exploration of Evolutionary Space with Tree Search [25.108044778194536]
We introduce IDEA-MCTS (Instruction Data Enhancement using Monte Carlo Tree Search), a scalable framework for efficiently synthesizing instructions.
With tree search and evaluation models, it can efficiently guide each instruction to evolve into a high-quality form, aiding in instruction fine-tuning.
Experimental results show that IDEA-MCTS significantly enhances the seed instruction data, raising the average evaluation scores of quality, diversity, and complexity from 2.19 to 3.81.
arXiv Detail & Related papers (2024-10-14T11:28:30Z)
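A rough sketch of the evolve-and-evaluate loop behind instruction synthesis with tree search. IDEA-MCTS proper uses Monte Carlo Tree Search with trained evaluation models; the version below substitutes a simple best-first search with stubbed rewrite operators and a stubbed scorer, so every function here is an illustrative placeholder.

```python
# Best-first search over instruction rewrites: a simplified stand-in for
# IDEA-MCTS's tree search. rewrite_ops and score are stubs; the paper
# uses LLM-based operators and trained evaluation models.
import heapq

def rewrite_ops(instruction: str) -> list[str]:
    return [
        instruction + " Explain your reasoning step by step.",
        instruction + " Include a concrete example.",
        "In a real-world scenario, " + instruction[0].lower() + instruction[1:],
    ]

def score(instruction: str) -> float:
    # Stub quality/complexity score in [0, 1].
    return min(len(instruction) / 200.0, 1.0)

def evolve(seed: str, budget: int = 20) -> str:
    frontier = [(-score(seed), seed)]  # max-heap via negated scores
    best = seed
    for _ in range(budget):
        if not frontier:
            break
        neg_s, node = heapq.heappop(frontier)
        if -neg_s > score(best):
            best = node
        for child in rewrite_ops(node):
            heapq.heappush(frontier, (-score(child), child))
    return best

print(evolve("Summarize the causes of the French Revolution."))
```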
- Instruction Tuning With Loss Over Instructions [42.9106826952674]
Instruction Modelling (IM) trains LMs by applying a loss function to the instruction and prompt part rather than solely to the output part.
We show that, in many scenarios, IM can effectively improve the LM performance on both NLP tasks and open-ended generation benchmarks.
Remarkably, in the most advantageous case, IM boosts model performance on AlpacaEval 1.0 by over 100%.
arXiv Detail & Related papers (2024-05-23T10:12:03Z)
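The IM idea reduces to how training labels are masked. A minimal sketch of the contrast, using illustrative integer token IDs rather than a real tokenizer's output:

```python
# Label masking contrast. Standard SFT hides instruction tokens from the
# loss with the usual ignore index (-100); Instruction Modelling (IM)
# keeps them, so the loss also covers the instruction and prompt.
# The integer token IDs below are illustrative only.
IGNORE_INDEX = -100

def build_labels(prompt_ids, response_ids, loss_over_instructions):
    input_ids = prompt_ids + response_ids
    if loss_over_instructions:  # IM: loss on prompt + response
        labels = list(input_ids)
    else:                       # standard SFT: loss on response only
        labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

prompt_ids, response_ids = [11, 12, 13], [21, 22]
print(build_labels(prompt_ids, response_ids, False)[1])  # [-100, -100, -100, 21, 22]
print(build_labels(prompt_ids, response_ids, True)[1])   # [11, 12, 13, 21, 22]
```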
- SmurfCat at SemEval-2024 Task 6: Leveraging Synthetic Data for Hallucination Detection [51.99159169107426]
We present our novel systems developed for the SemEval-2024 hallucination detection task.
Our investigation spans a range of strategies to compare model predictions with reference standards.
We introduce three distinct methods that exhibit strong performance metrics.
arXiv Detail & Related papers (2024-04-09T09:03:44Z)
- One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
arXiv Detail & Related papers (2023-12-16T03:33:12Z)
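A hedged sketch of one-shot data prospecting in the spirit of Nuggets: each candidate is scored by how much it helps, used as a one-shot demonstration, on a small anchor set, and only the top fraction is kept. The `loss_fn` hook is a hypothetical stand-in for the paper's actual scoring.

```python
# One-shot data prospecting sketch: keep the candidates whose use as a
# one-shot demonstration most reduces loss on a small anchor set.
# loss_fn(anchor, demo) is a hypothetical hook, not the paper's metric.
def prospect(candidates, anchors, loss_fn, keep_frac=0.01):
    scored = []
    for cand in candidates:
        gain = sum(loss_fn(a, demo=None) - loss_fn(a, demo=cand)
                   for a in anchors) / len(anchors)
        scored.append((gain, cand))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(scored) * keep_frac))
    return [cand for _, cand in scored[:k]]

# Toy usage with a fake loss: demos halve the loss, so all gains tie
# and exactly one candidate survives the (rounded-up) 1% cut.
fake_loss = lambda anchor, demo: len(anchor) * (0.5 if demo else 1.0)
print(prospect(["ex1", "ex2", "ex3"], ["some anchor task"], fake_loss))
```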
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
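The key move in RLIF is that the intervention signal itself becomes the reward. A minimal sketch of that labeling step, with a deliberately simplified transition container; the off-policy RL update that would consume these rewards is omitted.

```python
# RLIF-style reward labeling: the expert intervention is the reward
# signal itself (negative when the human takes over, zero otherwise).
# The transition dicts are a deliberate simplification; a full system
# would feed these labeled transitions into an off-policy RL learner.
def label_rewards(trajectory):
    for step in trajectory:
        step["reward"] = -1.0 if step["intervened"] else 0.0
    return trajectory

traj = [{"obs": 0, "act": 1, "intervened": False},
        {"obs": 1, "act": 0, "intervened": True}]
print([s["reward"] for s in label_rewards(traj)])  # [0.0, -1.0]
```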
- Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality [55.88910947643436]
Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice.
HiDe-Prompt is an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics.
Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
arXiv Detail & Related papers (2023-10-11T06:51:46Z)
- CITING: Large Language Models Create Curriculum for Instruction Tuning [35.66902011221179]
We exploit the idea of leveraging AI models in lieu of humans as the teacher to train student LLMs.
Our method is inspired by how human students refine their writing skills by following the rubrics and learning from the revisions offered by their tutors.
arXiv Detail & Related papers (2023-10-04T01:58:34Z)
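A sketch of a CITING-style rubric-and-revision round, where a teacher model writes a rubric, critiques the student's draft against it, and the revision becomes training data. `teacher` and `student` are hypothetical text-in/text-out callables, not a real model API.

```python
# CITING-style teacher/student round: the teacher writes a rubric,
# then revises the student's draft against it; the revision becomes
# new fine-tuning data. `teacher` and `student` are hypothetical
# text-in/text-out callables, not a real model API.
def citing_round(instructions, teacher, student):
    revised = []
    for inst in instructions:
        rubric = teacher(f"Write a grading rubric for: {inst}")
        draft = student(inst)
        revision = teacher(
            f"Rubric:\n{rubric}\n\nDraft answer:\n{draft}\n\n"
            "Revise the draft so it fully satisfies the rubric."
        )
        revised.append({"instruction": inst, "response": revision})
    return revised  # fine-tune the student on this for the next round

# Toy usage with echo models just to show the data flow.
echo = lambda prompt: f"<reply to: {prompt[:30]}...>"
print(citing_round(["Explain photosynthesis."], echo, echo))
```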
- Learning Action Conditions from Instructional Manuals for Instruction Understanding [48.52663250368341]
We propose a task dubbed action condition inference and collect a high-quality, human-annotated dataset of preconditions and postconditions of actions in instructional manuals.
We propose a weakly supervised approach to automatically construct large-scale training instances from online instructional manuals, and curate a densely human-annotated and validated dataset to study how well current NLP models can infer action-condition dependencies in instruction texts.
arXiv Detail & Related papers (2022-05-25T00:19:59Z)
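A toy illustration of the weak-supervision idea for action conditions: surface patterns such as "before X, Y" can mine candidate precondition/postcondition pairs from instructional text. The two regexes below are illustrative heuristics only, not the paper's construction method.

```python
# Toy pattern-based miner for action pre-/post-conditions: surface cues
# like "before X, Y" and "after X, Y" yield candidate pairs.
# These regexes are illustrative heuristics, not the paper's method.
import re

PRE = re.compile(r"\bbefore ([^,.]+), ([^.]+)", re.IGNORECASE)
POST = re.compile(r"\bafter ([^,.]+), ([^.]+)", re.IGNORECASE)

def extract_conditions(text: str):
    pairs = []
    for cond, action in PRE.findall(text):
        pairs.append({"action": action.strip(), "precondition": cond.strip()})
    for action, effect in POST.findall(text):
        pairs.append({"action": action.strip(), "postcondition": effect.strip()})
    return pairs

print(extract_conditions(
    "Before preheating the oven, remove the racks. "
    "After kneading the dough, let it rest."))
```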
- Hierarchical Variational Imitation Learning of Control Programs [131.7671843857375]
We propose a variational inference method for imitation learning of a control policy represented by parametrized hierarchical procedures (PHP).
Our method discovers the hierarchical structure in a dataset of observation-action traces of teacher demonstrations, by learning an approximate posterior distribution over the latent sequence of procedure calls and terminations.
We demonstrate a novel benefit of variational inference in the context of hierarchical imitation learning: in decomposing the policy into simpler procedures, inference can leverage acausal information that is unused by other methods.
arXiv Detail & Related papers (2019-12-29T08:57:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.