Instruction Tuning with Human Curriculum
- URL: http://arxiv.org/abs/2310.09518v4
- Date: Sun, 16 Jun 2024 22:46:38 GMT
- Title: Instruction Tuning with Human Curriculum
- Authors: Bruce W. Lee, Hyunsoo Cho, Kang Min Yoo
- Abstract summary: We (1) introduce Curriculum Instruction Tuning, (2) explore the potential advantages of employing diverse curriculum strategies, and (3) delineate a synthetic instruction-response generation framework.
Our generation pipeline is systematically structured to emulate the sequential and orderly character of human learning.
We describe a methodology for generating instruction-response datasets that extensively span the various stages of human education.
- Score: 15.025867460765559
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we (1) introduce Curriculum Instruction Tuning, (2) explore the potential advantages of employing diverse curriculum strategies, and (3) delineate a synthetic instruction-response generation framework that complements our theoretical approach. Distinct from existing instruction tuning datasets, our generation pipeline is systematically structured to emulate the sequential and orderly character of human learning. Additionally, we describe a methodology for generating instruction-response datasets that extensively span the various stages of human education, from middle school through the graduate level, utilizing educational subject catalogs. Before training, we meticulously organize the instruction data to ensure that questions escalate in difficulty regarding (A) the subject matter and (B) the intricacy of the instructions. The findings of our study reveal that substantial improvements in performance can be achieved through the mere application of curriculum ordering to instruction data (achieving gains of +4.76 on TruthfulQA, +2.98 on MMLU, +2.8 on OpenbookQA, and +1.28 on ARC-hard) compared to random shuffling. This enhancement is achieved without incurring additional computational expenses. Through comprehensive experimentation, we observe that the advantages of our proposed method are consistently evident across nine benchmarks.
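As a concrete illustration of the ordering step, here is a minimal sketch assuming each example is tagged with an education stage and an instruction-complexity score; the field names and scoring are illustrative, not the paper's actual pipeline:

```python
# A minimal sketch of curriculum ordering for instruction data. The stage tags
# and complexity scores are assumed annotations, not the paper's exact schema.
from dataclasses import dataclass

# Stages ordered as in human education, middle school through graduate level.
STAGE_ORDER = {"middle_school": 0, "high_school": 1, "undergraduate": 2, "graduate": 3}

@dataclass
class Example:
    instruction: str
    response: str
    stage: str        # one of STAGE_ORDER
    complexity: float # heuristic or model-scored intricacy of the instruction

def curriculum_sort(examples: list[Example]) -> list[Example]:
    """Order examples so difficulty escalates by (A) subject stage, then (B) intricacy."""
    return sorted(examples, key=lambda ex: (STAGE_ORDER[ex.stage], ex.complexity))

data = [
    Example("Prove the spectral theorem.", "...", "graduate", 0.9),
    Example("What is a fraction?", "...", "middle_school", 0.1),
    Example("Explain Newton's second law.", "...", "high_school", 0.4),
]
for ex in curriculum_sort(data):  # middle_school -> high_school -> graduate
    print(ex.stage, ex.instruction)
```

Since the reordering is a one-time preprocessing pass over the dataset, it adds no training-time compute, matching the abstract's claim.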
Related papers
- Aligning Instruction Tuning with Pre-training [81.4748965653345]
We propose Aligning Instruction Tuning with Pre-training (AITP) to align instruction tuning with pre-training distributions.
We show consistent performance improvements with AITP on three fully open large language models (LLMs) across eight benchmarks.
arXiv Detail & Related papers (2025-01-16T08:27:40Z)
- Online inductive learning from answer sets for efficient reinforcement learning exploration [52.03682298194168]
We exploit inductive learning of answer set programs to learn a set of logical rules representing an explainable approximation of the agent policy.
We then perform answer set reasoning on the learned rules to guide the exploration of the learning agent at the next batch.
Our methodology produces a significant boost in the discounted return achieved by the agent, even in the first batches of training.
arXiv Detail & Related papers (2025-01-13T16:13:22Z)
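As a loose illustration of the mechanism summarized above, the sketch below uses a trivial frequency-based rule table as a stand-in for the inductively learned answer set program; the paper itself learns logical rules with an ILP system and reasons over them, which this toy does not reproduce:

```python
# A rough sketch of rule-guided exploration, assuming discrete states/actions.
# The frequency-based "rules" below are a crude stand-in for learned ASP rules.
import random
from collections import Counter, defaultdict

def learn_rules(traces, min_support=3):
    """Approximate the agent policy as 'in state s, prefer action a' rules."""
    counts = defaultdict(Counter)
    for state, action in traces:
        counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()
            if c.most_common(1)[0][1] >= min_support}

def guided_action(state, actions, rules, epsilon=0.2):
    """Follow a learned rule when one fires; otherwise explore uniformly."""
    if state in rules and random.random() > epsilon:
        return rules[state]
    return random.choice(actions)

traces = [("door_closed", "open"), ("door_closed", "open"),
          ("door_closed", "open"), ("door_open", "walk")]
rules = learn_rules(traces)
print(guided_action("door_closed", ["open", "walk", "wait"], rules))
```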
- A Systematic Examination of Preference Learning through the Lens of Instruction-Following [83.71180850955679]
We use a novel synthetic data generation pipeline to generate 48,000 unique instruction-following prompts.
With our synthetic prompts, we use two preference dataset curation methods: rejection sampling (RS) and Monte Carlo Tree Search (MCTS).
Experiments reveal that shared prefixes in preference pairs, as generated by MCTS, provide marginal but consistent improvements.
High-contrast preference pairs generally outperform low-contrast pairs; however, combining both often yields the best performance.
arXiv Detail & Related papers (2024-12-18T15:38:39Z)
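A minimal sketch of the rejection-sampling (RS) side of this curation recipe, assuming placeholder `generate` and `reward` callables rather than any API from the paper:

```python
# A hedged sketch of RS preference pair curation: sample several responses per
# prompt, score them with a reward model, and pair the best against the worst.
import random

def rs_preference_pair(prompt, generate, reward, n_samples=8):
    """Return a (chosen, rejected) pair for one instruction-following prompt."""
    candidates = [generate(prompt) for _ in range(n_samples)]
    scored = sorted(candidates, key=reward, reverse=True)
    return {"prompt": prompt, "chosen": scored[0], "rejected": scored[-1]}

# Toy stand-ins: random "responses" scored by length.
pair = rs_preference_pair(
    "List three uses of a paperclip.",
    generate=lambda p: "use " * random.randint(1, 10),
    reward=len,
)
print(pair["chosen"], "|", pair["rejected"])
```

High-contrast pairs, in this framing, are simply pairs whose reward gap is large.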
- Optimizing Instruction Synthesis: Effective Exploration of Evolutionary Space with Tree Search [25.108044778194536]
We introduce IDEA-MCTS (Instruction Data Enhancement using Monte Carlo Tree Search), a scalable framework for efficiently synthesizing instructions.
With tree search and evaluation models, it can efficiently guide each instruction to evolve into a high-quality form, aiding in instruction fine-tuning.
Experimental results show that IDEA-MCTS significantly enhances the seed instruction data, raising the average evaluation scores of quality, diversity, and complexity from 2.19 to 3.81.
arXiv Detail & Related papers (2024-10-14T11:28:30Z)
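The sketch below captures the evolve-and-evaluate loop in a simplified greedy form; real IDEA-MCTS uses Monte Carlo Tree Search with exploration bonuses, and `mutate` and `score` here are placeholders for the LLM rewriter and the evaluation model:

```python
# A simplified stand-in for tree-search instruction evolution: repeatedly
# expand an instruction with candidate rewrites and keep the best-scoring node.
def evolve_instruction(seed, mutate, score, depth=3, branching=4):
    best, best_score = seed, score(seed)
    frontier = [seed]
    for _ in range(depth):
        children = [mutate(node) for node in frontier for _ in range(branching)]
        children.sort(key=score, reverse=True)
        frontier = children[:branching]  # keep the best nodes to expand next
        if frontier and score(frontier[0]) > best_score:
            best, best_score = frontier[0], score(frontier[0])
    return best

# Toy demo: "mutations" append constraints; longer instructions score higher.
result = evolve_instruction(
    "Write a story.",
    mutate=lambda s: s + " Add one more constraint.",
    score=len,
)
print(result)
```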
- Instruction Tuning With Loss Over Instructions [42.9106826952674]
Instruction Modelling (IM) trains LMs by applying a loss function to the instruction and prompt part rather than solely to the output part.
We show that, in many scenarios, IM can effectively improve the LM performance on both NLP tasks and open-ended generation benchmarks.
Remarkably, in the most advantageous case, IM boosts model performance on AlpacaEval 1.0 by over 100%.
arXiv Detail & Related papers (2024-05-23T10:12:03Z)
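A minimal sketch of the difference, using PyTorch's ignore index to mask instruction tokens under standard tuning while IM leaves them in the loss (token IDs are toy values, and the label shift of causal-LM training is omitted for brevity):

```python
# Standard instruction tuning masks instruction tokens out of the loss;
# Instruction Modelling (IM) keeps them in. -100 is PyTorch's ignore index.
import torch
import torch.nn.functional as F

IGNORE = -100
instr_ids = torch.tensor([101, 102, 103])   # instruction + prompt tokens (toy)
output_ids = torch.tensor([201, 202])       # response tokens (toy)
input_ids = torch.cat([instr_ids, output_ids])

# Standard SFT: loss only on the response span.
sft_labels = torch.cat([torch.full_like(instr_ids, IGNORE), output_ids])
# Instruction Modelling: loss on instruction and response alike.
im_labels = input_ids.clone()

logits = torch.randn(len(input_ids), 1000)  # stand-in for model logits
sft_loss = F.cross_entropy(logits, sft_labels, ignore_index=IGNORE)
im_loss = F.cross_entropy(logits, im_labels, ignore_index=IGNORE)
print(float(sft_loss), float(im_loss))
```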
- One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
arXiv Detail & Related papers (2023-12-16T03:33:12Z)
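A hedged sketch of the selection idea: score each candidate by the average one-shot likelihood gain it yields on a small anchor set, then keep the top fraction. `loglik` is a placeholder for a real LM scoring call:

```python
# A sketch of one-shot data prospecting: a candidate is valuable if using it
# as a one-shot demonstration raises likelihood on anchor tasks. The anchor
# schema and `loglik` signature are assumptions, not the paper's API.
def nuggets_score(candidate, anchors, loglik):
    """Average one-shot gain of `candidate` over zero-shot, across anchors."""
    gains = [loglik(prompt=candidate + "\n" + a["q"], target=a["a"])
             - loglik(prompt=a["q"], target=a["a"])
             for a in anchors]
    return sum(gains) / len(gains)

def select_top(candidates, anchors, loglik, frac=0.01):
    ranked = sorted(candidates, key=lambda c: nuggets_score(c, anchors, loglik),
                    reverse=True)
    return ranked[: max(1, int(len(ranked) * frac))]

# Toy demo with a fake scorer that favors longer prompts.
anchors = [{"q": "2+2?", "a": "4"}]
fake_loglik = lambda prompt, target: -1.0 / (1 + len(prompt))
print(select_top(["a long demonstration", "hi"], anchors, fake_loglik, frac=0.5))
```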
- CITING: Large Language Models Create Curriculum for Instruction Tuning [35.66902011221179]
We leverage AI models in lieu of humans as the teacher to train student LLMs.
Our method is inspired by how human students refine their writing skills by following the rubrics and learning from the revisions offered by their tutors.
arXiv Detail & Related papers (2023-10-04T01:58:34Z)
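One round of the rubric-and-revision loop might look like the following sketch, where `teacher` and `student` are placeholder chat functions rather than anything specified by the paper:

```python
# A rough sketch of teacher-guided curriculum: the teacher drafts a rubric,
# the student answers, and the teacher's revision becomes the training target.
def citing_round(instruction, student, teacher):
    rubric = teacher(f"Write a grading rubric for: {instruction}")
    draft = student(instruction)
    revision = teacher(
        f"Rubric: {rubric}\nInstruction: {instruction}\n"
        f"Student answer: {draft}\nRevise the answer to satisfy the rubric."
    )
    # The (instruction, revision) pair is added to the student's tuning data.
    return {"instruction": instruction, "response": revision}

# Toy demo with stub functions standing in for actual LLM calls.
demo = citing_round(
    "Summarize the water cycle.",
    student=lambda p: "water moves around",
    teacher=lambda p: f"[teacher output for: {p[:40]}...]",
)
print(demo["response"])
```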
- Learning Action Conditions from Instructional Manuals for Instruction Understanding [48.52663250368341]
We propose a task dubbed action condition inference and collect a high-quality, human-annotated dataset of preconditions and postconditions of actions in instructional manuals.
We propose a weakly supervised approach to automatically construct large-scale training instances from online instructional manuals, and curate a densely human-annotated and validated dataset to study how well the current NLP models can infer action-condition dependencies in instruction texts.
arXiv Detail & Related papers (2022-05-25T00:19:59Z)
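An illustrative (not the paper's) weak-supervision heuristic along these lines: mine cue phrases such as "before X, Y" from manual text to propose candidate action-condition pairs:

```python
# A toy cue-phrase miner for action-condition pairs. Real weakly supervised
# pipelines would use far richer patterns and filtering than these two regexes.
import re

CUES = [
    (r"[Bb]efore (\w[\w ]*), (\w[\w ]*)", "precondition"),
    (r"[Aa]fter (\w[\w ]*), (\w[\w ]*)", "postcondition"),
]

def mine_conditions(sentence):
    pairs = []
    for pattern, relation in CUES:
        for cond, action in re.findall(pattern, sentence):
            pairs.append({"action": action.strip(), "condition": cond.strip(),
                          "relation": relation})
    return pairs

print(mine_conditions("Before painting the wall, sand the surface."))
# [{'action': 'sand the surface', 'condition': 'painting the wall',
#   'relation': 'precondition'}]
```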
- Hierarchical Variational Imitation Learning of Control Programs [131.7671843857375]
We propose a variational inference method for imitation learning of a control policy represented by parametrized hierarchical procedures (PHP).
Our method discovers the hierarchical structure in a dataset of observation-action traces of teacher demonstrations, by learning an approximate posterior distribution over the latent sequence of procedure calls and terminations.
We demonstrate a novel benefit of variational inference in the context of hierarchical imitation learning: in decomposing the policy into simpler procedures, inference can leverage acausal information that is unused by other methods.
arXiv Detail & Related papers (2019-12-29T08:57:02Z)
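As a toy rendering of the latent-structure idea, the sketch below enumerates every segmentation of a short action trace into contiguous "procedures" and computes an exact posterior under a crude hand-rolled score; the paper instead amortizes this with variational inference over procedure calls and terminations:

```python
# Exact posterior over segmentations of a tiny trace under a crude generative
# score: a per-segment prior cost plus a bonus for segments repeating one
# action. Purely illustrative of the latent-segmentation inference problem.
from itertools import combinations
from math import log, exp

def segmentations(n):
    """All ways to cut indices 0..n-1 into contiguous segments."""
    for cuts in (c for r in range(n) for c in combinations(range(1, n), r)):
        bounds = [0, *cuts, n]
        yield [tuple(range(bounds[i], bounds[i + 1]))
               for i in range(len(bounds) - 1)]

def log_score(segments, trace):
    s = len(segments) * log(0.5)  # prior: each extra segment costs log(0.5)
    for seg in segments:
        actions = {trace[i] for i in seg}
        s += -log(len(actions)) * len(seg)  # favor single-action segments
    return s

trace = ["up", "up", "grab", "grab", "grab"]
scored = [(seg, log_score(seg, trace)) for seg in segmentations(len(trace))]
z = sum(exp(s) for _, s in scored)
posterior = sorted(((exp(s) / z, seg) for seg, s in scored), reverse=True)
print(posterior[0])  # most probable segmentation: [(0, 1), (2, 3, 4)]
```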