Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning
- URL: http://arxiv.org/abs/2410.14208v1
- Date: Fri, 18 Oct 2024 06:50:15 GMT
- Title: Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning
- Authors: Xiaochuan Li, Zichun Yu, Chenyan Xiong
- Abstract summary: We propose Montessori-Instruct, a novel data synthesis framework that tailors the data synthesis ability of the teacher language model toward the student language model's learning process.
Experiments show that Montessori-Instruct significantly outperforms standard synthesis methods, with relative improvements of 18.35% and 46.24%.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Synthetic data has been widely used to train large language models, but their generative nature inevitably introduces noisy, non-informative, and misleading learning signals. In this paper, we propose Montessori-Instruct, a novel data synthesis framework that tailors the data synthesis ability of the teacher language model toward the student language model's learning process. Specifically, we utilize local data influence of synthetic training data points on students to characterize students' learning preferences. Then, we train the teacher model with Direct Preference Optimization (DPO) to generate synthetic data tailored toward student learning preferences. Experiments with Llama3-8B-Instruct (teacher) and Llama3-8B (student) on Alpaca Eval and MT-Bench demonstrate that Montessori-Instruct significantly outperforms standard synthesis methods by 18.35\% and 46.24\% relatively. Our method also beats data synthesized by a stronger teacher model, GPT-4o. Further analysis confirms the benefits of teacher's learning to generate more influential training data in the student's improved learning, the advantages of local data influence in accurately measuring student preferences, and the robustness of Montessori-Instruct across different student models. Our code and data are open-sourced at https://github.com/cxcscmu/Montessori-Instruct.
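The core recipe, scoring each synthetic point by its local data influence on the student and then preference-tuning the teacher toward high-influence points, can be sketched with a toy model. This is a hypothetical simplification: a 1-D linear regressor stands in for the student, and influence is measured as the drop in reference-set loss after one gradient step on a single synthetic point (the real method uses LLM losses and DPO).

```python
import numpy as np

def ref_loss(w, X_ref, y_ref):
    # Mean squared error of the "student" on the reference (eval) set.
    return float(np.mean((X_ref @ w - y_ref) ** 2))

def local_influence(w, x, y, X_ref, y_ref, lr=0.1):
    # One SGD step on the single synthetic point (x, y).
    grad = 2 * (x @ w - y) * x
    w_new = w - lr * grad
    # Influence = reference loss before minus after (positive = helpful).
    return ref_loss(w, X_ref, y_ref) - ref_loss(w_new, X_ref, y_ref)

rng = np.random.default_rng(0)
# Reference data drawn from the true relation y = 3x.
X_ref = rng.normal(size=(50, 1))
y_ref = 3 * X_ref[:, 0]
w = np.zeros(1)

helpful = local_influence(w, np.array([1.0]), 3.0, X_ref, y_ref)   # on-distribution
noisy   = local_influence(w, np.array([1.0]), -5.0, X_ref, y_ref)  # misleading label

# A teacher would then be preference-tuned (e.g. via DPO) to prefer
# generating points that score like the first one.
print(helpful > noisy)  # True: the clean point has higher influence
```

Ranking synthetic points this way yields the chosen/rejected pairs that the DPO step consumes.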
Related papers
- Find Your Optimal Teacher: Personalized Data Synthesis via Router-Guided Multi-Teacher Distillation [47.814833568523255]
PerSyn operates under a new "Route then Generate" paradigm to create data tailored to each student model.
Experiments across different model families and scales demonstrate that PerSyn consistently achieves superior or comparable performance.
arXiv Detail & Related papers (2025-10-13T02:36:36Z) - DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback [62.235925602004535]
We introduce DataEnvGym, a testbed of teacher environments for data generation agents.
DataEnvGym frames data generation as a sequential decision-making task.
The agent's goal is to improve student performance.
We support 3 diverse tasks (math, code, and VQA) and test multiple students and teachers.
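The sequential decision-making framing above can be illustrated with a toy loop (all numbers hypothetical): a teacher agent observes the student's per-skill feedback, targets the weakest skill with newly generated data, and the simulated student improves on whatever it was trained on.

```python
# Toy version of a data-generation agent in a teacher environment.
# The student's per-skill accuracy is the feedback signal; the teacher's
# (deliberately simple) policy is to generate data for the weakest skill.
skills = {"math": 0.2, "code": 0.5, "vqa": 0.4}  # simulated student accuracy

def teacher_policy(feedback):
    # Greedy policy: target the skill where the student is weakest.
    return min(feedback, key=feedback.get)

for step in range(10):
    target = teacher_policy(skills)
    # "Training" on generated data for `target` nudges accuracy upward.
    skills[target] = min(1.0, skills[target] + 0.1)

print(skills)
```

The greedy policy naturally equalizes the skills over the rounds; a real agent would learn its policy from the same feedback loop.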
arXiv Detail & Related papers (2024-10-08T17:20:37Z) - Is Child-Directed Speech Effective Training Data for Language Models? [34.46268640655943]
We train GPT-2 and RoBERTa models on 29M words of English child-directed speech.
We test whether the global developmental ordering or the local discourse ordering of children's training data supports high performance relative to other datasets.
These findings support the hypothesis that, rather than proceeding from better data, the child's learning algorithm is substantially more data-efficient than current language modeling techniques.
arXiv Detail & Related papers (2024-08-07T08:18:51Z) - AgentInstruct: Toward Generative Teaching with Agentic Flows [12.192372792525726]
We focus on using synthetic data for post-training, specifically data created by powerful models to teach a new skill or behavior to another model.
We introduce AgentInstruct, an agentic framework for automatically creating large amounts of diverse and high-quality synthetic data.
We demonstrate the utility of AgentInstruct by creating a post-training dataset of 25M pairs to teach language models different skills, such as text editing, creative writing, tool usage, coding, and reading comprehension.
arXiv Detail & Related papers (2024-07-03T21:01:12Z) - Toward In-Context Teaching: Adapting Examples to Students' Misconceptions [54.82965010592045]
We introduce a suite of models and evaluation methods we call AdapT.
AToM is a new probabilistic model for adaptive teaching that jointly infers students' past beliefs and optimizes for the correctness of future beliefs.
Our results highlight both the difficulty of the adaptive teaching task and the potential of learned adaptive models for solving it.
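The inference step above can be sketched as a small Bayesian update (all probabilities are made up for illustration): the teacher maintains a posterior over which misconception the student holds, updates it from an observed answer, and teaches toward the most probable one.

```python
# Hypothetical sketch of adaptive teaching with belief inference.
# Two candidate misconceptions; the prior is uniform.
posterior = {"m1": 0.5, "m2": 0.5}

# Likelihood of the student's observed (wrong) answer under each misconception.
likelihood = {"m1": 0.9, "m2": 0.2}

# Bayes update: posterior ∝ prior × likelihood.
unnorm = {m: posterior[m] * likelihood[m] for m in posterior}
z = sum(unnorm.values())
posterior = {m: v / z for m, v in unnorm.items()}

# Pick the next example to correct the most probable misconception.
target = max(posterior, key=posterior.get)
print(target, round(posterior[target], 2))  # m1 0.82
```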
arXiv Detail & Related papers (2024-05-07T17:05:27Z) - YODA: Teacher-Student Progressive Learning for Language Models [82.0172215948963]
This paper introduces YODA, a teacher-student progressive learning framework.
It emulates the teacher-student education process to improve the efficacy of model fine-tuning.
Experiments show that training LLaMA2 with data from YODA improves SFT with a significant performance gain.
arXiv Detail & Related papers (2024-01-28T14:32:15Z) - Customizing Synthetic Data for Data-Free Student Learning [6.8080936803807734]
DFKD aims to obtain a lightweight student model without original training data.
To train the student model more effectively, synthetic data should be customized to the student's current learning ability.
We propose Customizing Synthetic Data for Data-Free Student Learning (CSD) in this paper.
arXiv Detail & Related papers (2023-07-10T13:17:29Z) - Teaching What You Should Teach: A Data-Based Distillation Method [20.595460553747163]
We introduce the "Teaching what you Should Teach" strategy into a knowledge distillation framework.
We propose a data-based distillation method named "TST" that searches for desirable augmented samples to assist in distilling more efficiently and rationally.
To be specific, we design a neural network-based data augmentation module with priori bias, which assists in finding what meets the teacher's strengths but the student's weaknesses.
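The selection criterion above, samples that play to the teacher's strengths but expose the student's weaknesses, can be sketched with made-up confidence scores: rank augmented samples by the teacher-student confidence gap and keep the largest gaps.

```python
import numpy as np

# Hypothetical per-sample confidences (not from the paper) for four
# augmented samples, under the teacher and the student respectively.
teacher_conf = np.array([0.90, 0.95, 0.60, 0.85])
student_conf = np.array([0.80, 0.40, 0.55, 0.84])

# Large gap = teacher handles the sample well, student does not.
gap = teacher_conf - student_conf
selected = np.argsort(gap)[::-1][:2]   # keep the 2 largest-gap samples
print(sorted(selected.tolist()))       # [0, 1]
```

In the paper this ranking is produced by a learned augmentation module rather than fixed scores.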
arXiv Detail & Related papers (2022-12-11T06:22:14Z) - Unsupervised Neural Stylistic Text Generation using Transfer learning and Adapters [66.17039929803933]
We propose a novel transfer learning framework which updates only 0.3% of model parameters to learn style-specific attributes for response generation.
We learn style-specific attributes from the PERSONALITY-CAPTIONS dataset.
arXiv Detail & Related papers (2022-10-07T00:09:22Z) - Learning by Teaching, with Application to Neural Architecture Search [10.426533624387305]
We propose a novel ML framework referred to as learning by teaching (LBT).
In LBT, a teacher model improves itself by teaching a student model to learn well.
Based on how the student performs on a validation dataset, the teacher re-learns its model and re-teaches the student until the student achieves strong validation performance.
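The re-learn/re-teach loop above can be sketched as a simple search over a teaching parameter (a hypothetical stand-in for the teacher's model), where the only feedback is the student's validation score after each round.

```python
import random

random.seed(1)

def student_val_score(teaching_param):
    # Toy stand-in for "train the student, then evaluate on validation":
    # performance peaks when the teacher's parameter is near 0.7.
    return 1.0 - (teaching_param - 0.7) ** 2

teacher_param, best = 0.0, float("-inf")
for _ in range(50):
    candidate = teacher_param + random.uniform(-0.1, 0.1)
    score = student_val_score(candidate)
    if score > best:          # the teacher "re-learns" only on improvement
        teacher_param, best = candidate, score

print(round(teacher_param, 1))
```

The teacher converges toward the parameter that maximizes student validation performance, which is the signal LBT optimizes.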
arXiv Detail & Related papers (2021-03-11T23:50:38Z) - SLADE: A Self-Training Framework For Distance Metric Learning [75.54078592084217]
We present a self-training framework, SLADE, to improve retrieval performance by leveraging additional unlabeled data.
We first train a teacher model on the labeled data and use it to generate pseudo labels for the unlabeled data.
We then train a student model on both labels and pseudo labels to generate final feature embeddings.
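The three-step recipe above can be sketched end to end with nearest-centroid classifiers standing in for the real embedding networks (all data is synthetic): fit a teacher on labeled data, pseudo-label the unlabeled pool, then fit a student on labels plus pseudo-labels.

```python
import numpy as np

def fit_centroids(X, y):
    # A per-class centroid is our stand-in for a trained model.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, X):
    classes = sorted(centroids)
    d = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in classes])
    return np.array(classes)[d.argmin(axis=0)]

rng = np.random.default_rng(0)
# Labeled data: two Gaussian blobs; unlabeled pool from the same blobs.
X_lab = np.vstack([rng.normal(0, 0.5, (10, 2)), rng.normal(3, 0.5, (10, 2))])
y_lab = np.array([0] * 10 + [1] * 10)
X_unl = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])

teacher = fit_centroids(X_lab, y_lab)                 # step 1: teacher on labels
pseudo = predict(teacher, X_unl)                      # step 2: pseudo-label
student = fit_centroids(np.vstack([X_lab, X_unl]),    # step 3: student on both
                        np.concatenate([y_lab, pseudo]))

acc = (predict(student, X_unl) == np.array([0] * 50 + [1] * 50)).mean()
print(acc)
```

The student sees 5x more (pseudo-labeled) data than the teacher did, which is the leverage self-training provides.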
arXiv Detail & Related papers (2020-11-20T08:26:10Z) - Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z) - Role-Wise Data Augmentation for Knowledge Distillation [48.115719640111394]
Knowledge Distillation (KD) is a common method for transferring the "knowledge" learned by one machine learning model into another.
We design data augmentation agents with distinct roles to facilitate knowledge distillation.
We find empirically that specially tailored data points enable the teacher's knowledge to be demonstrated more effectively to the student.
arXiv Detail & Related papers (2020-04-19T14:22:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.