Iterative Teaching by Data Hallucination
- URL: http://arxiv.org/abs/2210.17467v2
- Date: Wed, 12 Apr 2023 20:49:44 GMT
- Title: Iterative Teaching by Data Hallucination
- Authors: Zeju Qiu, Weiyang Liu, Tim Z. Xiao, Zhen Liu, Umang Bhatt, Yucen Luo,
Adrian Weller, Bernhard Schölkopf
- Abstract summary: We consider the problem of iterative machine teaching, where a teacher sequentially provides examples based on the status of a learner.
We propose data hallucination teaching (DHT) where the teacher can generate input data intelligently based on labels, the learner's status and the target concept.
- Score: 37.246902903546896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of iterative machine teaching, where a teacher
sequentially provides examples based on the status of a learner under a
discrete input space (i.e., a pool of finite samples), which greatly limits the
teacher's capability. To address this issue, we study iterative teaching under
a continuous input space where the input example (i.e., image) can be either
generated by solving an optimization problem or drawn directly from a
continuous distribution. Specifically, we propose data hallucination teaching
(DHT) where the teacher can generate input data intelligently based on labels,
the learner's status and the target concept. We study a number of challenging
teaching setups (e.g., linear/neural learners in omniscient and black-box
settings). Extensive empirical results verify the effectiveness of DHT.
Related papers
- When Babies Teach Babies: Can student knowledge sharing outperform Teacher-Guided Distillation on small datasets? [0.0]
We present our submission to the BabyLM challenge, aiming to push the boundaries of data-efficient language model pretraining.
We address the limitation of treating students equally by formulating weighted mutual learning as a bi-level optimization problem.
Our evaluations show that teacher-less methods can match or surpass teacher-supervised approaches.
arXiv Detail & Related papers (2024-11-25T15:25:31Z) - YODA: Teacher-Student Progressive Learning for Language Models [82.0172215948963]
- YODA: Teacher-Student Progressive Learning for Language Models [82.0172215948963]
This paper introduces YODA, a teacher-student progressive learning framework.
It emulates the teacher-student education process to improve the efficacy of model fine-tuning.
Experiments show that training LLaMA2 with data from YODA improves SFT with significant performance gains.
arXiv Detail & Related papers (2024-01-28T14:32:15Z) - DriveAdapter: Breaking the Coupling Barrier of Perception and Planning
in End-to-End Autonomous Driving [64.57963116462757]
State-of-the-art methods usually follow the 'Teacher-Student' paradigm.
The student model only has access to raw sensor data and conducts behavior cloning on the data collected by the teacher model.
We propose DriveAdapter, which employs adapters with the feature alignment objective function between the student (perception) and teacher (planning) modules.
arXiv Detail & Related papers (2023-08-01T09:21:53Z) - Active Teacher for Semi-Supervised Object Detection [80.10937030195228]
- Active Teacher for Semi-Supervised Object Detection [80.10937030195228]
We propose a novel algorithm called Active Teacher for semi-supervised object detection (SSOD).
Active Teacher extends the teacher-student framework to an iterative version, where the label set is partially and gradually augmented by evaluating three key factors of unlabeled examples.
With this design, Active Teacher can maximize the effect of limited label information while improving the quality of pseudo-labels.
arXiv Detail & Related papers (2023-03-15T03:59:27Z) - Random Teachers are Good Teachers [19.74244993871716]
- Random Teachers are Good Teachers [19.74244993871716]
We investigate the implicit regularization induced by teacher-student learning dynamics in self-distillation.
When distilling a student into a teacher with random initialization, we observe a strong improvement of the distilled student over its teacher in terms of probing accuracy.
arXiv Detail & Related papers (2023-02-23T15:26:08Z) - Improved knowledge distillation by utilizing backward pass knowledge in
neural networks [17.437510399431606]
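A minimal sketch of the random-teacher setup above: a frozen, randomly initialized teacher is distilled into a student on unlabeled inputs; the architecture and training details are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Minimal sketch of distilling a student into a frozen, randomly initialized
# teacher (no labels involved); sizes and architecture are illustrative.
torch.manual_seed(0)

def mlp():
    return nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))

teacher = mlp()                       # random weights, never trained
for p in teacher.parameters():
    p.requires_grad_(False)

student = mlp()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
x = torch.randn(512, 32)              # unlabeled inputs

for _ in range(200):
    with torch.no_grad():
        target = teacher(x)
    loss = nn.functional.mse_loss(student(x), target)
    opt.zero_grad()
    loss.backward()
    opt.step()

# The resulting student features would then be evaluated by linear probing.
```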
- Improved knowledge distillation by utilizing backward pass knowledge in neural networks [17.437510399431606]
Knowledge distillation (KD) is one of the prominent techniques for model compression.
In this work, we generate new auxiliary training samples based on extracting knowledge from the backward pass of the teacher.
We show how this technique can be used successfully in applications of natural language processing (NLP) and language understanding.
arXiv Detail & Related papers (2023-01-27T22:07:38Z) - Teaching What You Should Teach: A Data-Based Distillation Method [20.595460553747163]
We introduce the "Teaching what you Should Teach" strategy into a knowledge distillation framework.
We propose a data-based distillation method named "TST" that searches for desirable augmented samples to assist in distilling more efficiently and rationally.
To be specific, we design a neural network-based data augmentation module with a priori bias, which assists in finding samples that play to the teacher's strengths but expose the student's weaknesses.
arXiv Detail & Related papers (2022-12-11T06:22:14Z) - Exploring Inconsistent Knowledge Distillation for Object Detection with
Data Augmentation [66.25738680429463]
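A loose sketch of a learned augmentation module in the spirit of TST above, searching for samples the teacher handles confidently but the student does not; all modules and objectives are simplified stand-ins for the method in the paper.

```python
import torch
import torch.nn.functional as F

# An augmenter perturbs inputs to keep the teacher confident while exposing
# teacher/student disagreement; the student then distills on those samples.
torch.manual_seed(0)
teacher, student = torch.nn.Linear(16, 4), torch.nn.Linear(16, 4)
augmenter = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.Tanh())
for p in teacher.parameters():
    p.requires_grad_(False)

opt_aug = torch.optim.Adam(augmenter.parameters(), lr=1e-3)
opt_stu = torch.optim.Adam(student.parameters(), lr=1e-3)
x = torch.randn(64, 16)

for _ in range(100):
    x_aug = x + 0.1 * augmenter(x)                 # bounded residual perturbation
    t_prob = F.softmax(teacher(x_aug), -1)
    s_logp = F.log_softmax(student(x_aug), -1)
    disagreement = F.kl_div(s_logp, t_prob, reduction="batchmean")
    t_entropy = -(t_prob * t_prob.clamp_min(1e-8).log()).sum(-1).mean()

    # Augmenter: keep the teacher confident (low entropy), expose the student.
    loss_aug = t_entropy - disagreement
    opt_aug.zero_grad()
    loss_aug.backward()
    opt_aug.step()

    # Student: distill from the teacher on the augmented samples.
    loss_stu = F.kl_div(F.log_softmax(student(x_aug.detach()), -1),
                        F.softmax(teacher(x_aug.detach()), -1), reduction="batchmean")
    opt_stu.zero_grad()
    loss_stu.backward()
    opt_stu.step()
```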
- Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation [66.25738680429463]
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
arXiv Detail & Related papers (2022-09-20T16:36:28Z) - Data-Efficient Ranking Distillation for Image Retrieval [15.88955427198763]
Recent approaches tackle the cost of deploying heavy retrieval models by using knowledge distillation to transfer knowledge from a deeper and heavier architecture to a much smaller network.
In this paper we address knowledge distillation for metric learning problems.
Unlike previous approaches, our proposed method jointly addresses the following constraints: i) limited queries to the teacher model, ii) a black-box teacher model with access only to the final output representation, and iii) a small fraction of the original training data without any ground-truth labels.
arXiv Detail & Related papers (2020-07-10T10:59:16Z) - Role-Wise Data Augmentation for Knowledge Distillation [48.115719640111394]
- Role-Wise Data Augmentation for Knowledge Distillation [48.115719640111394]
Knowledge Distillation (KD) is a common method for transferring the "knowledge" learned by one machine learning model into another.
We design data augmentation agents with distinct roles to facilitate knowledge distillation.
We find empirically that specially tailored data points enable the teacher's knowledge to be demonstrated more effectively to the student.
arXiv Detail & Related papers (2020-04-19T14:22:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences arising from its use.