Learning by Teaching, with Application to Neural Architecture Search
- URL: http://arxiv.org/abs/2103.07009v1
- Date: Thu, 11 Mar 2021 23:50:38 GMT
- Title: Learning by Teaching, with Application to Neural Architecture Search
- Authors: Parth Sheth, Yueyu Jiang, Pengtao Xie
- Abstract summary: We propose a novel ML framework referred to as learning by teaching (LBT).
In LBT, a teacher model improves itself by teaching a student model to learn well.
Based on how the student performs on a validation dataset, the teacher re-learns its model and re-teaches the student until the student achieves strong validation performance.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In human learning, an effective skill in improving learning outcomes is
learning by teaching: a learner deepens his/her understanding of a topic by
teaching this topic to others. In this paper, we aim to borrow this
teaching-driven learning methodology from humans and leverage it to train more
performant machine learning models, by proposing a novel ML framework referred
to as learning by teaching (LBT). In the LBT framework, a teacher model
improves itself by teaching a student model to learn well. Specifically, the
teacher creates a pseudo-labeled dataset and uses it to train a student model.
Based on how the student performs on a validation dataset, the teacher
re-learns its model and re-teaches the student until the student achieves strong
validation performance. Our framework is based on three-level optimization,
which comprises three stages: the teacher learns; the teacher teaches the
student; the teacher re-learns based on how well the student performs. A simple but efficient
algorithm is developed to solve the three-level optimization problem. We apply
LBT to search neural architectures on CIFAR-10, CIFAR-100, and ImageNet. The
efficacy of our method is demonstrated in various experiments.
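To make the three-stage loop concrete, below is a minimal runnable sketch in NumPy. Everything in it (the logistic-regression stand-ins for the teacher and student, the synthetic data, and the stage-3 reweighting heuristic) is an illustrative assumption; the paper solves the three-level optimization with a gradient-based algorithm rather than this alternating heuristic.

```python
# Minimal sketch of the LBT loop (illustrative only; not the authors'
# implementation). Logistic regression stands in for both models.
import numpy as np

rng = np.random.default_rng(0)

def make_split(n):
    """Synthetic binary-classification data."""
    X = rng.normal(size=(n, 5))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
    return X, y

def fit_logreg(X, y, sample_w=None, steps=300, lr=0.5):
    """Weighted logistic regression trained by plain gradient descent."""
    w = np.zeros(X.shape[1])
    sw = np.ones(len(y)) if sample_w is None else sample_w
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * (X.T @ (sw * (p - y))) / sw.sum()
    return w

def predict(w, X):
    return (X @ w > 0).astype(float)

X_train, y_train = make_split(200)  # labeled data for the teacher
X_unlab, _ = make_split(400)        # unlabeled pool; teacher pseudo-labels it
X_val, y_val = make_split(200)      # validation set providing student feedback

sample_w = np.ones(len(y_train))
for round_ in range(3):
    # Stage 1: the teacher learns (here, on reweighted labeled data).
    w_teacher = fit_logreg(X_train, y_train, sample_w)

    # Stage 2: the teacher teaches -- it pseudo-labels the unlabeled pool
    # and the student trains on the pseudo-labeled dataset.
    pseudo_y = predict(w_teacher, X_unlab)
    w_student = fit_logreg(X_unlab, pseudo_y)

    # Stage 3: the teacher re-learns based on the student's validation
    # performance. A crude stand-in for the paper's optimization: upweight
    # training points the student currently gets wrong.
    val_acc = (predict(w_student, X_val) == y_val).mean()
    print(f"round {round_}: student validation accuracy = {val_acc:.3f}")
    sample_w = np.where(predict(w_student, X_train) == y_train, 1.0, 2.0)
```

The structural point the sketch preserves is the feedback direction: the student never sees ground-truth labels for the pool, and the teacher's next update is driven by the student's validation behavior.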
Related papers
- Toward In-Context Teaching: Adapting Examples to Students' Misconceptions [54.82965010592045]
We introduce a suite of models and evaluation methods we call AdapT.
AToM is a new probabilistic model for adaptive teaching that jointly infers students' past beliefs and optimizes for the correctness of their future beliefs.
Our results highlight both the difficulty of the adaptive teaching task and the potential of learned adaptive models for solving it.
arXiv Detail & Related papers (2024-05-07T17:05:27Z)
- YODA: Teacher-Student Progressive Learning for Language Models [82.0172215948963]
This paper introduces YODA, a teacher-student progressive learning framework.
It emulates the teacher-student education process to improve the efficacy of model fine-tuning.
Experiments show that training LLaMA2 with data from YODA improves over standard SFT with significant performance gains.
arXiv Detail & Related papers (2024-01-28T14:32:15Z)
- UNIKD: UNcertainty-filtered Incremental Knowledge Distillation for Neural Implicit Representation [48.49860868061573]
Recent neural implicit representations (NIRs) have achieved great success in the tasks of 3D reconstruction and novel view synthesis.
However, they require images of the scene from different camera views to be available for one-time training.
This is expensive, especially for large-scale scenes and scenarios with limited data storage.
We design a student-teacher framework to mitigate the catastrophic forgetting problem.
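As a rough illustration of the uncertainty filtering suggested by the title, here is a small hypothetical sketch: pseudo-supervision from a softmax teacher is kept only where predictive entropy is low. The entropy measure, the threshold, and the Dirichlet stand-in for teacher outputs are our assumptions, not details from the paper.

```python
# Illustrative uncertainty-filtered distillation targets (hypothetical;
# UNIKD's actual uncertainty estimate and threshold are not given above).
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of each row of class probabilities p."""
    return -np.sum(p * np.log(p + eps), axis=1)

rng = np.random.default_rng(0)
teacher_probs = rng.dirichlet([1.0, 1.0, 1.0], size=100)  # stand-in outputs

keep = entropy(teacher_probs) < 0.5           # low-uncertainty filter
pseudo_labels = teacher_probs.argmax(axis=1)  # hard pseudo-labels
print(f"kept {keep.sum()} of {len(keep)} pseudo-labels;",
      "first kept labels:", pseudo_labels[keep][:5])
```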
arXiv Detail & Related papers (2022-12-21T11:43:20Z)
- Teaching What You Should Teach: A Data-Based Distillation Method [20.595460553747163]
We introduce the "Teaching What You Should Teach" strategy into a knowledge distillation framework.
We propose a data-based distillation method named "TST" that searches for desirable augmented samples to assist in more efficient and rational distillation.
Specifically, we design a neural network-based data augmentation module with a prior bias, which helps find samples that play to the teacher's strengths but target the student's weaknesses, as sketched below.
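The selection criterion can be pictured with a toy scoring rule. In the sketch below, candidate augmented samples are ranked by the gap between teacher and student confidence; the Dirichlet stand-ins for model outputs are our assumptions, and TST's learned augmentation module and prior bias are not reproduced.

```python
# Hypothetical "teacher-strong, student-weak" sample selection: score each
# candidate by teacher confidence minus student confidence in the
# teacher's predicted class, then keep the largest gaps.
import numpy as np

rng = np.random.default_rng(0)
n_candidates, n_classes = 500, 10

# Stand-ins for model outputs on augmented candidate samples.
teacher_probs = rng.dirichlet(np.ones(n_classes), size=n_candidates)
student_probs = rng.dirichlet(np.ones(n_classes), size=n_candidates)

teacher_cls = teacher_probs.argmax(axis=1)
rows = np.arange(n_candidates)
score = teacher_probs[rows, teacher_cls] - student_probs[rows, teacher_cls]

top_k = np.argsort(score)[-64:]  # samples where the confidence gap is largest
print("selected candidate indices:", top_k[:8], "...")
```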
arXiv Detail & Related papers (2022-12-11T06:22:14Z)
- Iterative Teacher-Aware Learning [136.05341445369265]
In human pedagogy, teachers and students can interact adaptively to maximize communication efficiency.
We propose a gradient-optimization-based teacher-aware learner that incorporates the teacher's cooperative intention into its likelihood function.
arXiv Detail & Related papers (2021-10-01T00:27:47Z)
- Interleaving Learning, with Application to Neural Architecture Search [12.317568257671427]
We propose a novel machine learning framework referred to as interleaving learning (IL).
In our framework, a set of models collaboratively learn a data encoder in an interleaving fashion.
We apply interleaving learning to search neural architectures for image classification on CIFAR-10, CIFAR-100, and ImageNet.
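As a loose illustration of the interleaving idea, the hypothetical sketch below passes one shared linear "encoder" through several toy regression tasks in rotation; the IL paper interleaves full neural models across learning tasks, so every detail here is an assumption.

```python
# Toy interleaving: a single shared "encoder" matrix is updated by each
# task in turn, cycling through the tasks for several rounds.
import numpy as np

rng = np.random.default_rng(0)
encoder = rng.normal(size=(8, 4)) * 0.1  # shared weights, passed along

def train_step(encoder, X, y, lr=0.1):
    """One gradient step of a least-squares head on encoded features."""
    Z = X @ encoder                           # encode inputs
    head, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = Z @ head - y                      # residual drives the update
    return encoder - lr * X.T @ np.outer(resid, head) / len(y)

tasks = [(rng.normal(size=(50, 8)), rng.normal(size=50)) for _ in range(3)]
for round_ in range(2):       # interleave: task1 -> task2 -> task3 -> task1 ...
    for X, y in tasks:
        encoder = train_step(encoder, X, y)
print("encoder norm after interleaved training:", np.linalg.norm(encoder))
```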
arXiv Detail & Related papers (2021-03-12T00:54:22Z)
- Learning Student-Friendly Teacher Networks for Knowledge Distillation [50.11640959363315]
We propose a novel knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to a student.
In contrast to most existing methods, which rely on effective training of student models given pretrained teachers, we aim to learn teacher models that are friendly to students.
arXiv Detail & Related papers (2021-02-12T07:00:17Z)
- Small-Group Learning, with Application to Neural Architecture Search [17.86826990290058]
In human learning, a small group of students works together towards the same learning objective: they explain their understanding of a topic to their peers, compare their ideas, and help each other troubleshoot problems.
In this paper, we aim to investigate whether this human learning method can be borrowed to train better machine learning models, by developing a novel ML framework -- small-group learning (SGL).
SGL is formulated as a multi-level optimization framework consisting of three learning stages: each learner trains a model independently and uses this model to perform pseudo-labeling; each learner trains another model using datasets pseudo-labeled by its peers; each learner improves its model by minimizing the validation loss.
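A minimal sketch of the cross-pseudo-labeling stages, assuming two peers and logistic-regression stand-ins (SGL's full multi-level optimization, including the validation-driven model updates, is not reproduced here):

```python
# Hypothetical two-peer version of SGL's pseudo-labeling stages.
import numpy as np

rng = np.random.default_rng(1)

def fit(X, y, steps=200, lr=0.5):
    """Plain logistic regression trained by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

# Each peer has its own labeled split; both see the same unlabeled pool.
X1 = rng.normal(size=(100, 4)); y1 = (X1[:, 0] > 0).astype(float)
X2 = rng.normal(size=(100, 4)); y2 = (X2[:, 0] > 0).astype(float)
X_unlab = rng.normal(size=(200, 4))

# Stage 1: each peer trains independently and pseudo-labels the pool.
w1, w2 = fit(X1, y1), fit(X2, y2)
pseudo_from_1 = (X_unlab @ w1 > 0).astype(float)
pseudo_from_2 = (X_unlab @ w2 > 0).astype(float)

# Stage 2: each peer trains a second model on its peer's pseudo-labels.
w1_second = fit(X_unlab, pseudo_from_2)
w2_second = fit(X_unlab, pseudo_from_1)
print("peer models retrained on each other's pseudo-labels")
```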
arXiv Detail & Related papers (2020-12-23T05:56:47Z)
- Learning to Reweight with Deep Interactions [104.68509759134878]
We propose an improved data reweighting algorithm, in which the student model provides its internal states to the teacher model.
Experiments on image classification with clean/noisy labels and on neural machine translation empirically demonstrate that our algorithm achieves significant improvements over previous methods.
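As a toy illustration of teacher-side reweighting from student feedback, the sketch below maps per-example student losses to normalized training weights via a softmax; the actual method learns the teacher from richer student internal states, so this fixed mapping is purely an assumption.

```python
# Toy teacher-driven data reweighting from student feedback signals
# (the real teacher is learned; this softmax over losses is a stand-in).
import numpy as np

rng = np.random.default_rng(0)
student_losses = rng.gamma(shape=2.0, size=32)  # stand-in per-example losses

def teacher_reweight(losses, temp=1.0):
    """Map per-example student losses to normalized training weights."""
    z = (losses - losses.mean()) / (losses.std() + 1e-8)
    w = np.exp(z / temp)
    return w / w.sum()

weights = teacher_reweight(student_losses)
print("highest-weighted (hardest) examples:", np.argsort(weights)[-3:])
```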
arXiv Detail & Related papers (2020-07-09T09:06:31Z)