Learning by Teaching, with Application to Neural Architecture Search
- URL: http://arxiv.org/abs/2103.07009v1
- Date: Thu, 11 Mar 2021 23:50:38 GMT
- Title: Learning by Teaching, with Application to Neural Architecture Search
- Authors: Parth Sheth, Yueyu Jiang, Pengtao Xie
- Abstract summary: We propose a novel ML framework referred to as learning by teaching (LBT).
In LBT, a teacher model improves itself by teaching a student model to learn well.
Based on how the student performs on a validation dataset, the teacher re-learns its model and re-teaches the student until the student achieves strong validation performance.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In human learning, an effective skill in improving learning outcomes is
learning by teaching: a learner deepens his/her understanding of a topic by
teaching this topic to others. In this paper, we aim to borrow this
teaching-driven learning methodology from humans and leverage it to train more
performant machine learning models, by proposing a novel ML framework referred
to as learning by teaching (LBT). In the LBT framework, a teacher model
improves itself by teaching a student model to learn well. Specifically, the
teacher creates a pseudo-labeled dataset and uses it to train a student model.
Based on how the student performs on a validation dataset, the teacher
re-learns its model and re-teaches the student until the student achieves strong
validation performance. Our framework is based on three-level optimization,
which comprises three stages: the teacher learns; the teacher teaches the
student; the teacher re-learns based on how well the student performs. A simple but efficient
algorithm is developed to solve the three-level optimization problem. We apply
LBT to search neural architectures on CIFAR-10, CIFAR-100, and ImageNet. The
efficacy of our method is demonstrated in various experiments.
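To make the three-stage loop concrete, below is a minimal runnable sketch in NumPy. Everything in it (the logistic-regression stand-ins for the teacher and student, the synthetic data, and the stage-3 reweighting heuristic) is an illustrative assumption; the paper solves the three-level optimization with a gradient-based algorithm rather than this alternating heuristic.

```python
# Minimal sketch of the LBT loop (illustrative only; not the authors'
# implementation). Logistic regression stands in for both models.
import numpy as np

rng = np.random.default_rng(0)

def make_split(n):
    """Synthetic binary-classification data."""
    X = rng.normal(size=(n, 5))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
    return X, y

def fit_logreg(X, y, sample_w=None, steps=300, lr=0.5):
    """Weighted logistic regression trained by plain gradient descent."""
    w = np.zeros(X.shape[1])
    sw = np.ones(len(y)) if sample_w is None else sample_w
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * (X.T @ (sw * (p - y))) / sw.sum()
    return w

def predict(w, X):
    return (X @ w > 0).astype(float)

X_train, y_train = make_split(200)  # labeled data for the teacher
X_unlab, _ = make_split(400)        # unlabeled pool; teacher pseudo-labels it
X_val, y_val = make_split(200)      # validation set providing student feedback

sample_w = np.ones(len(y_train))
for round_ in range(3):
    # Stage 1: the teacher learns (here, on reweighted labeled data).
    w_teacher = fit_logreg(X_train, y_train, sample_w)

    # Stage 2: the teacher teaches -- it pseudo-labels the unlabeled pool
    # and the student trains on the pseudo-labeled dataset.
    pseudo_y = predict(w_teacher, X_unlab)
    w_student = fit_logreg(X_unlab, pseudo_y)

    # Stage 3: the teacher re-learns based on the student's validation
    # performance. A crude stand-in for the paper's optimization: upweight
    # training points the student currently gets wrong.
    val_acc = (predict(w_student, X_val) == y_val).mean()
    print(f"round {round_}: student validation accuracy = {val_acc:.3f}")
    sample_w = np.where(predict(w_student, X_train) == y_train, 1.0, 2.0)
```

The structural point the sketch preserves is the feedback direction: the student never sees ground-truth labels for the pool, and the teacher's next update is driven by the student's validation behavior.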
Related papers
- Toward In-Context Teaching: Adapting Examples to Students' Misconceptions [54.82965010592045]
We introduce a suite of models and evaluation methods we call AdapT.
AToM is a new probabilistic model for adaptive teaching that jointly infers students' past beliefs and optimizes for the correctness of their future beliefs.
Our results highlight both the difficulty of the adaptive teaching task and the potential of learned adaptive models for solving it.
arXiv Detail & Related papers (2024-05-07T17:05:27Z)
- YODA: Teacher-Student Progressive Learning for Language Models [82.0172215948963]
This paper introduces YODA, a teacher-student progressive learning framework.
It emulates the teacher-student education process to improve the efficacy of model fine-tuning.
Experiments show that training LLaMA2 with data from YODA improves over standard SFT with significant performance gains.
arXiv Detail & Related papers (2024-01-28T14:32:15Z)
- UNIKD: UNcertainty-filtered Incremental Knowledge Distillation for Neural Implicit Representation [48.49860868061573]
Recent neural implicit representations (NIRs) have achieved great success in the tasks of 3D reconstruction and novel view synthesis.
However, they require images of the scene from different camera views to be available for one-time training.
This is expensive, especially for large-scale scenes and scenarios with limited data storage.
We design a student-teacher framework to mitigate the catastrophic forgetting problem.
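As a rough illustration of the uncertainty filtering suggested by the title, here is a small hypothetical sketch: pseudo-supervision from a softmax teacher is kept only where predictive entropy is low. The entropy measure, the threshold, and the Dirichlet stand-in for teacher outputs are our assumptions, not details from the paper.

```python
# Illustrative uncertainty-filtered distillation targets (hypothetical;
# UNIKD's actual uncertainty estimate and threshold are not given above).
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of each row of class probabilities p."""
    return -np.sum(p * np.log(p + eps), axis=1)

rng = np.random.default_rng(0)
teacher_probs = rng.dirichlet([1.0, 1.0, 1.0], size=100)  # stand-in outputs

keep = entropy(teacher_probs) < 0.5           # low-uncertainty filter
pseudo_labels = teacher_probs.argmax(axis=1)  # hard pseudo-labels
print(f"kept {keep.sum()} of {len(keep)} pseudo-labels;",
      "first kept labels:", pseudo_labels[keep][:5])
```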
arXiv Detail & Related papers (2022-12-21T11:43:20Z)
- Teaching What You Should Teach: A Data-Based Distillation Method [20.595460553747163]
We introduce the "Teaching What You Should Teach" strategy into a knowledge distillation framework.
We propose a data-based distillation method named "TST" that searches for desirable augmented samples to assist in more efficient and rational distillation.
Specifically, we design a neural network-based data augmentation module with a prior bias, which helps find samples that play to the teacher's strengths but target the student's weaknesses, as sketched below.
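The selection criterion can be pictured with a toy scoring rule. In the sketch below, candidate augmented samples are ranked by the gap between teacher and student confidence; the Dirichlet stand-ins for model outputs are our assumptions, and TST's learned augmentation module and prior bias are not reproduced.

```python
# Hypothetical "teacher-strong, student-weak" sample selection: score each
# candidate by teacher confidence minus student confidence in the
# teacher's predicted class, then keep the largest gaps.
import numpy as np

rng = np.random.default_rng(0)
n_candidates, n_classes = 500, 10

# Stand-ins for model outputs on augmented candidate samples.
teacher_probs = rng.dirichlet(np.ones(n_classes), size=n_candidates)
student_probs = rng.dirichlet(np.ones(n_classes), size=n_candidates)

teacher_cls = teacher_probs.argmax(axis=1)
rows = np.arange(n_candidates)
score = teacher_probs[rows, teacher_cls] - student_probs[rows, teacher_cls]

top_k = np.argsort(score)[-64:]  # samples where the confidence gap is largest
print("selected candidate indices:", top_k[:8], "...")
```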
arXiv Detail & Related papers (2022-12-11T06:22:14Z)
- Iterative Teacher-Aware Learning [136.05341445369265]
In human pedagogy, teachers and students can interact adaptively to maximize communication efficiency.
We propose a gradient-optimization-based teacher-aware learner that incorporates the teacher's cooperative intention into its likelihood function.
arXiv Detail & Related papers (2021-10-01T00:27:47Z)
- Interleaving Learning, with Application to Neural Architecture Search [12.317568257671427]
We propose a novel machine learning framework referred to as interleaving learning (IL).
In our framework, a set of models collaboratively learn a data encoder in an interleaving fashion.
We apply interleaving learning to search neural architectures for image classification on CIFAR-10, CIFAR-100, and ImageNet.
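As a loose illustration of the interleaving idea, the hypothetical sketch below passes one shared linear "encoder" through several toy regression tasks in rotation; the IL paper interleaves full neural models across learning tasks, so every detail here is an assumption.

```python
# Toy interleaving: a single shared "encoder" matrix is updated by each
# task in turn, cycling through the tasks for several rounds.
import numpy as np

rng = np.random.default_rng(0)
encoder = rng.normal(size=(8, 4)) * 0.1  # shared weights, passed along

def train_step(encoder, X, y, lr=0.1):
    """One gradient step of a least-squares head on encoded features."""
    Z = X @ encoder                           # encode inputs
    head, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = Z @ head - y                      # residual drives the update
    return encoder - lr * X.T @ np.outer(resid, head) / len(y)

tasks = [(rng.normal(size=(50, 8)), rng.normal(size=50)) for _ in range(3)]
for round_ in range(2):       # interleave: task1 -> task2 -> task3 -> task1 ...
    for X, y in tasks:
        encoder = train_step(encoder, X, y)
print("encoder norm after interleaved training:", np.linalg.norm(encoder))
```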
arXiv Detail & Related papers (2021-03-12T00:54:22Z)
- Learning Student-Friendly Teacher Networks for Knowledge Distillation [50.11640959363315]
We propose a novel knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to a student.
In contrast to most existing methods, which rely on effective training of student models given pretrained teachers, we aim to learn teacher models that are friendly to students.
arXiv Detail & Related papers (2021-02-12T07:00:17Z)
- Small-Group Learning, with Application to Neural Architecture Search [17.86826990290058]
In human learning, a small group of students works together towards the same learning objective: they explain their understanding of a topic to their peers, compare their ideas, and help each other troubleshoot problems.
In this paper, we aim to investigate whether this human learning method can be borrowed to train better machine learning models, by developing a novel ML framework -- small-group learning (SGL).
SGL is formulated as a multi-level optimization framework consisting of three learning stages: each learner trains a model independently and uses this model to perform pseudo-labeling; each learner trains another model using datasets pseudo-labeled by its peers; each learner improves its model by minimizing the validation loss.
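A minimal sketch of the cross-pseudo-labeling stages, assuming two peers and logistic-regression stand-ins (SGL's full multi-level optimization, including the validation-driven model updates, is not reproduced here):

```python
# Hypothetical two-peer version of SGL's pseudo-labeling stages.
import numpy as np

rng = np.random.default_rng(1)

def fit(X, y, steps=200, lr=0.5):
    """Plain logistic regression trained by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

# Each peer has its own labeled split; both see the same unlabeled pool.
X1 = rng.normal(size=(100, 4)); y1 = (X1[:, 0] > 0).astype(float)
X2 = rng.normal(size=(100, 4)); y2 = (X2[:, 0] > 0).astype(float)
X_unlab = rng.normal(size=(200, 4))

# Stage 1: each peer trains independently and pseudo-labels the pool.
w1, w2 = fit(X1, y1), fit(X2, y2)
pseudo_from_1 = (X_unlab @ w1 > 0).astype(float)
pseudo_from_2 = (X_unlab @ w2 > 0).astype(float)

# Stage 2: each peer trains a second model on its peer's pseudo-labels.
w1_second = fit(X_unlab, pseudo_from_2)
w2_second = fit(X_unlab, pseudo_from_1)
print("peer models retrained on each other's pseudo-labels")
```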
arXiv Detail & Related papers (2020-12-23T05:56:47Z)
- Learning to Reweight with Deep Interactions [104.68509759134878]
We propose an improved data reweighting algorithm, in which the student model provides its internal states to the teacher model.
Experiments on image classification with clean/noisy labels and on neural machine translation empirically demonstrate that our algorithm achieves significant improvements over previous methods.
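As a toy illustration of teacher-side reweighting from student feedback, the sketch below maps per-example student losses to normalized training weights via a softmax; the actual method learns the teacher from richer student internal states, so this fixed mapping is purely an assumption.

```python
# Toy teacher-driven data reweighting from student feedback signals
# (the real teacher is learned; this softmax over losses is a stand-in).
import numpy as np

rng = np.random.default_rng(0)
student_losses = rng.gamma(shape=2.0, size=32)  # stand-in per-example losses

def teacher_reweight(losses, temp=1.0):
    """Map per-example student losses to normalized training weights."""
    z = (losses - losses.mean()) / (losses.std() + 1e-8)
    w = np.exp(z / temp)
    return w / w.sum()

weights = teacher_reweight(student_losses)
print("highest-weighted (hardest) examples:", np.argsort(weights)[-3:])
```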
arXiv Detail & Related papers (2020-07-09T09:06:31Z)