L2T-DLN: Learning to Teach with Dynamic Loss Network
- URL: http://arxiv.org/abs/2310.19313v1
- Date: Mon, 30 Oct 2023 07:21:40 GMT
- Title: L2T-DLN: Learning to Teach with Dynamic Loss Network
- Authors: Zhoyang Hai, Liyuan Pan, Xiabi Liu, Zhengzheng Liu, Mirna Yunita
- Abstract summary: In existing works, the teacher model merely determines the loss function based on the present states of the student model.
In this paper, we first formulate the loss adjustment as a temporal task by designing a teacher model with memory units.
Then, with a dynamic loss network, we can additionally use the states of the loss to assist the teacher learning in enhancing the interactions between the teacher and the student model.
- Score: 4.243592852049963
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the concept of teaching being introduced to the machine learning
community, a teacher model starts using dynamic loss functions to guide the
training of a student model. The dynamic loss sets adaptive loss functions
to different phases of student model learning. In existing works, the teacher
model 1) merely determines the loss function based on the present states of the
student model, i.e., disregards the experience of the teacher; 2) only utilizes
the states of the student model, e.g., training iteration number and
loss/accuracy from training/validation sets, while ignoring the states of the
loss function. In this paper, we first formulate the loss adjustment as a
temporal task by designing a teacher model with memory units, which
enables student learning to be guided by the experience of the teacher
model. Then, with a dynamic loss network, we can additionally use the states of
the loss to assist the teacher learning in enhancing the interactions between
the teacher and the student model. Extensive experiments demonstrate our
approach can enhance student learning and improve the performance of various
deep models on real-world tasks, including classification, object detection,
and semantic segmentation scenarios.
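
The interaction described in the abstract (a teacher that remembers past student states and reads the current state of the loss before adjusting it) can be pictured with a toy sketch. This is not the authors' implementation. The abstract describes a teacher model with memory units and a dynamic loss network; here a hand-written recurrent rule stands in for the former and a two-weight mix of squared and absolute error stands in for the latter. All names (`DynamicLoss`, `MemoryTeacher`, `train`, `stall_threshold`), the data, and the update rule are assumptions made for illustration only.

```python
class DynamicLoss:
    """Toy stand-in for a dynamic loss: a weighted mix of squared and absolute error."""

    def __init__(self, w_sq=1.0, w_abs=0.0):
        self.w_sq = w_sq
        self.w_abs = w_abs

    def value(self, pred, target):
        err = pred - target
        return self.w_sq * err * err + self.w_abs * abs(err)

    def grad(self, pred, target):
        # Derivative of value() with respect to pred (subgradient 0 at err == 0).
        err = pred - target
        sign = (err > 0) - (err < 0)
        return 2.0 * self.w_sq * err + self.w_abs * sign


class MemoryTeacher:
    """Toy stand-in for a teacher with memory: h accumulates past progress."""

    def __init__(self, decay=0.9, rate=0.1, stall_threshold=1e-3):
        self.decay = decay
        self.rate = rate
        self.stall_threshold = stall_threshold
        self.h = 0.0  # recurrent memory of recent validation improvement

    def step(self, val_error, prev_val_error, loss):
        # Student state: how much the validation error improved this step.
        improvement = prev_val_error - val_error
        # Memory update: the decision depends on history, not only the present.
        self.h = self.decay * self.h + (1.0 - self.decay) * improvement
        # Loss state: the current weights bound how far the loss can be shifted.
        if self.h < self.stall_threshold:
            shift = min(self.rate, loss.w_sq)
            loss.w_sq -= shift
            loss.w_abs += shift
        return self.h


def mean_abs_error(theta, data):
    return sum(abs(theta - y) for y in data) / len(data)


def train(steps=60, lr=0.1):
    data = [1.0, 2.0, 3.0, 10.0]  # hypothetical targets with one outlier
    theta = 0.0                   # the "student" is a single scalar prediction
    loss = DynamicLoss()
    teacher = MemoryTeacher()
    prev_val = mean_abs_error(theta, data)
    for _ in range(steps):
        # Student update: gradient descent on the current dynamic loss.
        g = sum(loss.grad(theta, y) for y in data) / len(data)
        theta -= lr * g
        # Teacher update: adjust the loss from student state, loss state, and memory.
        val = mean_abs_error(theta, data)
        teacher.step(val, prev_val, loss)
        prev_val = val
    return theta, loss
```

Calling `train()` returns the final student parameter and the adjusted loss. In this toy setup the teacher moves weight from squared to absolute error once its remembered progress stalls, which is one simple way to show a loss that changes across training phases.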
Related papers
- CFTS-GAN: Continual Few-Shot Teacher Student for Generative Adversarial Networks [0.5024983453990064]
Few-shot and continual learning face two well-known challenges in GANs: overfitting and catastrophic forgetting.
This paper proposes a Continual Few-shot Teacher-Student technique for the generative adversarial network (CFTS-GAN) that considers both challenges together.
arXiv Detail & Related papers (2024-10-17T20:49:08Z) - Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models [62.5501109475725]
Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them.
This paper introduces Online Knowledge Distillation (OKD), where the teacher network integrates small online modules to concurrently train with the student model.
OKD achieves or exceeds the performance of leading methods in various model architectures and sizes, reducing training time by up to fourfold.
arXiv Detail & Related papers (2024-09-19T07:05:26Z) - Toward In-Context Teaching: Adapting Examples to Students' Misconceptions [54.82965010592045]
We introduce a suite of models and evaluation methods we call AdapT.
AToM is a new probabilistic model for adaptive teaching that jointly infers students' past beliefs and optimizes for the correctness of future beliefs.
Our results highlight both the difficulty of the adaptive teaching task and the potential of learned adaptive models for solving it.
arXiv Detail & Related papers (2024-05-07T17:05:27Z) - YODA: Teacher-Student Progressive Learning for Language Models [82.0172215948963]
This paper introduces YODA, a teacher-student progressive learning framework.
It emulates the teacher-student education process to improve the efficacy of model fine-tuning.
Experiments show that training LLaMA2 with data from YODA improves SFT with significant performance gain.
arXiv Detail & Related papers (2024-01-28T14:32:15Z) - Periodically Exchange Teacher-Student for Source-Free Object Detection [7.222926042027062]
Source-free object detection (SFOD) aims to adapt the source detector to unlabeled target domain data in the absence of source domain data.
Most SFOD methods follow the same self-training paradigm using the mean-teacher (MT) framework, where the student model is guided by a single teacher model.
We propose the Periodically Exchange Teacher-Student (PETS) method, a simple yet novel approach that introduces a multiple-teacher framework consisting of a static teacher, a dynamic teacher, and a student model.
arXiv Detail & Related papers (2023-11-23T11:30:54Z) - DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving [64.57963116462757]
State-of-the-art methods usually follow the Teacher-Student paradigm.
Student model only has access to raw sensor data and conducts behavior cloning on the data collected by the teacher model.
We propose DriveAdapter, which employs adapters with the feature alignment objective function between the student (perception) and teacher (planning) modules.
arXiv Detail & Related papers (2023-08-01T09:21:53Z) - Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble [56.705249154629264]
Self-training teacher-student frameworks are proposed to improve the robustness of NER models.
In this paper, we propose an adaptive teacher learning comprised of two teacher-student networks.
Fine-grained student ensemble updates each fragment of the teacher model with a temporal moving average of the corresponding fragment of the student, which enhances consistent predictions on each model fragment against noise.
arXiv Detail & Related papers (2022-12-13T12:14:09Z) - RLTutor: Reinforcement Learning Based Adaptive Tutoring System by Modeling Virtual Student with Fewer Interactions [10.34673089426247]
We propose a framework for optimizing teaching strategies by constructing a virtual model of the student.
Our results can serve as a buffer between theoretical instructional optimization and practical applications in e-learning systems.
arXiv Detail & Related papers (2021-07-31T15:42:03Z) - Learning to Reweight with Deep Interactions [104.68509759134878]
We propose an improved data reweighting algorithm, in which the student model provides its internal states to the teacher model.
Experiments on image classification with clean/noisy labels and neural machine translation empirically demonstrate that our algorithm makes significant improvement over previous methods.
arXiv Detail & Related papers (2020-07-09T09:06:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.