Heuristic-Free Multi-Teacher Learning
- URL: http://arxiv.org/abs/2411.12724v1
- Date: Tue, 19 Nov 2024 18:45:16 GMT
- Title: Heuristic-Free Multi-Teacher Learning
- Authors: Huy Thong Nguyen, En-Hung Chu, Lenord Melvix, Jazon Jiao, Chunglin Wen, Benjamin Louie
- Abstract summary: Teacher2Task is a novel framework for multi-teacher learning that eliminates the need for manual aggregation heuristics.
Instead of relying on aggregated labels, the framework transforms the training data, consisting of ground truth labels and annotations from N teachers, into N+1 distinct tasks.
- Score: 0.6597195879147557
- License:
- Abstract: We introduce Teacher2Task, a novel framework for multi-teacher learning that eliminates the need for manual aggregation heuristics. Existing multi-teacher methods typically rely on such heuristics to combine predictions from multiple teachers, often resulting in sub-optimal aggregated labels and the propagation of aggregation errors. Teacher2Task addresses these limitations by introducing teacher-specific input tokens and reformulating the training process. Instead of relying on aggregated labels, the framework transforms the training data, consisting of ground truth labels and annotations from N teachers, into N+1 distinct tasks: N auxiliary tasks that predict the labeling styles of the N individual teachers, and one primary task that focuses on the ground truth labels. This approach, drawing upon principles from multiple learning paradigms, demonstrates strong empirical results across a range of architectures, modalities, and tasks.
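The data transformation described in the abstract is straightforward to prototype. Below is a minimal, illustrative sketch of how one training example with ground-truth and teacher annotations could be expanded into N+1 task-conditioned instances; the token format, record layout, and function names are assumptions for illustration, not the paper's exact implementation.

```python
# A minimal sketch of the Teacher2Task data transformation: each example is
# expanded into N+1 training instances, one per teacher annotation plus one
# for the ground truth, distinguished by a task-specific input token.
# Token strings and the record layout below are illustrative assumptions.

def teacher2task_expand(example, teacher_annotations, ground_truth):
    """Turn one example into N+1 task-conditioned training instances.

    example:             the raw model input (e.g., text or image features)
    teacher_annotations: dict mapping teacher name -> that teacher's label
    ground_truth:        the human-verified label
    """
    instances = []
    # N auxiliary tasks: predict each teacher's labeling style.
    for teacher_name, teacher_label in teacher_annotations.items():
        instances.append({
            "task_token": f"[TEACHER:{teacher_name}]",  # teacher-specific input token
            "input": example,
            "target": teacher_label,
        })
    # One primary task: predict the ground-truth label.
    instances.append({
        "task_token": "[GROUND_TRUTH]",
        "input": example,
        "target": ground_truth,
    })
    return instances


# Example usage with two hypothetical teachers:
batch = teacher2task_expand(
    example="a photo of a cat on a sofa",
    teacher_annotations={"teacher_A": "cat", "teacher_B": "sofa"},
    ground_truth="cat",
)
# `batch` now holds 3 instances; presumably, at inference time the model is
# queried with the [GROUND_TRUTH] token so only the primary task drives predictions.
```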
Related papers
- Combining Supervised Learning and Reinforcement Learning for Multi-Label Classification Tasks with Partial Labels [27.53399899573121]
We propose an RL-based framework combining the exploration ability of reinforcement learning and the exploitation ability of supervised learning.
Experimental results across various tasks, including document-level relation extraction, demonstrate the generalization and effectiveness of our framework.
arXiv Detail & Related papers (2024-06-24T03:36:19Z)
- Joint-Task Regularization for Partially Labeled Multi-Task Learning [30.823282043129552]
Multi-task learning has become increasingly popular in the machine learning field, but its practicality is hindered by the need for large, labeled datasets.
We propose Joint-Task Regularization (JTR), an intuitive technique which leverages cross-task relations to simultaneously regularize all tasks in a single joint-task latent space.
arXiv Detail & Related papers (2024-04-02T14:16:59Z)
- Nonparametric Teaching for Multiple Learners [20.75580803325611]
We introduce a novel framework, Multi-learner Nonparametric Teaching (MINT).
MINT aims to instruct multiple learners, with each learner focusing on learning a scalar-valued target model.
We demonstrate that MINT offers significant teaching speed-up over repeated single-learner teaching.
arXiv Detail & Related papers (2023-11-17T04:04:11Z)
- Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks.
Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks.
arXiv Detail & Related papers (2023-07-13T16:39:08Z)
- Active Teacher for Semi-Supervised Object Detection [80.10937030195228]
We propose a novel algorithm called Active Teacher for semi-supervised object detection (SSOD).
Active Teacher extends the teacher-student framework to an iterative version, where the label set is partially and gradually augmented by evaluating three key factors of unlabeled examples.
With this design, Active Teacher can maximize the effect of limited label information while improving the quality of pseudo-labels.
arXiv Detail & Related papers (2023-03-15T03:59:27Z)
- Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble [56.705249154629264]
Self-training teacher-student frameworks have been proposed to improve the robustness of NER models.
In this paper, we propose an adaptive teacher learning method composed of two teacher-student networks.
The fine-grained student ensemble updates each fragment of the teacher model with a temporal moving average of the corresponding student fragment, which promotes consistent predictions for each model fragment in the presence of noise.
arXiv Detail & Related papers (2022-12-13T12:14:09Z)
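The per-fragment teacher update in the entry above is essentially an exponential moving average applied at a finer granularity than the whole model. A minimal sketch follows, assuming named parameter fragments and a fixed momentum; the actual fragment granularity and update schedule in the paper may differ.

```python
# A minimal sketch of the fine-grained (per-fragment) EMA update summarized
# above: each fragment of the teacher is refreshed with a temporal moving
# average of the corresponding student fragment. Fragment names and the
# momentum value are illustrative assumptions.

def ema_update_fragment(teacher_fragment, student_fragment, momentum=0.999):
    """Blend one parameter fragment of the teacher toward the student."""
    return [
        momentum * t + (1.0 - momentum) * s
        for t, s in zip(teacher_fragment, student_fragment)
    ]

def ema_update_model(teacher_params, student_params, momentum=0.999):
    """Apply the per-fragment EMA update across all named fragments."""
    return {
        name: ema_update_fragment(teacher_params[name], student_params[name], momentum)
        for name in teacher_params
    }

# Example with two toy fragments (e.g., an encoder block and a classifier head):
teacher = {"encoder.block1": [0.5, -0.2], "classifier": [1.0]}
student = {"encoder.block1": [0.7, -0.1], "classifier": [0.8]}
teacher = ema_update_model(teacher, student)  # teacher drifts slowly toward the student
```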
- Multi-Task Self-Training for Learning General Representations [97.01728635294879]
Multi-task self-training (MuST) harnesses the knowledge in independent specialized teacher models to train a single general student model.
MuST is scalable with unlabeled or partially labeled datasets and outperforms both specialized supervised models and self-supervised models when training on large scale datasets.
arXiv Detail & Related papers (2021-08-25T17:20:50Z)
- Representation Consolidation for Training Expert Students [54.90754502493968]
We show that a multi-head, multi-task distillation method is sufficient to consolidate representations from task-specific teacher(s) and improve downstream performance.
Our method can also combine the representational knowledge of multiple teachers trained on one or multiple domains into a single model.
arXiv Detail & Related papers (2021-07-16T17:58:18Z)
- Graph Consistency based Mean-Teaching for Unsupervised Domain Adaptive Person Re-Identification [54.58165777717885]
This paper proposes a Graph Consistency based Mean-Teaching (GCMT) method that constructs a Graph Consistency Constraint (GCC) between teacher and student networks.
Experiments on three datasets, i.e., Market-1501, DukeMTMC-reID, and MSMT17, show that the proposed GCMT outperforms state-of-the-art methods by a clear margin.
arXiv Detail & Related papers (2021-05-11T04:09:49Z)
- Adaptive Multi-Teacher Multi-level Knowledge Distillation [11.722728148523366]
We propose a novel adaptive multi-teacher multi-level knowledge distillation learning framework (AMTML-KD).
It consists of two novel insights: (i) associating each teacher with a latent representation to adaptively learn instance-level teacher importance weights.
As such, a student model can learn multi-level knowledge from multiple teachers through AMTML-KD.
arXiv Detail & Related papers (2021-03-06T08:18:16Z)
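The instance-level teacher weighting summarized in this last entry can be sketched as a softmax-weighted mixture of the teachers' soft predictions, computed per example. The sketch below abstracts away how the per-teacher scores are produced (the paper derives them from latent teacher representations), so the `teacher_scores` argument and helper names are assumptions for illustration.

```python
# A minimal sketch in the spirit of AMTML-KD: per example, the teachers' soft
# predictions are combined with adaptively learned importance weights, and the
# result serves as the soft label for the distillation loss. How the raw
# per-teacher scores are computed is abstracted away here.
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def aggregate_teacher_targets(teacher_probs, teacher_scores):
    """Weight each teacher's probability vector by its instance-level importance.

    teacher_probs:  list of per-teacher probability vectors for one example
    teacher_scores: list of per-teacher relevance scores for the same example
    """
    weights = softmax(teacher_scores)
    num_classes = len(teacher_probs[0])
    target = [0.0] * num_classes
    for w, probs in zip(weights, teacher_probs):
        for c in range(num_classes):
            target[c] += w * probs[c]
    return target  # soft label used as the distillation target

# Example: two teachers, three classes; teacher 1 is judged more relevant here.
soft_label = aggregate_teacher_targets(
    teacher_probs=[[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]],
    teacher_scores=[2.0, 0.5],
)
```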