JEDI: Joint Expert Distillation in a Semi-Supervised Multi-Dataset
Student-Teacher Scenario for Video Action Recognition
- URL: http://arxiv.org/abs/2308.04934v1
- Date: Wed, 9 Aug 2023 13:09:07 GMT
- Title: JEDI: Joint Expert Distillation in a Semi-Supervised Multi-Dataset
Student-Teacher Scenario for Video Action Recognition
- Authors: Lucian Bicsi, Bogdan Alexe, Radu Tudor Ionescu, Marius Leordeanu
- Abstract summary: We propose JEDI, a multi-dataset semi-supervised learning method.
It efficiently combines knowledge from multiple experts, learned on different datasets, to train and improve the performance of individual, per dataset, student models.
- Score: 29.67402932890899
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose JEDI, a multi-dataset semi-supervised learning method, which
efficiently combines knowledge from multiple experts, learned on different
datasets, to train and improve the performance of individual, per dataset,
student models. Our approach achieves this by addressing two important problems
in current machine learning research: generalization across datasets and
limitations of supervised training due to scarcity of labeled data. We start
with an arbitrary number of experts, pretrained on their own specific dataset,
which form the initial set of student models. The teachers are immediately
derived by concatenating the feature representations from the penultimate
layers of the students. We then train all models in a student-teacher
semi-supervised learning scenario until convergence. In our efficient approach,
student-teacher training is carried out jointly and end-to-end, showing that
both students and teachers improve their generalization capacity during
training. We validate our approach on four video action recognition datasets.
By simultaneously considering all datasets within a unified semi-supervised
setting, we demonstrate significant improvements over the initial experts.
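The abstract describes teachers obtained by concatenating the penultimate-layer features of all students and trained jointly with them on unlabeled data. Below is a minimal PyTorch sketch of that structure; the module names, feature sizes, and the KL-distillation loss on unlabeled clips are illustrative assumptions, and the supervised losses and full training loop from the paper are omitted.

```python
# Minimal sketch of the JEDI structure described in the abstract: each student
# is an expert pretrained on its own dataset, and a per-dataset teacher head
# classifies the concatenation of all students' penultimate-layer features.
# Names, sizes, and the loss are illustrative, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Student(nn.Module):
    """One expert: a feature extractor plus its own classification head."""

    def __init__(self, in_dim: int, feat_dim: int, num_classes: int):
        super().__init__()
        # Stand-in backbone; in practice this is a video model pretrained
        # on the student's own dataset.
        self.backbone = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.head = nn.Linear(feat_dim, num_classes)

    def features(self, x):
        # Penultimate-layer representation used to build the teachers.
        return self.backbone(x)

    def forward(self, x):
        return self.head(self.features(x))


class JointTeacher(nn.Module):
    """Per-dataset teacher head fed by the concatenated student features."""

    def __init__(self, total_feat_dim: int, num_classes: int):
        super().__init__()
        self.head = nn.Linear(total_feat_dim, num_classes)

    def forward(self, concat_feats):
        return self.head(concat_feats)


def joint_distillation_step(students, teachers, x_unlabeled, dataset_idx, tau=2.0):
    """One semi-supervised step on unlabeled clips from one dataset: the
    teacher sees all students' features, and its softened predictions
    supervise that dataset's student. Gradients flow into both, so students
    and teachers are trained jointly and end-to-end."""
    concat = torch.cat([s.features(x_unlabeled) for s in students], dim=-1)
    teacher_logits = teachers[dataset_idx](concat)
    student_logits = students[dataset_idx](x_unlabeled)
    loss = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau ** 2
    return loss


if __name__ == "__main__":
    num_classes = [101, 51, 400, 174]  # placeholder class counts for 4 datasets
    students = nn.ModuleList([Student(2048, 512, c) for c in num_classes])
    teachers = nn.ModuleList([JointTeacher(512 * len(students), c) for c in num_classes])
    optim = torch.optim.Adam(
        list(students.parameters()) + list(teachers.parameters()), lr=1e-4)

    x = torch.randn(8, 2048)  # a batch of (pre-extracted) clip features
    loss = joint_distillation_step(students, teachers, x, dataset_idx=0)
    loss.backward()
    optim.step()
```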
Related papers
- Multi-Level Feature Distillation of Joint Teachers Trained on Distinct Image Datasets [22.341604423831733]
We propose a teacher-student framework to distill knowledge from multiple teachers trained on distinct datasets.
We employ a multi-level feature distillation procedure to transfer the knowledge to a student model for each of the considered datasets.
We show that our novel Multi-Level Feature Distillation (MLFD) can significantly surpass equivalent architectures that are either trained on individual datasets, or jointly trained on all datasets at once.
arXiv Detail & Related papers (2024-10-29T16:23:20Z)
- Less is More: High-value Data Selection for Visual Instruction Tuning [127.38740043393527]
We propose a high-value data selection approach, TIVE, to eliminate redundancy within the visual instruction data and reduce the training cost.
Our approach using only about 15% data can achieve comparable average performance to the full-data fine-tuned model across eight benchmarks.
arXiv Detail & Related papers (2024-03-14T16:47:25Z)
- Self-Training and Multi-Task Learning for Limited Data: Evaluation Study on Object Detection [4.9914667450658925]
Experimental results show improved performance when using a weak teacher with unseen data for training a multi-task student.
Despite the limited setup, we believe the experimental results show the potential of multi-task knowledge distillation and self-training.
arXiv Detail & Related papers (2023-09-12T14:50:14Z)
- Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble [56.705249154629264]
Self-training teacher-student frameworks are proposed to improve the robustness of NER models.
In this paper, we propose an adaptive teacher learning approach comprising two teacher-student networks.
A fine-grained student ensemble updates each fragment of the teacher model with a temporal moving average of the corresponding student fragment, which keeps the predictions of each model fragment consistent under noise (see the EMA sketch after this list).
arXiv Detail & Related papers (2022-12-13T12:14:09Z)
- Multi-dataset Training of Transformers for Robust Action Recognition [75.5695991766902]
We study the task of learning robust feature representations that generalize well across multiple datasets for action recognition.
Here, we propose a novel multi-dataset training paradigm, MultiTrain, with the design of two new loss terms, namely informative loss and projection loss.
We verify the effectiveness of our method on five challenging datasets: Kinetics-400, Kinetics-700, Moments-in-Time, ActivityNet and Something-Something-v2.
arXiv Detail & Related papers (2022-09-26T01:30:43Z)
- Multi-Task Self-Training for Learning General Representations [97.01728635294879]
Multi-task self-training (MuST) harnesses the knowledge in independent specialized teacher models to train a single general student model.
MuST is scalable with unlabeled or partially labeled datasets and outperforms both specialized supervised models and self-supervised models when training on large-scale datasets.
arXiv Detail & Related papers (2021-08-25T17:20:50Z)
- Dual-Teacher: Integrating Intra-domain and Inter-domain Teachers for Annotation-efficient Cardiac Segmentation [65.81546955181781]
We propose a novel semi-supervised domain adaptation approach, namely Dual-Teacher.
The student model learns from both unlabeled target data and labeled source data through two teacher models.
We demonstrate that our approach is able to concurrently utilize unlabeled data and cross-modality data with superior performance.
arXiv Detail & Related papers (2020-07-13T10:00:44Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
We further apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
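The fragment-wise teacher update mentioned in the Distantly-Supervised NER entry above is, at its core, a temporal (exponential) moving average applied separately to each part of the teacher. Below is a minimal PyTorch sketch under the simplifying assumption that a "fragment" is a single parameter tensor; the momentum value and function name are illustrative, not the authors' implementation.

```python
# Minimal sketch of a fragment-wise temporal moving average (EMA) teacher
# update. Fragment granularity and momentum are illustrative assumptions.
import torch
import torch.nn as nn


@torch.no_grad()
def update_teacher_fragments(teacher: nn.Module, student: nn.Module,
                             momentum: float = 0.99) -> None:
    """Update every teacher fragment (simplified here to one parameter tensor
    per fragment) as a temporal moving average of the matching student
    fragment: theta_t <- m * theta_t + (1 - m) * theta_s."""
    for (t_name, t_p), (s_name, s_p) in zip(teacher.named_parameters(),
                                            student.named_parameters()):
        assert t_name == s_name, "teacher and student must share an architecture"
        t_p.mul_(momentum).add_(s_p, alpha=1.0 - momentum)


if __name__ == "__main__":
    student = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
    teacher = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
    teacher.load_state_dict(student.state_dict())  # start teacher at the student
    # ... after every optimization step on the student:
    update_teacher_fragments(teacher, student, momentum=0.99)
```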