DMT: Comprehensive Distillation with Multiple Self-supervised Teachers
- URL: http://arxiv.org/abs/2312.11938v1
- Date: Tue, 19 Dec 2023 08:31:30 GMT
- Title: DMT: Comprehensive Distillation with Multiple Self-supervised Teachers
- Authors: Yuang Liu, Jing Wang, Qiang Zhou, Fan Wang, Jun Wang, Wei Zhang
- Abstract summary: We introduce Comprehensive Distillation with Multiple Self-supervised Teachers (DMT) for pretrained model compression.
Our experimental results on prominent benchmark datasets exhibit that the proposed method significantly surpasses state-of-the-art competitors.
- Score: 27.037140667247208
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Numerous self-supervised learning paradigms, such as contrastive learning and
masked image modeling, have been proposed to acquire powerful and general
representations from unlabeled data. However, these models are commonly
pretrained within their specific framework alone, failing to consider the
complementary nature of visual representations. To tackle this issue, we
introduce Comprehensive Distillation with Multiple Self-supervised Teachers
(DMT) for pretrained model compression, which leverages the strengths of
multiple off-the-shelf self-supervised models. Our experimental results on
prominent benchmark datasets exhibit that the proposed method significantly
surpasses state-of-the-art competitors while retaining favorable efficiency
metrics. On classification tasks, our DMT framework utilizing three different
self-supervised ViT-Base teachers enhances the performance of both small/tiny
models and the base model itself. For dense tasks, DMT elevates the AP/mIoU of
standard SSL models on MS-COCO and ADE20K datasets by 4.0%.
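For readers who want a concrete picture of the setup the abstract describes, below is a minimal PyTorch sketch of multi-teacher feature distillation: a student backbone is trained to match projected features from several frozen, off-the-shelf self-supervised teachers. The class name, the per-teacher linear projectors, and the normalized-MSE objective are illustrative assumptions for this sketch and are not taken from the DMT implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTeacherDistiller(nn.Module):
    """Train a student to match features of several frozen self-supervised teachers.

    A generic sketch of the multi-teacher distillation pattern; the actual
    DMT objective and architecture may differ.
    """

    def __init__(self, student, teachers, student_dim, teacher_dims):
        super().__init__()
        self.student = student
        # Teachers are frozen, off-the-shelf self-supervised models.
        self.teachers = nn.ModuleList(teachers)
        for teacher in self.teachers:
            teacher.requires_grad_(False)
        # One projection head per teacher maps student features into that
        # teacher's embedding space (a common, but here assumed, design).
        self.projectors = nn.ModuleList(
            [nn.Linear(student_dim, dim) for dim in teacher_dims]
        )

    def forward(self, images):
        s_feat = self.student(images)                    # (B, student_dim)
        loss = images.new_zeros(())
        for teacher, projector in zip(self.teachers, self.projectors):
            with torch.no_grad():
                t_feat = teacher(images)                 # (B, teacher_dim)
            # Match L2-normalized features with an MSE objective, averaged
            # over teachers; the paper's weighting scheme may differ.
            loss = loss + F.mse_loss(
                F.normalize(projector(s_feat), dim=-1),
                F.normalize(t_feat, dim=-1),
            )
        return loss / len(self.teachers)
```

Assuming backbones that return pooled feature vectors of the stated dimensions, `MultiTeacherDistiller(student, [t1, t2, t3], 384, [768, 768, 768])(images)` would return a single scalar loss to backpropagate through the student and projectors while the teachers stay fixed.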
Related papers
- PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation [51.509573838103854]
We propose a semi-supervised learning framework, termed Progressive Mean Teachers (PMT), for medical image segmentation.
Our PMT generates high-fidelity pseudo labels by learning robust and diverse features in the training process.
Experimental results on two datasets with different modalities, i.e., CT and MRI, demonstrate that our method outperforms the state-of-the-art medical image segmentation approaches.
arXiv Detail & Related papers (2024-09-08T15:02:25Z)
- Interactive DualChecker for Mitigating Hallucinations in Distilling Large Language Models [7.632217365130212]
Large Language Models (LLMs) have demonstrated exceptional capabilities across various machine learning (ML) tasks.
These models can produce hallucinations, particularly in domains with incomplete knowledge.
We introduce DualChecker, an innovative framework designed to mitigate hallucinations and improve the performance of both teacher and student models.
arXiv Detail & Related papers (2024-08-22T12:04:04Z)
- Multi Teacher Privileged Knowledge Distillation for Multimodal Expression Recognition [58.41784639847413]
Human emotion is a complex phenomenon conveyed and perceived through facial expressions, vocal tones, body language, and physiological signals.
In this paper, a multi-teacher PKD (MT-PKDOT) method with self-distillation is introduced to align diverse teacher representations before distilling them to the student.
Results indicate that our proposed method can outperform SOTA PKD methods.
arXiv Detail & Related papers (2024-08-16T22:11:01Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- Unlock the Power: Competitive Distillation for Multi-Modal Large Language Models [17.25135606956287]
Competitive Multi-modal Distillation framework (CoMD) captures bidirectional feedback between teacher and student models.
Our experimental analysis of diverse datasets shows that our knowledge transfer method consistently improves the capabilities of the student model.
arXiv Detail & Related papers (2023-11-14T14:49:46Z)
- StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
- KDSM: An uplift modeling framework based on knowledge distillation and sample matching [2.036924568983982]
Uplift modeling aims to estimate the treatment effect on individuals.
Tree-based methods are adept at fitting increment and generalization, while neural-network-based models excel at predicting absolute value and precision.
In this paper, we propose an uplift modeling framework based on Knowledge Distillation and Sample Matching (KDSM).
arXiv Detail & Related papers (2023-03-06T09:15:28Z)
- Self-Supervised Monocular Depth Estimation with Self-Reference Distillation and Disparity Offset Refinement [15.012694052674899]
We propose two novel ideas to improve self-supervised monocular depth estimation.
We use a parameter-optimized model, updated over the training epochs, as the teacher to provide additional supervision.
We leverage the contextual consistency between high-scale and low-scale features to obtain multiscale disparity offsets.
arXiv Detail & Related papers (2023-02-20T06:28:52Z)
- CTDS: Centralized Teacher with Decentralized Student for Multi-Agent Reinforcement Learning [114.69155066932046]
This work proposes a novel Centralized Teacher with Decentralized Student (CTDS) framework, which consists of a teacher model and a student model.
Specifically, the teacher model allocates the team reward by learning individual Q-values conditioned on global observation.
The student model utilizes the partial observations to approximate the Q-values estimated by the teacher model.
arXiv Detail & Related papers (2022-03-16T06:03:14Z)
- Multi-Task Self-Training for Learning General Representations [97.01728635294879]
Multi-task self-training (MuST) harnesses the knowledge in independent specialized teacher models to train a single general student model.
MuST is scalable with unlabeled or partially labeled datasets and outperforms both specialized supervised models and self-supervised models when training on large scale datasets.
arXiv Detail & Related papers (2021-08-25T17:20:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.