Two Independent Teachers are Better Role Model
- URL: http://arxiv.org/abs/2306.05745v2
- Date: Thu, 21 Dec 2023 21:28:52 GMT
- Title: Two Independent Teachers are Better Role Model
- Authors: Afifa Khaled, Ahmed A. Mubarak, Kun He
- Abstract summary: We propose a new deep learning model called 3D-DenseUNet.
It uses adaptable global aggregation blocks in down-sampling to address the loss of spatial information.
We also propose a new method called Two Independent Teachers, which summarizes the model weights instead of the label predictions.
- Score: 7.001845833295753
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent deep learning models have attracted substantial attention in
infant brain analysis. These models, including semi-supervised techniques such
as Temporal Ensembling and mean teacher, have achieved state-of-the-art
performance.
However, these models depend on an encoder-decoder structure with stacked local
operators to gather long-range information, and these local operators limit
both efficiency and effectiveness. Moreover, the $MRI$ data contain different
tissue properties ($TPs$), such as $T1$ and $T2$. A major limitation of these
models is that they take both types of data as input to the segmentation
process, i.e., the models are trained on the whole dataset at once, which
demands substantial computation and memory during inference. In this work, we
address these limitations
by designing a new deep-learning model, called 3D-DenseUNet, which uses
adaptable global aggregation blocks in the down-sampling path to address the
loss of spatial information. A self-attention module connects the down-sampling
blocks to the up-sampling blocks and integrates the feature maps across the
spatial and channel dimensions, effectively improving the representational
power and discriminative ability of the model. Additionally, we propose a new
method called Two Independent Teachers ($2IT$), which summarizes the model
weights instead of the label predictions. The two teacher models are trained on
different types of brain data, $T1$ and $T2$ respectively. A fusion model is
then added to improve test accuracy and enable training with fewer parameters and
labels compared to the Temporal Ensembling method without modifying the network
architecture. Empirical results demonstrate the effectiveness of the proposed
method. The code is available at
https://github.com/AfifaKhaled/Two-Independent-Teachers-are-Better-Role-Model.
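To make the architectural idea concrete, here is a minimal PyTorch-style sketch of a self-attention bridge that connects the deepest down-sampling block to the up-sampling path and aggregates context across all 3D positions. The class name, layer choices, and hyperparameters are illustrative assumptions, not the authors' implementation; the actual 3D-DenseUNet code is in the linked repository.

```python
# Minimal sketch (illustrative, not the authors' code): a self-attention bridge
# between the down-sampling and up-sampling paths of a 3D U-Net-like model.
import torch
import torch.nn as nn


class AttentionBridge(nn.Module):
    """Multi-head self-attention over flattened 3D positions so every voxel in
    the bottleneck feature map can aggregate global spatial/channel context."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=channels,
                                          num_heads=num_heads,
                                          batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, D, H, W) features from the deepest down-sampling block.
        n, c, d, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)            # (N, D*H*W, C)
        attended, _ = self.attn(tokens, tokens, tokens)  # global aggregation
        tokens = self.norm(tokens + attended)            # residual + norm
        return tokens.transpose(1, 2).reshape(n, c, d, h, w)


if __name__ == "__main__":
    bridge = AttentionBridge(channels=64)
    feats = torch.randn(1, 64, 8, 8, 8)   # toy bottleneck feature map
    print(bridge(feats).shape)            # torch.Size([1, 64, 8, 8, 8])
```

In the full model, the output of such a bridge would feed the first up-sampling block, which is how the abstract describes the connection between the two paths.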
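The Two Independent Teachers ($2IT$) scheme summarizes model weights rather than label predictions, trains one teacher per modality ($T1$ and $T2$), and adds a fusion model on top. The sketch below assumes a mean-teacher-style exponential moving average as the weight summary and plain parameter averaging as the fusion step; these choices and the function names are assumptions made to illustrate the idea, not the published implementation.

```python
# Hedged sketch of the weight-summary idea: an EMA "summarizes" each student's
# weights into its teacher, and a fusion model averages the two teachers.
# The real 2IT rules are defined in the paper and the linked repository.
import copy
import torch


@torch.no_grad()
def summarize_weights(teacher: torch.nn.Module, student: torch.nn.Module,
                      alpha: float = 0.99) -> None:
    """EMA weight summary: teacher <- alpha * teacher + (1 - alpha) * student."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)


@torch.no_grad()
def fuse_teachers(teacher_t1: torch.nn.Module,
                  teacher_t2: torch.nn.Module) -> torch.nn.Module:
    """Builds a fusion model by averaging the two teachers' parameters."""
    fused = copy.deepcopy(teacher_t1)
    for f_param, p1, p2 in zip(fused.parameters(),
                               teacher_t1.parameters(),
                               teacher_t2.parameters()):
        f_param.copy_(0.5 * (p1 + p2))
    return fused


if __name__ == "__main__":
    # Toy stand-ins for the segmentation networks trained on T1 and T2 data.
    student_t1, student_t2 = torch.nn.Linear(8, 2), torch.nn.Linear(8, 2)
    teacher_t1, teacher_t2 = copy.deepcopy(student_t1), copy.deepcopy(student_t2)

    summarize_weights(teacher_t1, student_t1)   # after each T1 training step
    summarize_weights(teacher_t2, student_t2)   # after each T2 training step
    fused = fuse_teachers(teacher_t1, teacher_t2)
    print(fused(torch.randn(1, 8)).shape)       # torch.Size([1, 2])
```

Averaging teacher weights only works when both teachers share the same architecture, which is the case here since each teacher is an instance of the same segmentation network trained on a different modality.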
Related papers
- Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient [9.519619751861333]
We propose a state space model (SSM) based world model built on Mamba.
It achieves $O(n)$ memory and computational complexity while effectively capturing long-term dependencies.
This model is accessible and can be trained on an off-the-shelf laptop.
arXiv Detail & Related papers (2024-10-11T15:10:40Z) - Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models [62.5501109475725]
Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them.
This paper introduces Online Knowledge Distillation (OKD), where the teacher network integrates small online modules to concurrently train with the student model.
OKD achieves or exceeds the performance of leading methods in various model architectures and sizes, reducing training time by up to fourfold.
arXiv Detail & Related papers (2024-09-19T07:05:26Z) - UnLearning from Experience to Avoid Spurious Correlations [3.283369870504872]
We propose a new approach that addresses the issue of spurious correlations: UnLearning from Experience (ULE)
Our method is based on using two classification models trained in parallel: student and teacher models.
We show that our method is effective on the Waterbirds, CelebA, Spawrious and UrbanCars datasets.
arXiv Detail & Related papers (2024-09-04T15:06:44Z) - Reusing Convolutional Neural Network Models through Modularization and
Composition [22.823870645316397]
We propose two modularization approaches named CNNSplitter and GradSplitter.
CNNSplitter decomposes a trained convolutional neural network (CNN) model into $N$ small reusable modules.
The resulting modules can be reused to patch existing CNN models or build new CNN models through composition.
arXiv Detail & Related papers (2023-11-08T03:18:49Z) - Lightweight Self-Knowledge Distillation with Multi-source Information
Fusion [3.107478665474057]
Knowledge Distillation (KD) is a powerful technique for transferring knowledge between neural network models.
We propose a lightweight SKD framework that utilizes multi-source information to construct a more informative teacher.
We validate the performance of the proposed DRG, DSR, and their combination through comprehensive experiments on various datasets and models.
arXiv Detail & Related papers (2023-05-16T05:46:31Z) - Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z) - Not All Models Are Equal: Predicting Model Transferability in a
Self-challenging Fisher Space [51.62131362670815]
This paper addresses the problem of ranking pre-trained deep neural networks and screening the most transferable ones for downstream tasks.
It proposes a new transferability metric called Self-challenging Fisher Discriminant Analysis (SFDA).
arXiv Detail & Related papers (2022-07-07T01:33:25Z) - Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV)
NPC-LV is a learning framework for any dataset with abundant unlabeled data but very few labeled ones.
We show that NPC-LV outperforms supervised methods on image classification on all three datasets in the low-data regime.
arXiv Detail & Related papers (2022-06-23T09:35:03Z) - Contrastive Neighborhood Alignment [81.65103777329874]
We present Contrastive Neighborhood Alignment (CNA), a manifold learning approach to maintain the topology of learned features.
The target model aims to mimic the local structure of the source representation space using a contrastive loss.
CNA is illustrated in three scenarios: manifold learning, where the model maintains the local topology of the original data in a dimension-reduced space; model distillation, where a small student model is trained to mimic a larger teacher; and legacy model update, where an older model is replaced by a more powerful one.
arXiv Detail & Related papers (2022-01-06T04:58:31Z) - Complementary Ensemble Learning [1.90365714903665]
We derive a technique to improve the performance of state-of-the-art deep learning models.
Specifically, we train auxiliary models that are able to complement the state-of-the-art model's uncertainty.
arXiv Detail & Related papers (2021-11-09T03:23:05Z) - SSSE: Efficiently Erasing Samples from Trained Machine Learning Models [103.43466657962242]
We propose an efficient and effective algorithm, SSSE, for sample erasure.
In certain cases SSSE can erase samples almost as well as the optimal, yet impractical, gold standard of training a new model from scratch with only the permitted data.
arXiv Detail & Related papers (2021-07-08T14:17:24Z)