Spatial Ensemble: a Novel Model Smoothing Mechanism for Student-Teacher
Framework
- URL: http://arxiv.org/abs/2110.01253v1
- Date: Mon, 4 Oct 2021 08:45:18 GMT
- Title: Spatial Ensemble: a Novel Model Smoothing Mechanism for Student-Teacher
Framework
- Authors: Tengteng Huang, Yifan Sun, Xun Wang, Haotian Yao, Chi Zhang
- Abstract summary: "Spatial Ensemble" is a novel model smoothing mechanism in parallel with the Temporal Moving Average.
It stitches different fragments of historical student models into a unified whole, yielding the "Spatial Ensemble" effect.
Their integration, named Spatial-Temporal Smoothing, brings general (sometimes significant) improvement to the student-teacher learning framework.
- Score: 39.44523908176695
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model smoothing is of central importance for obtaining a reliable teacher
model in the student-teacher framework, where the teacher generates surrogate
supervision signals to train the student. A popular model smoothing method is
the Temporal Moving Average (TMA), which continuously averages the teacher
parameters with the up-to-date student parameters. In this paper, we propose
"Spatial Ensemble", a novel model smoothing mechanism in parallel with TMA.
Spatial Ensemble randomly picks up a small fragment of the student model to
directly replace the corresponding fragment of the teacher model.
Consequently, it stitches different fragments of historical student models
into a unified whole, yielding the "Spatial Ensemble" effect. Spatial Ensemble obtains
comparable student-teacher learning performance by itself and demonstrates
valuable complementarity with temporal moving average. Their integration, named
Spatial-Temporal Smoothing, brings general (sometimes significant) improvement
to the student-teacher learning framework on a variety of state-of-the-art
methods. For example, based on the self-supervised method BYOL, it yields +0.9%
top-1 accuracy improvement on ImageNet, while based on the semi-supervised
approach FixMatch, it increases the top-1 accuracy by around +6% on CIFAR-10
when only a few training labels are available. Code and models are available at:
https://github.com/tengteng95/Spatial_Ensemble.
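To make the two mechanisms concrete, below is a minimal PyTorch-style sketch of one Spatial-Temporal Smoothing step. The fragment granularity (a whole parameter tensor) and the values of `momentum` and `replace_prob` are illustrative assumptions; the authors' exact implementation lives in the repository linked above.

```python
import torch

@torch.no_grad()
def spatial_temporal_smoothing(teacher, student, momentum=0.99, replace_prob=0.01):
    """One smoothing step combining TMA and Spatial Ensemble.

    TMA: every teacher parameter drifts toward the student via an
    exponential moving average. Spatial Ensemble: a randomly chosen
    small fragment of the student directly overwrites the matching
    teacher fragment, stitching historical students together.

    Fragment granularity (one parameter tensor) and `replace_prob`
    are illustrative assumptions, not the paper's exact choices.
    """
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        if torch.rand(1).item() < replace_prob:
            # Spatial Ensemble: hard copy of this fragment from the student.
            t_param.copy_(s_param)
        else:
            # Temporal Moving Average: soft update toward the student.
            t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)
```

In a training loop this would be called once per iteration, after the student's optimizer step, so the teacher lags the student smoothly while occasionally absorbing whole student fragments.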
Related papers
- PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation [51.509573838103854]
We propose a semi-supervised learning framework, termed Progressive Mean Teachers (PMT), for medical image segmentation.
Our PMT generates high-fidelity pseudo labels by learning robust and diverse features in the training process.
Experimental results on two datasets with different modalities, i.e., CT and MRI, demonstrate that our method outperforms state-of-the-art medical image segmentation approaches.
arXiv Detail & Related papers (2024-09-08T15:02:25Z)
- CLDA: Collaborative Learning for Enhanced Unsupervised Domain Adaptation [15.97351561456467]
Collaborative Learning is a method that updates the teacher's non-salient parameters using the student model while simultaneously enhancing the student's performance.
CLDA achieves an improvement of +0.7% mIoU for the teacher and +1.4% mIoU for the student over the baseline model on the GTA-to-Cityscapes benchmark; a sketch of the selective update follows below.
arXiv Detail & Related papers (2024-09-04T13:35:15Z)
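A minimal sketch of the selective update CLDA describes, assuming saliency is approximated by weight magnitude (the paper's actual criterion may differ):

```python
import torch

@torch.no_grad()
def update_non_salient(teacher, student, quantile=0.3):
    """Copy the student's values into the teacher's least salient weights.

    Saliency is approximated here by absolute weight magnitude, and the
    bottom `quantile` fraction is treated as non-salient; both choices
    are assumptions for illustration.
    """
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        threshold = t_param.abs().flatten().quantile(quantile)
        mask = t_param.abs() <= threshold  # non-salient entries
        t_param[mask] = s_param[mask]      # refresh them from the student
```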
- Fishers Harvest Parallel Unlearning in Inherited Model Networks [26.47424619448623]
This paper presents a novel unlearning framework, which enables fully parallel unlearning among models exhibiting inheritance.
A key enabler is the new Unified Model Inheritance Graph (UMIG), which captures the inheritance structure using a Directed Acyclic Graph (DAG).
Our framework accelerates unlearning by 99% compared to alternative methods; a toy sketch of the inheritance graph follows below.
arXiv Detail & Related papers (2024-08-16T02:29:38Z)
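A toy sketch of the bookkeeping a UMIG provides; the Fisher-information unlearning update itself is paper-specific and deliberately omitted here:

```python
from collections import defaultdict

class UMIG:
    """Toy Unified Model Inheritance Graph: edge u -> v means model v was
    derived (fine-tuned, distilled, ...) from model u.

    Only the DAG bookkeeping is sketched; the paper's actual
    Fisher-based unlearning step is not reproduced.
    """
    def __init__(self):
        self.children = defaultdict(set)

    def add_inheritance(self, parent, child):
        self.children[parent].add(child)

    def affected_models(self, root):
        """All models inheriting (directly or transitively) from `root`,
        i.e. the set that must unlearn in parallel with it."""
        seen, stack = set(), [root]
        while stack:
            node = stack.pop()
            for child in self.children[node] - seen:
                seen.add(child)
                stack.append(child)
        return seen
```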
- Periodically Exchange Teacher-Student for Source-Free Object Detection [7.222926042027062]
Source-free object detection (SFOD) aims to adapt the source detector to unlabeled target domain data in the absence of source domain data.
Most SFOD methods follow the same self-training paradigm, using the mean-teacher (MT) framework, where the student model is guided by a single teacher model.
We propose the Periodically Exchange Teacher-Student (PETS) method, a simple yet novel approach that introduces a multiple-teacher framework consisting of a static teacher, a dynamic teacher, and a student model; a sketch of the exchange step follows below.
arXiv Detail & Related papers (2023-11-23T11:30:54Z)
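A minimal sketch of a PETS-style step, assuming the dynamic teacher is an EMA of the student and the two teachers swap weights every fixed number of iterations; the momentum and period are illustrative guesses:

```python
import copy
import torch

@torch.no_grad()
def pets_step(step, static_t, dynamic_t, student, momentum=0.99, period=1000):
    """One training-loop step of a PETS-style multi-teacher scheme."""
    # Dynamic teacher: EMA of the student.
    for d, s in zip(dynamic_t.parameters(), student.parameters()):
        d.mul_(momentum).add_(s, alpha=1.0 - momentum)

    # Periodic exchange: swap static and dynamic teacher weights so
    # stale knowledge is refreshed without losing either snapshot.
    if step % period == 0:
        static_state = copy.deepcopy(static_t.state_dict())
        static_t.load_state_dict(dynamic_t.state_dict())
        dynamic_t.load_state_dict(static_state)
```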
- Switching Temporary Teachers for Semi-Supervised Semantic Segmentation [45.20519672287495]
The teacher-student framework, prevalent in semi-supervised semantic segmentation, mainly employs the exponential moving average (EMA) to update a single teacher's weights based on the student's.
This paper introduces Dual Teacher, a simple yet effective approach that employs dual temporary teachers aiming to alleviate the coupling problem for the student; a sketch of the alternating update follows below.
arXiv Detail & Related papers (2023-10-28T08:49:16Z)
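A minimal sketch of the alternation, assuming the two temporary teachers switch every epoch (the paper's switching period may differ):

```python
import torch

@torch.no_grad()
def dual_teacher_step(epoch, teachers, student, momentum=0.99):
    """EMA-update only the teacher on duty this epoch.

    Two temporary teachers take turns: in even epochs teacher 0 is
    updated (and supervises the student), in odd epochs teacher 1.
    The one-epoch alternation period is an assumption for illustration.
    """
    active = teachers[epoch % 2]
    for t, s in zip(active.parameters(), student.parameters()):
        t.mul_(momentum).add_(s, alpha=1.0 - momentum)
    return active  # the teacher that supervises the student this epoch
```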
- EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to asymmetric students one-tenth the size that retain 95-97% of the teacher's performance; a sketch of a geometry-matching loss follows below.
arXiv Detail & Related papers (2023-01-27T22:04:37Z)
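A minimal sketch of one plausible geometry-matching loss, pulling the student's query-document similarity profile toward the teacher's; the softmax/KL form is an assumption, not EmbedDistill's exact objective:

```python
import torch
import torch.nn.functional as F

def geometric_distill_loss(t_query, t_docs, s_query, s_docs):
    """Match the relative query-document geometry of teacher and student.

    Shapes: queries (D,), documents (N, D). Each model scores the same
    N candidate documents against the query; the student's softmaxed
    score profile is trained to match the teacher's via KL divergence.
    """
    t_scores = t_docs @ t_query          # teacher query-document similarities
    s_scores = s_docs @ s_query          # student query-document similarities
    return F.kl_div(F.log_softmax(s_scores, dim=-1),
                    F.softmax(t_scores, dim=-1),
                    reduction="batchmean")
```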
- Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble [56.705249154629264]
Self-training teacher-student frameworks are proposed to improve the robustness of NER models.
In this paper, we propose an adaptive teacher learning method composed of two teacher-student networks.
Fine-grained student ensemble updates each fragment of the teacher model with a temporal moving average of the corresponding fragment of the student, which enhances consistent predictions on each model fragment against noise; a sketch of the per-fragment update follows below.
arXiv Detail & Related papers (2022-12-13T12:14:09Z)
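A minimal sketch of the per-fragment temporal moving average, taking top-level submodules as the fragments (the paper's fragment definition is an assumption here):

```python
import torch

@torch.no_grad()
def fragment_ema_update(teacher, student, momentum=0.995):
    """Fine-grained ensemble: EMA applied fragment by fragment.

    Fragments are taken to be top-level submodules; the paper's
    fragment granularity and any per-fragment schedule are
    assumptions this sketch does not pin down.
    """
    t_frags = dict(teacher.named_children())
    s_frags = dict(student.named_children())
    for name, t_frag in t_frags.items():
        for t, s in zip(t_frag.parameters(), s_frags[name].parameters()):
            t.mul_(momentum).add_(s, alpha=1.0 - momentum)
```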
- Exploring The Role of Mean Teachers in Self-supervised Masked Auto-Encoders [64.03000385267339]
Masked image modeling (MIM) has become a popular strategy for self-supervised learning (SSL) of visual representations with Vision Transformers.
We present a simple SSL method, the Reconstruction-Consistent Masked Auto-Encoder (RC-MAE) by adding an EMA teacher to MAE.
RC-MAE converges faster and uses less memory than state-of-the-art self-distillation methods during pre-training; a sketch of the consistency objective follows below.
arXiv Detail & Related papers (2022-10-05T08:08:55Z)
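A minimal sketch of a reconstruction-consistency objective in the spirit of RC-MAE; the model interfaces and the equal loss weighting are hypothetical placeholders:

```python
import torch
import torch.nn.functional as F

def rc_mae_loss(student, teacher, patches, mask):
    """MAE reconstruction plus teacher consistency, sketched.

    `patches` is the patchified target image (batch, num_patches, dim),
    `mask` a boolean tensor marking masked patches; both models are
    assumed to map (patches, mask) to per-patch reconstructions.
    """
    s_recon = student(patches, mask)            # student reconstruction
    with torch.no_grad():
        t_recon = teacher(patches, mask)        # EMA teacher, no gradient
    reconstruction = F.mse_loss(s_recon[mask], patches[mask])   # MAE term
    consistency = F.mse_loss(s_recon[mask], t_recon[mask])      # RC term
    return reconstruction + consistency
```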
- ST-CoNAL: Consistency-Based Acquisition Criterion Using Temporal Self-Ensemble for Active Learning [7.94190631530826]
Active learning (AL) is becoming increasingly important to maximize the efficiency of the training process.
We present an AL algorithm, namely student-teacher consistency-based AL (ST-CoNAL).
Experiments on image classification tasks on CIFAR-10, CIFAR-100, Caltech-256, and Tiny ImageNet demonstrate that the proposed ST-CoNAL achieves significantly better performance than existing acquisition methods; a sketch of the acquisition criterion follows below.
arXiv Detail & Related papers (2022-07-05T17:25:59Z)
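A minimal sketch of a consistency-based acquisition score, ranking unlabeled samples by student-teacher disagreement; measuring disagreement with KL divergence is an assumption:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def consistency_acquisition(student, teachers, unlabeled_batch, k):
    """Pick the k unlabeled samples where the student and the temporal
    self-ensemble teachers disagree most."""
    s_logp = F.log_softmax(student(unlabeled_batch), dim=-1)
    scores = torch.zeros(unlabeled_batch.size(0),
                         device=unlabeled_batch.device)
    for teacher in teachers:
        t_prob = F.softmax(teacher(unlabeled_batch), dim=-1)
        # Per-sample KL between student and this teacher snapshot.
        scores += F.kl_div(s_logp, t_prob, reduction="none").sum(dim=-1)
    return scores.topk(k).indices  # most inconsistent samples to label
```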
- Graph Consistency based Mean-Teaching for Unsupervised Domain Adaptive Person Re-Identification [54.58165777717885]
This paper proposes a Graph Consistency based Mean-Teaching (GCMT) method that constructs a Graph Consistency Constraint (GCC) between teacher and student networks.
Experiments on three datasets, i.e., Market-1501, DukeMTMC-reID, and MSMT17, show that the proposed GCMT outperforms state-of-the-art methods by a clear margin; a sketch of the graph constraint follows below.
arXiv Detail & Related papers (2021-05-11T04:09:49Z)
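A minimal sketch of one plausible reading of the Graph Consistency Constraint, matching batch-level similarity graphs between student and teacher; the MSE form is an assumption, not necessarily the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def graph_consistency_loss(student_feats, teacher_feats):
    """Push the student's sample-similarity graph toward the teacher's.

    Each network induces a graph over the batch via cosine similarities
    between feature vectors (batch, dim); the constraint penalizes the
    difference between the two adjacency matrices.
    """
    s = F.normalize(student_feats, dim=-1)
    t = F.normalize(teacher_feats, dim=-1)
    s_graph = s @ s.t()                      # student similarity graph
    t_graph = (t @ t.t()).detach()           # teacher graph, no gradient
    return F.mse_loss(s_graph, t_graph)
```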