Spatial Ensemble: a Novel Model Smoothing Mechanism for Student-Teacher
Framework
- URL: http://arxiv.org/abs/2110.01253v1
- Date: Mon, 4 Oct 2021 08:45:18 GMT
- Title: Spatial Ensemble: a Novel Model Smoothing Mechanism for Student-Teacher
Framework
- Authors: Tengteng Huang, Yifan Sun, Xun Wang, Haotian Yao, Chi Zhang
- Abstract summary: "Spatial Ensemble" is a novel model smoothing mechanism in parallel with the Temporal Moving Average.
It stitches different fragments of historical student models into a unity, yielding the "Spatial Ensemble" effect.
Their integration, named Spatial-Temporal Smoothing, brings general (sometimes significant) improvement to the student-teacher learning framework.
- Score: 39.44523908176695
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model smoothing is of central importance for obtaining a reliable teacher
model in the student-teacher framework, where the teacher generates surrogate
supervision signals to train the student. A popular model smoothing method is
the Temporal Moving Average (TMA), which continuously averages the teacher
parameters with the up-to-date student parameters. In this paper, we propose
"Spatial Ensemble", a novel model smoothing mechanism in parallel with TMA.
Spatial Ensemble randomly picks up a small fragment of the student model to
directly replace the corresponding fragment of the teacher model.
Consequentially, it stitches different fragments of historical student models
into a unity, yielding the "Spatial Ensemble" effect. Spatial Ensemble obtains
comparable student-teacher learning performance by itself and demonstrates
valuable complementarity with temporal moving average. Their integration, named
Spatial-Temporal Smoothing, brings general (sometimes significant) improvement
to the student-teacher learning framework on a variety of state-of-the-art
methods. For example, based on the self-supervised method BYOL, it yields +0.9%
top-1 accuracy improvement on ImageNet, while based on the semi-supervised
approach FixMatch, it increases the top-1 accuracy by around +6% on CIFAR-10
when only a few training labels are available. Code and models are available at:
https://github.com/tengteng95/Spatial_Ensemble.
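The two smoothing mechanisms in the abstract can be sketched as parameter-update rules. Below is a minimal NumPy sketch, not the paper's implementation: the fragment granularity (a whole parameter tensor) and the replacement probability `p` are illustrative assumptions, and the model is a toy dict of arrays.

```python
import numpy as np

rng = np.random.default_rng(0)

def tma_update(teacher, student, momentum=0.99):
    """Temporal Moving Average: blend every teacher parameter
    toward the up-to-date student parameters."""
    return {name: momentum * t + (1 - momentum) * student[name]
            for name, t in teacher.items()}

def spatial_ensemble_update(teacher, student, p=0.01):
    """Spatial Ensemble: randomly pick a small fragment of the
    student and copy it verbatim into the teacher. Here a
    'fragment' is a whole parameter tensor selected with
    probability p; the paper may use finer-grained fragments."""
    new_teacher = dict(teacher)
    for name in teacher:
        if rng.random() < p:
            new_teacher[name] = student[name].copy()
    return new_teacher

# Toy models with two parameter tensors each.
teacher = {"w1": np.zeros((4, 4)), "w2": np.zeros(4)}
student = {"w1": np.ones((4, 4)), "w2": np.ones(4)}

teacher = tma_update(teacher, student)  # every entry moves from 0.0 to 0.01
teacher = spatial_ensemble_update(teacher, student, p=0.5)
```

Over many steps, TMA makes every teacher parameter a smooth average of recent students, whereas Spatial Ensemble leaves the teacher a patchwork of exact copies taken from students at different times, which is the "stitching" effect the abstract describes.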
Related papers
- Parameter-Selective Continual Test-Time Adaptation [3.480626767752489]
Continual Test-Time Adaptation (CTTA) aims to adapt a pretrained model to ever-changing environments during the test time under continuous domain shifts.
The PSMT method can effectively update the critical parameters within the mean-teacher (MT) network under domain shifts.
arXiv Detail & Related papers (2024-07-02T13:18:15Z)
- Periodically Exchange Teacher-Student for Source-Free Object Detection [7.222926042027062]
Source-free object detection (SFOD) aims to adapt the source detector to unlabeled target domain data in the absence of source domain data.
Most SFOD methods follow the same self-training paradigm using the mean-teacher (MT) framework, where the student model is guided by only a single teacher model.
We propose the Periodically Exchange Teacher-Student (PETS) method, a simple yet novel approach that introduces a multiple-teacher framework consisting of a static teacher, a dynamic teacher, and a student model.
arXiv Detail & Related papers (2023-11-23T11:30:54Z)
- Switching Temporary Teachers for Semi-Supervised Semantic Segmentation [45.20519672287495]
The teacher-student framework, prevalent in semi-supervised semantic segmentation, mainly employs the exponential moving average (EMA) to update a single teacher's weights based on the student's.
This paper introduces Dual Teacher, a simple yet effective approach that employs dual temporary teachers aiming to alleviate the coupling problem for the student.
arXiv Detail & Related papers (2023-10-28T08:49:16Z)
- EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR)
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
arXiv Detail & Related papers (2023-01-27T22:04:37Z)
- Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble [56.705249154629264]
Self-training teacher-student frameworks are proposed to improve the robustness of NER models.
In this paper, we propose an adaptive teacher learning comprised of two teacher-student networks.
Fine-grained student ensemble updates each fragment of the teacher model with a temporal moving average of the corresponding fragment of the student, which enhances consistent predictions on each model fragment against noise.
arXiv Detail & Related papers (2022-12-13T12:14:09Z)
- Exploring The Role of Mean Teachers in Self-supervised Masked Auto-Encoders [64.03000385267339]
Masked image modeling (MIM) has become a popular strategy for self-supervised learning (SSL) of visual representations with Vision Transformers.
We present a simple SSL method, the Reconstruction-Consistent Masked Auto-Encoder (RC-MAE) by adding an EMA teacher to MAE.
RC-MAE converges faster and requires less memory than state-of-the-art self-distillation methods during pre-training.
arXiv Detail & Related papers (2022-10-05T08:08:55Z)
- ST-CoNAL: Consistency-Based Acquisition Criterion Using Temporal Self-Ensemble for Active Learning [7.94190631530826]
Active learning (AL) is becoming increasingly important to maximize the efficiency of the training process.
We present an AL algorithm, namely student-teacher consistency-based AL (ST-CoNAL).
Experiments conducted for image classification tasks on the CIFAR-10, CIFAR-100, Caltech-256, and Tiny ImageNet datasets demonstrate that the proposed ST-CoNAL achieves significantly better performance than existing acquisition methods.
arXiv Detail & Related papers (2022-07-05T17:25:59Z)
- Bag of Instances Aggregation Boosts Self-supervised Learning [122.61914701794296]
We propose a simple but effective distillation strategy for unsupervised learning.
Our method, termed BINGO, aims to transfer the relationship learned by the teacher to the student.
BINGO achieves new state-of-the-art performance on small-scale models.
arXiv Detail & Related papers (2021-07-04T17:33:59Z) - Graph Consistency based Mean-Teaching for Unsupervised Domain Adaptive
Person Re-Identification [54.58165777717885]
This paper proposes a Graph Consistency based Mean-Teaching (GCMT) method that constructs a Graph Consistency Constraint (GCC) between teacher and student networks.
Experiments on three datasets, i.e., Market-1501, DukeMTMC-reID, and MSMT17, show that the proposed GCMT outperforms state-of-the-art methods by a clear margin.
arXiv Detail & Related papers (2021-05-11T04:09:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.