Switching Temporary Teachers for Semi-Supervised Semantic Segmentation
- URL: http://arxiv.org/abs/2310.18640v1
- Date: Sat, 28 Oct 2023 08:49:16 GMT
- Title: Switching Temporary Teachers for Semi-Supervised Semantic Segmentation
- Authors: Jaemin Na, Jung-Woo Ha, Hyung Jin Chang, Dongyoon Han, Wonjun Hwang
- Abstract summary: The teacher-student framework, prevalent in semi-supervised semantic segmentation, mainly employs the exponential moving average (EMA) to update a single teacher's weights based on the student's.
This paper introduces Dual Teacher, a simple yet effective approach that employs dual temporary teachers aiming to alleviate the coupling problem for the student.
- Score: 45.20519672287495
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The teacher-student framework, prevalent in semi-supervised semantic
segmentation, mainly employs the exponential moving average (EMA) to update a
single teacher's weights based on the student's. However, EMA updates raise a
problem in that the weights of the teacher and student become coupled,
causing a potential performance bottleneck. Furthermore, this problem may
become more severe when training with more complicated labels such as
segmentation masks but with few annotated data. This paper introduces Dual
Teacher, a simple yet effective approach that employs dual temporary teachers
aiming to alleviate the coupling problem for the student. The temporary
teachers work in shifts and are progressively improved, consistently preventing
the teacher and student from becoming excessively close. Specifically, the
temporary teachers periodically take turns generating pseudo-labels to train a
student model and maintain the distinct characteristics of the student model
for each epoch. Consequently, Dual Teacher achieves competitive performance on
the PASCAL VOC, Cityscapes, and ADE20K benchmarks with remarkably shorter
training times than state-of-the-art methods. Moreover, we demonstrate that our
approach is model-agnostic and compatible with both CNN- and Transformer-based
models. Code is available at \url{https://github.com/naver-ai/dual-teacher}.
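The EMA update and the epoch-wise teacher switching described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation (see their repository for that): the class and method names, the per-parameter list representation, and the assumption that only the on-duty teacher receives EMA updates during its epoch are all illustrative simplifications.

```python
def ema_update(teacher_params, student_params, momentum=0.99):
    """Standard EMA teacher update: teacher <- m * teacher + (1 - m) * student.

    With a single teacher updated this way every step, the teacher's weights
    drift toward the student's, which is the coupling problem the abstract
    describes.
    """
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]


class DualTeacher:
    """Two temporary teachers that take turns epoch by epoch (illustrative).

    The on-duty teacher generates pseudo-labels and is EMA-updated from the
    student during its shift; the resting teacher keeps its older weights
    until its next turn, so neither teacher stays tightly coupled to the
    current student.
    """

    def __init__(self, init_params, momentum=0.99):
        # Both temporary teachers start from the same initial weights.
        self.teachers = [list(init_params), list(init_params)]
        self.momentum = momentum

    def active(self, epoch):
        # Teachers switch roles every epoch.
        return self.teachers[epoch % 2]

    def update_active(self, epoch, student_params):
        # Only the on-duty teacher is refreshed from the student this epoch.
        idx = epoch % 2
        self.teachers[idx] = ema_update(
            self.teachers[idx], student_params, self.momentum)
```

Under this sketch, the teacher queried at epoch `e + 1` last saw the student during an earlier shift, which is one simple way to keep teacher and student from becoming excessively close.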
Related papers
- Dual-Teacher Ensemble Models with Double-Copy-Paste for 3D Semi-Supervised Medical Image Segmentation [31.460549289419923]
Semi-supervised learning (SSL) techniques address the high labeling costs in 3D medical image segmentation.
We introduce the Staged Selective Ensemble (SSE) module, which selects different ensemble methods based on the characteristics of the samples.
Experimental results demonstrate the effectiveness of our proposed method in 3D medical image segmentation tasks.
arXiv Detail & Related papers (2024-10-15T11:23:15Z) - Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling [81.00825302340984]
We introduce Speculative Knowledge Distillation (SKD) to generate high-quality training data on-the-fly.
In SKD, the student proposes tokens, and the teacher replaces poorly ranked ones based on its own distribution.
We evaluate SKD on various text generation tasks, including translation, summarization, math, and instruction following.
arXiv Detail & Related papers (2024-10-15T06:51:25Z) - Periodically Exchange Teacher-Student for Source-Free Object Detection [7.222926042027062]
Source-free object detection (SFOD) aims to adapt the source detector to unlabeled target domain data in the absence of source domain data.
Most SFOD methods follow the same self-training paradigm using the mean-teacher (MT) framework, where the student model is guided by only a single teacher model.
We propose the Periodically Exchange Teacher-Student (PETS) method, a simple yet novel approach that introduces a multiple-teacher framework consisting of a static teacher, a dynamic teacher, and a student model.
arXiv Detail & Related papers (2023-11-23T11:30:54Z) - Distantly-Supervised Named Entity Recognition with Adaptive Teacher
Learning and Fine-grained Student Ensemble [56.705249154629264]
Self-training teacher-student frameworks are proposed to improve the robustness of NER models.
In this paper, we propose an adaptive teacher learning method composed of two teacher-student networks.
Fine-grained student ensemble updates each fragment of the teacher model with a temporal moving average of the corresponding fragment of the student, which enhances consistent predictions on each model fragment against noise.
arXiv Detail & Related papers (2022-12-13T12:14:09Z) - Spatial Ensemble: a Novel Model Smoothing Mechanism for Student-Teacher
Framework [39.44523908176695]
"Spatial Ensemble" is a novel model smoothing mechanism in parallel with the Temporal Moving Average.
It stitches different fragments of historical student models into a unity, yielding the "Spatial Ensemble" effect.
Their integration, named Spatial-Temporal Smoothing, brings general (sometimes significant) improvement to the student-teacher learning framework.
arXiv Detail & Related papers (2021-10-04T08:45:18Z) - Self-Training with Differentiable Teacher [80.62757989797095]
Self-training achieves enormous success in various semi-supervised and weakly-supervised learning tasks.
The method can be interpreted as a teacher-student framework, where the teacher generates pseudo-labels, and the student makes predictions.
We propose a differentiable self-training method that treats the teacher-student interaction as a Stackelberg game.
arXiv Detail & Related papers (2021-09-15T02:06:13Z) - Representation Consolidation for Training Expert Students [54.90754502493968]
We show that a multi-head, multi-task distillation method is sufficient to consolidate representations from task-specific teacher(s) and improve downstream performance.
Our method can also combine the representational knowledge of multiple teachers trained on one or multiple domains into a single model.
arXiv Detail & Related papers (2021-07-16T17:58:18Z) - Graph Consistency based Mean-Teaching for Unsupervised Domain Adaptive
Person Re-Identification [54.58165777717885]
This paper proposes a Graph Consistency based Mean-Teaching (GCMT) method with constructing the Graph Consistency Constraint (GCC) between teacher and student networks.
Experiments on three datasets, i.e., Market-1501, DukeMTMC-reID, and MSMT17, show that the proposed GCMT outperforms state-of-the-art methods by a clear margin.
arXiv Detail & Related papers (2021-05-11T04:09:49Z) - Temporal Self-Ensembling Teacher for Semi-Supervised Object Detection [9.64328205496046]
This paper focuses on Semi-supervised Object Detection (SSOD).
The teacher model serves a dual role as a teacher and a student.
The class imbalance issue in SSOD hinders an efficient knowledge transfer from teacher to student.
arXiv Detail & Related papers (2020-07-13T01:17:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.