Periodically Exchange Teacher-Student for Source-Free Object Detection
- URL: http://arxiv.org/abs/2311.13930v1
- Date: Thu, 23 Nov 2023 11:30:54 GMT
- Title: Periodically Exchange Teacher-Student for Source-Free Object Detection
- Authors: Qipeng Liu, Luojun Lin, Zhifeng Shen, Zhifeng Yang
- Abstract summary: Source-free object detection (SFOD) aims to adapt the source detector to unlabeled target domain data in the absence of source domain data.
Most SFOD methods follow the same self-training paradigm using the mean-teacher (MT) framework, where the student model is guided by a single teacher model.
We propose the Periodically Exchange Teacher-Student (PETS) method, a simple yet novel approach that introduces a multiple-teacher framework consisting of a static teacher, a dynamic teacher, and a student model.
- Score: 7.222926042027062
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Source-free object detection (SFOD) aims to adapt the source detector to
unlabeled target domain data in the absence of source domain data. Most SFOD
methods follow the same self-training paradigm using the mean-teacher (MT)
framework, where the student model is guided by a single teacher model.
However, such a paradigm can easily fall into training instability: when the
teacher model collapses uncontrollably due to domain shift, the student model
also suffers drastic performance degradation. To address this
issue, we propose the Periodically Exchange Teacher-Student (PETS) method, a
simple yet novel approach that introduces a multiple-teacher framework
consisting of a static teacher, a dynamic teacher, and a student model. During
the training phase, we periodically exchange the weights between the static
teacher and the student model. Then, we update the dynamic teacher using the
moving average of the student model whose weights have just been exchanged with the
static teacher. In this way, the dynamic teacher can integrate knowledge from
past periods, effectively reducing error accumulation and enabling a more
stable training process within the MT-based framework. Further, we develop a
consensus mechanism that merges the predictions of the two teacher models to provide
higher-quality pseudo labels for the student model. Extensive experiments on
multiple SFOD benchmarks show that the proposed method achieves
state-of-the-art performance compared with other related methods, demonstrating
the effectiveness and superiority of our method on the SFOD task.
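For illustration, below is a minimal sketch of the training schedule described in the abstract, written in PyTorch-style Python. The toy `nn.Linear` "detector", the exchange period, the EMA momentum, the confidence threshold, and the exact consensus rule are all illustrative assumptions; the abstract only specifies the overall scheme (periodic static-teacher/student exchange, EMA-updated dynamic teacher, and consensus pseudo labels), not these details.

```python
# A minimal sketch of the PETS update schedule, assuming PyTorch-style models.
# The detector architecture, exchange period, EMA momentum, threshold, and
# consensus rule below are assumptions for illustration only.
import copy
import torch
import torch.nn as nn

def exchange_weights(model_a: nn.Module, model_b: nn.Module) -> None:
    """Swap the parameters of two models in place (the periodic exchange step)."""
    state_a = copy.deepcopy(model_a.state_dict())
    model_a.load_state_dict(model_b.state_dict())
    model_b.load_state_dict(state_a)

@torch.no_grad()
def ema_update(teacher: nn.Module, student: nn.Module, momentum: float = 0.999) -> None:
    """Update the dynamic teacher as a moving average of the student."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)

def consensus_pseudo_labels(static_scores, dynamic_scores, threshold: float = 0.8):
    """Toy consensus rule (assumption): keep predictions that both teachers
    score above a confidence threshold, and average their scores."""
    merged = (static_scores + dynamic_scores) / 2.0
    keep = (static_scores > threshold) & (dynamic_scores > threshold)
    return merged[keep], keep

# Stand-in "detectors"; a real SFOD setup would use object detectors instead.
student = nn.Linear(16, 4)
static_teacher = copy.deepcopy(student)
dynamic_teacher = copy.deepcopy(student)

EXCHANGE_PERIOD = 100  # iterations between static-teacher/student swaps (assumed)
optimizer = torch.optim.SGD(student.parameters(), lr=0.01)

for step in range(1, 501):
    images = torch.randn(8, 16)  # placeholder unlabeled target-domain batch

    # 1) Both teachers predict; a consensus merges them into pseudo labels.
    with torch.no_grad():
        s_scores = torch.sigmoid(static_teacher(images))
        d_scores = torch.sigmoid(dynamic_teacher(images))
    pseudo, mask = consensus_pseudo_labels(s_scores, d_scores)

    # 2) The student is trained on the pseudo labels (toy loss for illustration).
    preds = torch.sigmoid(student(images))
    loss = ((preds[mask] - pseudo) ** 2).mean() if mask.any() else preds.sum() * 0.0
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # 3) The dynamic teacher tracks the student via EMA every iteration.
    ema_update(dynamic_teacher, student)

    # 4) Periodically exchange the static teacher and the student, so the static
    #    teacher snapshots the current student and the student restarts from the
    #    static teacher's weights.
    if step % EXCHANGE_PERIOD == 0:
        exchange_weights(static_teacher, student)
```

In the paper itself, the models are object detectors and the consensus operates on box-level predictions; the snippet above only mirrors the weight-exchange and EMA schedule, not the detection pipeline.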
Related papers
- Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling [81.00825302340984]
We introduce Speculative Knowledge Distillation (SKD) to generate high-quality training data on-the-fly.
In SKD, the student proposes tokens, and the teacher replaces poorly ranked ones based on its own distribution.
We evaluate SKD on various text generation tasks, including translation, summarization, math, and instruction following.
arXiv Detail & Related papers (2024-10-15T06:51:25Z) - Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models [62.5501109475725]
Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them.
This paper introduces Online Knowledge Distillation (OKD), where the teacher network integrates small online modules to concurrently train with the student model.
OKD achieves or exceeds the performance of leading methods in various model architectures and sizes, reducing training time by up to fourfold.
arXiv Detail & Related papers (2024-09-19T07:05:26Z) - Parameter-Selective Continual Test-Time Adaptation [3.480626767752489]
Continual Test-Time Adaptation (CTTA) aims to adapt a pretrained model to ever-changing environments during the test time under continuous domain shifts.
The PSMT method effectively updates the critical parameters within the MT network under domain shifts.
arXiv Detail & Related papers (2024-07-02T13:18:15Z) - Variational Continual Test-Time Adaptation [25.262385466354253]
The prior drift is crucial in Continual Test-Time Adaptation (CTTA) methods that only use unlabeled test data.
We introduce VCoTTA, a variational Bayesian approach to measure uncertainties in CTTA.
Experimental results on three datasets demonstrate the method's effectiveness in mitigating prior drift.
arXiv Detail & Related papers (2024-02-13T02:41:56Z) - Alternate Diverse Teaching for Semi-supervised Medical Image Segmentation [62.021828104757745]
We propose AD-MT, an alternate diverse teaching approach in a teacher-student framework.
It involves a single student model and two non-trainable teacher models that are momentum-updated periodically and randomly in an alternate fashion.
arXiv Detail & Related papers (2023-11-29T02:44:54Z) - Switching Temporary Teachers for Semi-Supervised Semantic Segmentation [45.20519672287495]
The teacher-student framework, prevalent in semi-supervised semantic segmentation, mainly employs the exponential moving average (EMA) to update a single teacher's weights based on the student's.
This paper introduces Dual Teacher, a simple yet effective approach that employs dual temporary teachers aiming to alleviate the coupling problem for the student.
arXiv Detail & Related papers (2023-10-28T08:49:16Z) - Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble [56.705249154629264]
Self-training teacher-student frameworks are proposed to improve the robustness of NER models.
In this paper, we propose an adaptive teacher learning comprised of two teacher-student networks.
Fine-grained student ensemble updates each fragment of the teacher model with a temporal moving average of the corresponding fragment of the student, which enhances consistent predictions on each model fragment against noise.
arXiv Detail & Related papers (2022-12-13T12:14:09Z) - Reinforced Multi-Teacher Selection for Knowledge Distillation [54.72886763796232]
Knowledge distillation is a popular method for model compression.
Current methods assign a fixed weight to each teacher model throughout the distillation process, and most existing methods allocate an equal weight to every teacher model.
In this paper, we observe that, due to the complexity of training examples and the differences in student model capability, learning differentially from teacher models can lead to better performance of the distilled student models.
arXiv Detail & Related papers (2020-12-11T08:56:39Z) - Temporal Self-Ensembling Teacher for Semi-Supervised Object Detection [9.64328205496046]
This paper focuses on Semi-supervised Object Detection (SSOD).
The teacher model serves a dual role as a teacher and a student.
The class imbalance issue in SSOD hinders efficient knowledge transfer from teacher to student.
arXiv Detail & Related papers (2020-07-13T01:17:25Z) - Learning to Reweight with Deep Interactions [104.68509759134878]
We propose an improved data reweighting algorithm, in which the student model provides its internal states to the teacher model.
Experiments on image classification with clean/noisy labels and neural machine translation empirically demonstrate that our algorithm makes significant improvement over previous methods.
arXiv Detail & Related papers (2020-07-09T09:06:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.