Related papers: Mean Teacher DETR with Masked Feature Alignment: A Robust Domain Adaptive Detection Transformer Framework

Mean Teacher DETR with Masked Feature Alignment: A Robust Domain Adaptive Detection Transformer Framework

URL: http://arxiv.org/abs/2310.15646v5
Date: Thu, 18 Jan 2024 07:44:08 GMT
Title: Mean Teacher DETR with Masked Feature Alignment: A Robust Domain Adaptive Detection Transformer Framework
Authors: Weixi Weng, Chun Yuan
Abstract summary: Two-stage feature alignment method based on mean teacher comprises a pretraining stage followed by a self-training stage. In the pretraining stage, we utilize labeled target-like images produced by image style transfer to avoid performance fluctuation. In the self-training stage, we leverage unlabeled target images by pseudo labels based on mean teacher.
Score: 41.998727427261734
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Unsupervised domain adaptation object detection (UDAOD) research on Detection Transformer(DETR) mainly focuses on feature alignment and existing methods can be divided into two kinds, each of which has its unresolved issues. One-stage feature alignment methods can easily lead to performance fluctuation and training stagnation. Two-stage feature alignment method based on mean teacher comprises a pretraining stage followed by a self-training stage, each facing problems in obtaining reliable pretrained model and achieving consistent performance gains. Methods mentioned above have not yet explore how to utilize the third related domain such as target-like domain to assist adaptation. To address these issues, we propose a two-stage framework named MTM, i.e. Mean Teacher-DETR with Masked Feature Alignment. In the pretraining stage, we utilize labeled target-like images produced by image style transfer to avoid performance fluctuation. In the self-training stage, we leverage unlabeled target images by pseudo labels based on mean teacher and propose a module called Object Queries Knowledge Transfer (OQKT) to ensure consistent performance gains of the student model. Most importantly, we propose masked feature alignment methods including Masked Domain Query-based Feature Alignment (MDQFA) and Masked Token-wise Feature Alignment (MTWFA) to alleviate domain shift in a more robust way, which not only prevent training stagnation and lead to a robust pretrained model in the pretraining stage, but also enhance the model's target performance in the self-training stage. Experiments on three challenging scenarios and a theoretical analysis verify the effectiveness of MTM.

Related papers

Teaching in adverse scenes: a statistically feedback-driven threshold and mask adjustment teacher-student framework for object detection in UAV images under adverse scenes [22.941247028439477]
Unsupervised Domain Adaptation (UDA) has shown promise in effectively alleviating the performance degradation caused by domain gaps between source and target domains.<n>However, existing UDA studies are based on natural images or clear UAV imagery.<n>We propose the first benchmark for UAV object detection in adverse scenes, the Statistical Feedback-Driven Threshold and Mask Adjustment Teacher-Student Framework.
arXiv Detail & Related papers (2025-06-12T09:21:58Z)
Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation [49.827306773992376]
Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained model to continually changing target distributions. Our proposed method attains state-of-the-art performance in both classification and segmentation CTTA tasks.
arXiv Detail & Related papers (2023-12-19T15:34:52Z)
DaMSTF: Domain Adversarial Learning Enhanced Meta Self-Training for Domain Adaptation [20.697905456202754]
We propose a new self-training framework for domain adaptation, namely Domain adversarial learning enhanced Self-Training Framework (DaMSTF) DaMSTF involves meta-learning to estimate the importance of each pseudo instance, so as to simultaneously reduce the label noise and preserve hard examples. DaMSTF improves the performance of BERT with an average of nearly 4%.
arXiv Detail & Related papers (2023-08-05T00:14:49Z)
Contrastive Mean Teacher for Domain Adaptive Object Detectors [20.06919799819326]
Mean-teacher self-training is a powerful paradigm in unsupervised domain adaptation for object detection, but it struggles with low-quality pseudo-labels. We propose Contrastive Mean Teacher (CMT) -- a unified, general-purpose framework with the two paradigms naturally integrated to maximize beneficial learning signals. CMT leads to new state-of-the-art target-domain performance: 51.9% mAP on Foggy Cityscapes, outperforming the previously best by 2.1% mAP.
arXiv Detail & Related papers (2023-05-04T17:55:17Z)
Boosting Cross-Domain Speech Recognition with Self-Supervision [35.01508881708751]
Cross-domain performance of automatic speech recognition (ASR) could be severely hampered due to mismatch between training and testing distributions. Previous work has shown that self-supervised learning (SSL) or pseudo-labeling (PL) is effective in UDA by exploiting the self-supervisions of unlabeled data. This work presents a systematic UDA framework to fully utilize the unlabeled data with self-supervision in the pre-training and fine-tuning paradigm.
arXiv Detail & Related papers (2022-06-20T14:02:53Z)
Unsupervised Domain Adaptation for Monocular 3D Object Detection via Self-Training [57.25828870799331]
We propose STMono3D, a new self-teaching framework for unsupervised domain adaptation on Mono3D. We develop a teacher-student paradigm to generate adaptive pseudo labels on the target domain. STMono3D achieves remarkable performance on all evaluated datasets and even surpasses fully supervised results on the KITTI 3D object detection dataset.
arXiv Detail & Related papers (2022-04-25T12:23:07Z)
UDA-COPE: Unsupervised Domain Adaptation for Category-level Object Pose Estimation [84.16372642822495]
We propose an unsupervised domain adaptation (UDA) for category-level object pose estimation, called textbfUDA-COPE. Inspired by the recent multi-modal UDA techniques, the proposed method exploits a teacher-student self-supervised learning scheme to train a pose estimation network without using target domain labels.
arXiv Detail & Related papers (2021-11-24T16:00:48Z)
Aligning Pretraining for Detection via Object-Level Contrastive Learning [57.845286545603415]
Image-level contrastive representation learning has proven to be highly effective as a generic model for transfer learning. We argue that this could be sub-optimal and thus advocate a design principle which encourages alignment between the self-supervised pretext task and the downstream task. Our method, called Selective Object COntrastive learning (SoCo), achieves state-of-the-art results for transfer performance on COCO detection.
arXiv Detail & Related papers (2021-06-04T17:59:52Z)
Semi-Supervised Domain Adaptation with Prototypical Alignment and Consistency Learning [86.6929930921905]
This paper studies how much it can help address domain shifts if we further have a few target samples labeled. To explore the full potential of landmarks, we incorporate a prototypical alignment (PA) module which calculates a target prototype for each class from the landmarks. Specifically, we severely perturb the labeled images, making PA non-trivial to achieve and thus promoting model generalizability.
arXiv Detail & Related papers (2021-04-19T08:46:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.