Mean Teacher DETR with Masked Feature Alignment: A Robust Domain
Adaptive Detection Transformer Framework
- URL: http://arxiv.org/abs/2310.15646v5
- Date: Thu, 18 Jan 2024 07:44:08 GMT
- Title: Mean Teacher DETR with Masked Feature Alignment: A Robust Domain
Adaptive Detection Transformer Framework
- Authors: Weixi Weng, Chun Yuan
- Abstract summary: Two-stage feature alignment method based on mean teacher comprises a pretraining stage followed by a self-training stage.
In the pretraining stage, we utilize labeled target-like images produced by image style transfer to avoid performance fluctuation.
In the self-training stage, we leverage unlabeled target images by pseudo labels based on mean teacher.
- Score: 41.998727427261734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised domain adaptation object detection (UDAOD) research on Detection
Transformer(DETR) mainly focuses on feature alignment and existing methods can
be divided into two kinds, each of which has its unresolved issues. One-stage
feature alignment methods can easily lead to performance fluctuation and
training stagnation. Two-stage feature alignment method based on mean teacher
comprises a pretraining stage followed by a self-training stage, each facing
problems in obtaining reliable pretrained model and achieving consistent
performance gains. Methods mentioned above have not yet explore how to utilize
the third related domain such as target-like domain to assist adaptation. To
address these issues, we propose a two-stage framework named MTM, i.e. Mean
Teacher-DETR with Masked Feature Alignment. In the pretraining stage, we
utilize labeled target-like images produced by image style transfer to avoid
performance fluctuation. In the self-training stage, we leverage unlabeled
target images by pseudo labels based on mean teacher and propose a module
called Object Queries Knowledge Transfer (OQKT) to ensure consistent
performance gains of the student model. Most importantly, we propose masked
feature alignment methods including Masked Domain Query-based Feature Alignment
(MDQFA) and Masked Token-wise Feature Alignment (MTWFA) to alleviate domain
shift in a more robust way, which not only prevent training stagnation and lead
to a robust pretrained model in the pretraining stage, but also enhance the
model's target performance in the self-training stage. Experiments on three
challenging scenarios and a theoretical analysis verify the effectiveness of
MTM.
Related papers
- Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation [49.827306773992376]
Continual Test-Time Adaptation (CTTA) is proposed to migrate a source pre-trained model to continually changing target distributions.
Our proposed method attains state-of-the-art performance in both classification and segmentation CTTA tasks.
arXiv Detail & Related papers (2023-12-19T15:34:52Z) - DaMSTF: Domain Adversarial Learning Enhanced Meta Self-Training for
Domain Adaptation [20.697905456202754]
We propose a new self-training framework for domain adaptation, namely Domain adversarial learning enhanced Self-Training Framework (DaMSTF)
DaMSTF involves meta-learning to estimate the importance of each pseudo instance, so as to simultaneously reduce the label noise and preserve hard examples.
DaMSTF improves the performance of BERT with an average of nearly 4%.
arXiv Detail & Related papers (2023-08-05T00:14:49Z) - Contrastive Mean Teacher for Domain Adaptive Object Detectors [20.06919799819326]
Mean-teacher self-training is a powerful paradigm in unsupervised domain adaptation for object detection, but it struggles with low-quality pseudo-labels.
We propose Contrastive Mean Teacher (CMT) -- a unified, general-purpose framework with the two paradigms naturally integrated to maximize beneficial learning signals.
CMT leads to new state-of-the-art target-domain performance: 51.9% mAP on Foggy Cityscapes, outperforming the previously best by 2.1% mAP.
arXiv Detail & Related papers (2023-05-04T17:55:17Z) - Boosting Cross-Domain Speech Recognition with Self-Supervision [35.01508881708751]
Cross-domain performance of automatic speech recognition (ASR) could be severely hampered due to mismatch between training and testing distributions.
Previous work has shown that self-supervised learning (SSL) or pseudo-labeling (PL) is effective in UDA by exploiting the self-supervisions of unlabeled data.
This work presents a systematic UDA framework to fully utilize the unlabeled data with self-supervision in the pre-training and fine-tuning paradigm.
arXiv Detail & Related papers (2022-06-20T14:02:53Z) - Unsupervised Domain Adaptation for Monocular 3D Object Detection via
Self-Training [57.25828870799331]
We propose STMono3D, a new self-teaching framework for unsupervised domain adaptation on Mono3D.
We develop a teacher-student paradigm to generate adaptive pseudo labels on the target domain.
STMono3D achieves remarkable performance on all evaluated datasets and even surpasses fully supervised results on the KITTI 3D object detection dataset.
arXiv Detail & Related papers (2022-04-25T12:23:07Z) - UDA-COPE: Unsupervised Domain Adaptation for Category-level Object Pose
Estimation [84.16372642822495]
We propose an unsupervised domain adaptation (UDA) for category-level object pose estimation, called textbfUDA-COPE.
Inspired by the recent multi-modal UDA techniques, the proposed method exploits a teacher-student self-supervised learning scheme to train a pose estimation network without using target domain labels.
arXiv Detail & Related papers (2021-11-24T16:00:48Z) - Aligning Pretraining for Detection via Object-Level Contrastive Learning [57.845286545603415]
Image-level contrastive representation learning has proven to be highly effective as a generic model for transfer learning.
We argue that this could be sub-optimal and thus advocate a design principle which encourages alignment between the self-supervised pretext task and the downstream task.
Our method, called Selective Object COntrastive learning (SoCo), achieves state-of-the-art results for transfer performance on COCO detection.
arXiv Detail & Related papers (2021-06-04T17:59:52Z) - Semi-Supervised Domain Adaptation with Prototypical Alignment and
Consistency Learning [86.6929930921905]
This paper studies how much it can help address domain shifts if we further have a few target samples labeled.
To explore the full potential of landmarks, we incorporate a prototypical alignment (PA) module which calculates a target prototype for each class from the landmarks.
Specifically, we severely perturb the labeled images, making PA non-trivial to achieve and thus promoting model generalizability.
arXiv Detail & Related papers (2021-04-19T08:46:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.