Cross-domain Detection Transformer based on Spatial-aware and
Semantic-aware Token Alignment
- URL: http://arxiv.org/abs/2206.00222v1
- Date: Wed, 1 Jun 2022 04:13:22 GMT
- Title: Cross-domain Detection Transformer based on Spatial-aware and
Semantic-aware Token Alignment
- Authors: Jinhong Deng, Xiaoyue Zhang, Wen Li, Lixin Duan
- Abstract summary: We propose a new method called Spatial-aware and Semantic-aware Token Alignment (SSTA) for cross-domain detection transformers.
For spatial-aware token alignment, we can extract the information from the cross-attention map (CAM) to align the distribution of tokens according to their attention to object queries.
For semantic-aware token alignment, we inject the category information into the cross-attention map and construct domain embedding to guide the learning of a multi-class discriminator.
- Score: 31.759205815348658
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detection transformers like DETR have recently shown promising performance on
many object detection tasks, but these methods still generalize poorly in
cross-domain adaptation scenarios. To address the
cross-domain issue, a straightforward way is to perform token alignment with
adversarial training in transformers. However, its performance is often
unsatisfactory as the tokens in detection transformers are quite diverse and
represent different spatial and semantic information. In this paper, we propose
a new method called Spatial-aware and Semantic-aware Token Alignment (SSTA) for
cross-domain detection transformers. In particular, we take advantage of the
characteristics of cross-attention as used in detection transformer and propose
the spatial-aware token alignment (SpaTA) and the semantic-aware token
alignment (SemTA) strategies to guide the token alignment across domains. For
spatial-aware token alignment, we can extract the information from the
cross-attention map (CAM) to align the distribution of tokens according to
their attention to object queries. For semantic-aware token alignment, we
inject the category information into the cross-attention map and construct
domain embedding to guide the learning of a multi-class discriminator so as to
model the category relationship and achieve category-level token alignment
during the entire adaptation process. We conduct extensive experiments on
several widely-used benchmarks, and the results clearly show the effectiveness
of our proposed method over existing state-of-the-art baselines.
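As a rough illustration of the spatial-aware idea described in the abstract, the sketch below turns a cross-attention map into per-token weights and uses them to weight a per-token domain-classification loss. It is not the authors' implementation. The function names, the choice to average attention over object queries, the normalization, and the toy numbers are all assumptions made for illustration. The adversarial part (for example, a gradient reversal layer) is omitted.

```python
import math


def token_weights(cross_attention):
    """Turn a (num_queries x num_tokens) attention map into per-token weights.

    Assumption (not from the paper): average the attention each token
    receives over all object queries, then normalize the result to sum to 1.
    """
    num_queries = len(cross_attention)
    num_tokens = len(cross_attention[0])
    mean_attn = [
        sum(row[t] for row in cross_attention) / num_queries
        for t in range(num_tokens)
    ]
    total = sum(mean_attn)
    return [a / total for a in mean_attn]


def weighted_domain_loss(domain_probs, domain_label, weights, eps=1e-12):
    """Attention-weighted binary cross-entropy over tokens.

    domain_probs[t] is the probability, from a hypothetical domain
    discriminator, that token t comes from the target domain.
    domain_label is 0 for a source image and 1 for a target image.
    """
    loss = 0.0
    for p, w in zip(domain_probs, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        bce = -(domain_label * math.log(p)
                + (1 - domain_label) * math.log(1.0 - p))
        loss += w * bce
    return loss


# Toy example: 2 object queries attending to 3 tokens (each row sums to 1).
cam = [
    [0.7, 0.2, 0.1],
    [0.5, 0.4, 0.1],
]
weights = token_weights(cam)
loss = weighted_domain_loss([0.9, 0.5, 0.2], domain_label=1, weights=weights)
print(weights, loss)
```

With this weighting, tokens that object queries attend to strongly contribute more to the alignment loss than background tokens, which is one plausible reading of aligning tokens "according to their attention to object queries".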
Related papers
- DATR: Unsupervised Domain Adaptive Detection Transformer with Dataset-Level Adaptation and Prototypical Alignment [7.768332621617199]
We introduce a strong DETR-based detector named Domain Adaptive detection TRansformer (DATR) for unsupervised domain adaptation of object detection.
Our proposed DATR incorporates a mean-teacher based self-training framework, utilizing pseudo-labels generated by the teacher model to further mitigate domain bias.
Experiments demonstrate superior performance and generalization capabilities of our proposed DATR in multiple domain adaptation scenarios.
arXiv Detail & Related papers (2024-05-20T03:48:45Z)
- Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features visible to those of the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z)
- Multi-class Token Transformer for Weakly Supervised Semantic Segmentation [94.78965643354285]
We propose a new transformer-based framework to learn class-specific object localization maps as pseudo labels for weakly supervised semantic segmentation (WSSS).
Inspired by the fact that the attended regions of the one-class token in the standard vision transformer can be leveraged to form a class-agnostic localization map, we investigate if the transformer model can also effectively capture class-specific attention for more discriminative object localization.
The proposed framework is shown to fully complement the Class Activation Mapping (CAM) method, leading to remarkably superior WSSS results on the PASCAL VOC and MS COCO datasets.
arXiv Detail & Related papers (2022-03-06T07:18:23Z)
- CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation [44.06904757181245]
Unsupervised domain adaptation (UDA) aims to transfer knowledge learned from a labeled source domain to a different unlabeled target domain.
One fundamental problem for the category level based UDA is the production of pseudo labels for samples in target domain.
We design a two-way center-aware labeling algorithm to produce pseudo labels for target samples.
Along with the pseudo labels, a weight-sharing triple-branch transformer framework is proposed to apply self-attention and cross-attention for source/target feature learning and source-target domain alignment.
arXiv Detail & Related papers (2021-09-13T17:59:07Z)
- Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers [141.70707071815653]
We propose a novel Sequence Feature Alignment (SFA) method that is specially designed for the adaptation of detection transformers.
SFA consists of a domain query-based feature alignment (DQFA) module and a token-wise feature alignment (TDA) module.
Experiments on three challenging benchmarks show that SFA outperforms state-of-the-art domain adaptive object detection methods.
arXiv Detail & Related papers (2021-07-27T07:17:12Z)
- AFAN: Augmented Feature Alignment Network for Cross-Domain Object Detection [90.18752912204778]
Unsupervised domain adaptation for object detection is a challenging problem with many real-world applications.
We propose a novel augmented feature alignment network (AFAN) which integrates intermediate domain image generation and domain-adversarial training.
Our approach significantly outperforms the state-of-the-art methods on standard benchmarks for both similar and dissimilar domain adaptations.
arXiv Detail & Related papers (2021-06-10T05:01:20Z)
- Your Classifier can Secretly Suffice Multi-Source Domain Adaptation [72.47706604261992]
Multi-Source Domain Adaptation (MSDA) deals with the transfer of task knowledge from multiple labeled source domains to an unlabeled target domain.
We present a different perspective to MSDA wherein deep models are observed to implicitly align the domains under label supervision.
arXiv Detail & Related papers (2021-03-20T12:44:13Z)
- Cross-domain Detection via Graph-induced Prototype Alignment [114.8952035552862]
We propose a Graph-induced Prototype Alignment (GPA) framework to seek category-level domain alignment.
In addition, in order to alleviate the negative effect of class-imbalance on domain adaptation, we design a Class-reweighted Contrastive Loss.
Our approach outperforms existing methods by a remarkable margin.
arXiv Detail & Related papers (2020-03-28T17:46:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.