DETR with Additional Global Aggregation for Cross-domain Weakly
Supervised Object Detection
- URL: http://arxiv.org/abs/2304.07082v1
- Date: Fri, 14 Apr 2023 12:16:42 GMT
- Title: DETR with Additional Global Aggregation for Cross-domain Weakly
Supervised Object Detection
- Authors: Zongheng Tang, Yifan Sun, Si Liu, Yi Yang
- Abstract summary: This paper presents a DETR-based method for cross-domain weakly supervised object detection (CDWSOD).
We think DETR has strong potential for CDWSOD due to an insight: the encoder and the decoder in DETR are both based on the attention mechanism.
The aggregation results, i.e., image-level predictions, can naturally exploit the weak supervision for domain alignment.
- Score: 34.14603473160207
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a DETR-based method for cross-domain weakly supervised
object detection (CDWSOD), aiming at adapting the detector from source to
target domain through weak supervision. We think DETR has strong potential for
CDWSOD due to an insight: the encoder and the decoder in DETR are both based on
the attention mechanism and are thus capable of aggregating semantics across
the entire image. The aggregation results, i.e., image-level predictions, can
naturally exploit the weak supervision for domain alignment. Thus motivated, we
propose DETR with additional Global Aggregation (DETR-GA), a CDWSOD detector
that simultaneously makes "instance-level + image-level" predictions and
utilizes "strong + weak" supervisions. The key point of DETR-GA is very simple:
for the encoder / decoder, we respectively add multiple class queries / a
foreground query to aggregate the semantics into image-level predictions. Our
query-based aggregation has two advantages. First, in the encoder, the
weakly-supervised class queries are capable of roughly locating the
corresponding positions and excluding the distraction from non-relevant
regions. Second, through our design, the object queries and the foreground
query in the decoder share consensus on the class semantics, therefore making
the strong and weak supervision mutually benefit each other for domain
alignment. Extensive experiments on four popular cross-domain benchmarks show
that DETR-GA significantly improves CDWSOD and advances the state of the art
(e.g., 29.0% --> 79.4% mAP on PASCAL VOC --> Clipart_all dataset).
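The query-based aggregation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single-head dot-product attention as a stand-in for the DETR encoder, one learnable query per class, and a multi-label binary cross-entropy on image-level labels as the weak supervision. All function names, the toy tokens, and the toy weights are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def aggregate(query, tokens):
    """Pool all image tokens into one vector with single-head attention.

    Assumption: plain scaled dot-product attention, standing in for the
    attention layers that a class query would pass through in the encoder.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, t) / scale for t in tokens])
    dim = len(tokens[0])
    pooled = [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]
    return pooled, weights

def image_level_probs(class_queries, class_weights, tokens):
    """One presence probability per class, each from that class's own query."""
    probs = []
    for query, weight in zip(class_queries, class_weights):
        pooled, _ = aggregate(query, tokens)
        probs.append(sigmoid(dot(weight, pooled)))
    return probs

def weak_bce(probs, labels, eps=1e-7):
    """Multi-label binary cross-entropy against image-level (weak) labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)

# Toy image with three tokens: one resembles class 0, one class 1, one background.
tokens = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
class_queries = [[2.0, 0.0], [0.0, 2.0]]   # hypothetical learned queries
class_weights = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical per-class classifiers

probs = image_level_probs(class_queries, class_weights, tokens)
_, attn = aggregate(class_queries[0], tokens)
loss = weak_bce(probs, [1, 0])  # weak label: class 0 present, class 1 absent
```

In this toy setup the attention weights of the class-0 query concentrate on the first token, which mirrors the abstract's claim that weakly supervised class queries roughly locate the relevant positions and down-weight unrelated regions. Only image-level labels enter the loss, so the same objective can be applied to target-domain images that have no box annotations.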
Related papers
- DATR: Unsupervised Domain Adaptive Detection Transformer with Dataset-Level Adaptation and Prototypical Alignment [7.768332621617199]
We introduce a strong DETR-based detector named Domain Adaptive detection TRansformer (DATR) for unsupervised domain adaptation of object detection.
Our proposed DATR incorporates a mean-teacher based self-training framework, utilizing pseudo-labels generated by the teacher model to further mitigate domain bias.
Experiments demonstrate superior performance and generalization capabilities of our proposed DATR in multiple domain adaptation scenarios.
arXiv Detail & Related papers (2024-05-20T03:48:45Z)
- MS-DETR: Efficient DETR Training with Mixed Supervision [74.93329653526952]
MS-DETR places one-to-many supervision to the object queries of the primary decoder that is used for inference.
Our approach does not need additional decoder branches or object queries.
Experimental results show that our approach outperforms related DETR variants.
arXiv Detail & Related papers (2024-01-08T16:08:53Z)
- Rank-DETR for High Quality Object Detection [52.82810762221516]
A highly performant object detector requires accurate ranking for the bounding box predictions.
In this work, we introduce a simple and highly performant DETR-based object detector by proposing a series of rank-oriented designs.
arXiv Detail & Related papers (2023-10-13T04:48:32Z)
- Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection [20.809479387186506]
Human-Object Interaction (HOI) detection is a core task for high-level image understanding.
In this paper, we propose to enhance Detection Transformer (DETR)-based HOI detectors by mining hard-positive queries.
Experimental results show that our proposed approach can be widely applied to existing DETR-based HOI detectors.
arXiv Detail & Related papers (2022-07-12T04:03:12Z)
- Cross Domain Object Detection by Target-Perceived Dual Branch Distillation [49.68119030818388]
Cross domain object detection is a realistic and challenging task in the wild.
We propose a novel Target-perceived Dual-branch Distillation (TDD) framework.
Our TDD significantly outperforms the state-of-the-art methods on all the benchmarks.
arXiv Detail & Related papers (2022-05-03T03:51:32Z)
- Domain Generalisation for Object Detection under Covariate and Concept Shift [10.32461766065764]
Domain generalisation aims to promote the learning of domain-invariant features while suppressing domain-specific features.
An approach to domain generalisation for object detection is proposed, the first such approach applicable to any object detection architecture.
arXiv Detail & Related papers (2022-03-10T11:14:18Z)
- End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem.
Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components.
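The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture.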
arXiv Detail & Related papers (2020-05-26T17:06:38Z)
- Cross-domain Detection via Graph-induced Prototype Alignment [114.8952035552862]
We propose a Graph-induced Prototype Alignment (GPA) framework to seek category-level domain alignment.
In addition, in order to alleviate the negative effect of class-imbalance on domain adaptation, we design a Class-reweighted Contrastive Loss.
Our approach outperforms existing methods by a remarkable margin.
arXiv Detail & Related papers (2020-03-28T17:46:55Z)
- iFAN: Image-Instance Full Alignment Networks for Adaptive Object Detection [48.83883375118966]
iFAN aims to precisely align feature distributions on both image and instance levels.
It outperforms state-of-the-art methods with a boost of 10%+ AP over the source-only baseline.
arXiv Detail & Related papers (2020-03-09T13:27:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.