Semi-DETR: Semi-Supervised Object Detection with Detection Transformers
- URL: http://arxiv.org/abs/2307.08095v1
- Date: Sun, 16 Jul 2023 16:32:14 GMT
- Title: Semi-DETR: Semi-Supervised Object Detection with Detection Transformers
- Authors: Jiacheng Zhang, Xiangru Lin, Wei Zhang, Kuo Wang, Xiao Tan, Junyu Han,
Errui Ding, Jingdong Wang, Guanbin Li
- Abstract summary: We analyze the DETR-based framework on semi-supervised object detection (SSOD).
We present Semi-DETR, the first transformer-based end-to-end semi-supervised object detector.
Our method outperforms all state-of-the-art methods by clear margins.
- Score: 105.45018934087076
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We analyze the DETR-based framework on semi-supervised object detection
(SSOD) and observe that (1) the one-to-one assignment strategy generates
incorrect matching when the pseudo ground-truth bounding box is inaccurate,
leading to training inefficiency; (2) DETR-based detectors lack deterministic
correspondence between the input query and its prediction output, which hinders
the applicability of the consistency-based regularization widely used in
current SSOD methods. We present Semi-DETR, the first transformer-based
end-to-end semi-supervised object detector, to tackle these problems.
Specifically, we propose a Stage-wise Hybrid Matching strategy that combines
the one-to-many assignment and one-to-one assignment strategies to improve the
training efficiency of the first stage and thus provide high-quality pseudo
labels for the training of the second stage. Besides, we introduce a Cross-view
Query Consistency method to learn the semantic feature invariance of object
queries from different views while avoiding the need to find deterministic
query correspondence. Furthermore, we propose a Cost-based Pseudo Label Mining
module to dynamically mine more pseudo boxes based on the matching cost of
pseudo ground truth bounding boxes for consistency training. Extensive
experiments on all SSOD settings of both COCO and Pascal VOC benchmark datasets
show that our Semi-DETR method outperforms all state-of-the-art methods by
clear margins. The PaddlePaddle implementation is available at
https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/semi_det/semi_detr.
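The abstract contrasts the one-to-one (Hungarian) assignment used by DETR-style detectors with the one-to-many assignment that Stage-wise Hybrid Matching adds for denser early-stage supervision. The sketch below illustrates the two strategies on a toy matching-cost matrix; it is a minimal illustration assuming SciPy, with all function names (`one_to_one_assign`, `one_to_many_assign`) invented here, not taken from the authors' implementation.

```python
# Illustrative sketch of the two assignment strategies combined in
# Stage-wise Hybrid Matching. Function names are hypothetical, not
# the paper's code.
import numpy as np
from scipy.optimize import linear_sum_assignment

def one_to_one_assign(cost):
    """Hungarian matching: each (pseudo) ground-truth box is matched to
    exactly one query; sensitive to inaccurate pseudo boxes."""
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))  # (gt, query) pairs

def one_to_many_assign(cost, k=2):
    """Top-k matching: each ground-truth box is assigned its k cheapest
    queries, yielding denser supervision during early training."""
    pairs = []
    for gt in range(cost.shape[0]):
        for q in np.argsort(cost[gt])[:k]:
            pairs.append((gt, int(q)))
    return pairs

# Toy matching-cost matrix: 2 pseudo-GT boxes x 5 object queries
# (lower cost = better match; in DETR the cost mixes classification,
# L1 box, and GIoU terms).
cost = np.array([
    [0.1, 0.9, 0.2, 0.8, 0.7],
    [0.6, 0.2, 0.9, 0.3, 0.8],
])

o2o = one_to_one_assign(cost)       # one query per GT box
o2m = one_to_many_assign(cost, k=2) # two queries per GT box
print(o2o)
print(o2m)
```

Under this toy cost matrix, the one-to-one matcher picks a single cheapest global assignment, while the one-to-many matcher supervises twice as many query-box pairs, which is the training-efficiency benefit the abstract attributes to the first stage.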
Related papers
- Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection [12.417754433715903]
We introduce Sparse Semi-DETR, a novel transformer-based, end-to-end semi-supervised object detection solution.
Sparse Semi-DETR incorporates a Query Refinement Module to enhance the quality of object queries, significantly improving detection capabilities for small and partially obscured objects.
On the MS-COCO and Pascal VOC object detection benchmarks, Sparse Semi-DETR achieves a significant improvement over current state-of-the-art methods.
arXiv Detail & Related papers (2024-04-02T10:22:23Z)
- End-to-End Lane detection with One-to-Several Transformer [6.79236957488334]
O2SFormer converges 12.5x faster than DETR for the ResNet18 backbone.
O2SFormer with ResNet50 backbone achieves 77.83% F1 score on CULane dataset, outperforming existing Transformer-based and CNN-based detectors.
arXiv Detail & Related papers (2023-05-01T06:07:11Z)
- Ambiguity-Resistant Semi-Supervised Learning for Dense Object Detection [98.66771688028426]
We propose Ambiguity-Resistant Semi-supervised Learning (ARSL) for one-stage detectors.
Joint-Confidence Estimation (JCE) is proposed to quantify the classification and localization quality of pseudo labels.
ARSL effectively mitigates the ambiguities and achieves state-of-the-art SSOD performance on MS COCO and PASCAL VOC.
arXiv Detail & Related papers (2023-03-27T07:46:58Z)
- W2N: Switching From Weak Supervision to Noisy Supervision for Object Detection [64.10643170523414]
We propose a novel WSOD framework with a new paradigm that switches from weak supervision to noisy supervision (W2N).
In the localization adaptation module, we propose a regularization loss to reduce the proportion of discriminative parts in original pseudo ground-truths.
Our W2N outperforms all existing pure WSOD methods and transfer learning methods.
arXiv Detail & Related papers (2022-07-25T12:13:48Z)
- SIOD: Single Instance Annotated Per Category Per Image for Object Detection [67.64774488115299]
We propose the Single Instance annotated Object Detection (SIOD), requiring only one instance annotation for each existing category in an image.
Degraded from inter-task (WSOD) or inter-image (SSOD) discrepancies to the intra-image discrepancy, SIOD provides more reliable and rich prior knowledge for mining the rest of unlabeled instances.
Under the SIOD setting, we propose a simple yet effective framework, termed Dual-Mining (DMiner), which consists of a Similarity-based Pseudo Label Generating module (SPLG) and a Pixel-level Group Contrastive Learning module (PGCL).
arXiv Detail & Related papers (2022-03-29T08:49:51Z)
- Mitigating the Mutual Error Amplification for Semi-Supervised Object Detection [92.52505195585925]
We propose a Cross Teaching (CT) method, aiming to mitigate the mutual error amplification by introducing a rectification mechanism of pseudo labels.
In contrast to existing mutual teaching methods that directly treat predictions from other detectors as pseudo labels, we propose the Label Rectification Module (LRM).
arXiv Detail & Related papers (2022-01-26T03:34:57Z)
- Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z)
- RelationTrack: Relation-aware Multiple Object Tracking with Decoupled Representation [3.356734463419838]
Existing online multiple object tracking (MOT) algorithms often consist of two subtasks, detection and re-identification (ReID).
In order to enhance the inference speed and reduce the complexity, current methods commonly integrate these two subtasks into a unified framework.
We devise a module named Global Context Disentangling (GCD) that decouples the learned representation into detection-specific and ReID-specific embeddings.
To resolve this restriction, we develop a module, referred to as Guided Transformer Encoder (GTE), by combining the powerful reasoning ability of the Transformer encoder with deformable attention.
arXiv Detail & Related papers (2021-05-10T13:00:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.