StageInteractor: Query-based Object Detector with Cross-stage
Interaction
- URL: http://arxiv.org/abs/2304.04978v2
- Date: Mon, 15 Jan 2024 13:03:31 GMT
- Authors: Yao Teng, Haisong Liu, Sheng Guo, Limin Wang
- Abstract summary: We propose a new query-based object detector with cross-stage interaction, coined as StageInteractor.
Our model improves the baseline by 2.2 AP, and achieves 44.8 AP with ResNet-50 as backbone.
With longer training time and 300 queries, StageInteractor achieves 51.1 AP and 52.2 AP with ResNeXt-101-DCN and Swin-S, respectively.
- Score: 21.84964476813102
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Previous object detectors make predictions based on dense grid points or
numerous preset anchors. Most of these detectors are trained with one-to-many
label assignment strategies. In contrast, recent query-based object
detectors depend on a sparse set of learnable queries and a series of decoder
layers. One-to-one label assignment is applied independently to each layer
for deep supervision during training. Despite the great success of
query-based object detection, this one-to-one label assignment
strategy requires the detectors to have strong fine-grained discrimination and
modeling capacity. To address this, we propose a new
query-based object detector with cross-stage interaction, named
StageInteractor. In the forward pass, we improve this modeling
ability efficiently by reusing dynamic operators with
lightweight adapters. For label assignment, a cross-stage label assigner
is applied after the one-to-one label assignment. With this assigner,
the training target class labels are gathered across stages and then
reallocated to proper predictions at each decoder layer. On the MS COCO benchmark,
our model improves the baseline by 2.2 AP, and achieves 44.8 AP with ResNet-50
as backbone, 100 queries and 12 training epochs. With longer training time and
300 queries, StageInteractor achieves 51.1 AP and 52.2 AP with ResNeXt-101-DCN
and Swin-S, respectively.
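
The two assignment steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The function names, the cost values, and the IoU-threshold rule used for reallocation are all assumptions made for illustration, and a brute-force search stands in for the Hungarian algorithm that query-based detectors typically use for one-to-one matching.

```python
from itertools import permutations


def one_to_one_assign(cost):
    """Brute-force minimum-cost one-to-one matching (stand-in for Hungarian).

    cost[q][g] is the cost of assigning ground truth g to query q.
    Returns {query_index: gt_index}. Assumes at least as many queries
    as ground truths; otherwise an empty dict is returned.
    """
    num_queries = len(cost)
    num_gts = len(cost[0]) if cost else 0
    best, best_total = None, float("inf")
    # Try every ordered choice of distinct queries, one per ground truth.
    for queries in permutations(range(num_queries), num_gts):
        total = sum(cost[q][g] for g, q in enumerate(queries))
        if total < best_total:
            best, best_total = queries, total
    return {q: g for g, q in enumerate(best)} if best else {}


def cross_stage_labels(per_stage_match, gt_classes, ious, iou_thr=0.5):
    """Gather class labels across stages, then reallocate them per stage.

    per_stage_match: list over stages of {query: gt} one-to-one matches.
    ious[s][q][g]: IoU between query q's box at stage s and ground truth g.
    The IoU threshold is a hypothetical criterion for a "proper" prediction.
    Returns a list over stages of {query: set of class labels}.
    """
    # Gather: every ground truth a query was matched to at any stage.
    gathered = {}
    for match in per_stage_match:
        for q, g in match.items():
            gathered.setdefault(q, set()).add(g)

    # Reallocate: keep the stage's own match, plus gathered ground truths
    # that the query's prediction at this stage overlaps well enough.
    targets = []
    for s, match in enumerate(per_stage_match):
        stage = {}
        for q, gts in gathered.items():
            keep = {g for g in gts if ious[s][q][g] >= iou_thr}
            if q in match:
                keep.add(match[q])
            if keep:
                stage[q] = {gt_classes[g] for g in keep}
        targets.append(stage)
    return targets


# Toy setup: 2 decoder stages, 2 queries, 1 ground-truth object of class "cat".
costs = [[[0.9], [0.2]], [[0.1], [0.3]]]  # costs[s][q][g], made-up values
ious = [[[0.6], [0.7]], [[0.8], [0.4]]]   # ious[s][q][g], made-up values
stage_matches = [one_to_one_assign(c) for c in costs]
targets = cross_stage_labels(stage_matches, ["cat"], ious)
print(stage_matches)
print(targets)
```

In this toy case the one-to-one matcher picks a different query at each stage. After gathering, both queries receive the label at the first stage, while at the second stage the query whose box overlaps poorly keeps no label.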
Related papers
- Joint Neural Networks for One-shot Object Recognition and Detection [5.389851588398047]
This paper presents a novel joint neural networks approach to address the challenging one-shot object recognition and detection tasks.
Inspired by Siamese neural networks and state-of-the-art multi-box detection approaches, the joint neural networks are able to perform object recognition and detection for categories that remain unseen during the training process.
The proposed approach achieves 61.41% accuracy for one-shot object recognition on the MiniImageNet dataset and 47.1% mAP for one-shot object detection when trained on the dataset and tested.
arXiv Detail & Related papers (2024-08-01T16:48:03Z)
- Disentangled Pre-training for Human-Object Interaction Detection [22.653500926559833]
We propose an efficient disentangled pre-training method for HOI detection (DP-HOI).
DP-HOI utilizes object detection and action recognition datasets to pre-train the detection and interaction decoder layers.
It significantly enhances the performance of existing HOI detection models on a broad range of rare categories.
arXiv Detail & Related papers (2024-04-02T08:21:16Z)
- Semi-Supervised and Long-Tailed Object Detection with CascadeMatch [91.86787064083012]
We propose a novel pseudo-labeling-based detector called CascadeMatch.
Our detector features a cascade network architecture, which has multi-stage detection heads with progressive confidence thresholds.
We show that CascadeMatch surpasses existing state-of-the-art semi-supervised approaches in handling long-tailed object detection.
arXiv Detail & Related papers (2023-05-24T07:09:25Z)
- End-to-End Lane detection with One-to-Several Transformer [6.79236957488334]
O2SFormer converges 12.5x faster than DETR for the ResNet18 backbone.
O2SFormer with ResNet50 backbone achieves 77.83% F1 score on CULane dataset, outperforming existing Transformer-based and CNN-based detectors.
arXiv Detail & Related papers (2023-05-01T06:07:11Z)
- Enhanced Training of Query-Based Object Detection via Selective Query Recollection [35.3219210570517]
This paper investigates a phenomenon where query-based object detectors mispredict at the last decoding stage while predicting correctly at an intermediate stage.
We design and present Selective Query Recollection, a simple and effective training strategy for query-based object detectors.
arXiv Detail & Related papers (2022-12-15T02:45:57Z)
- Label-Efficient Object Detection via Region Proposal Network Pre-Training [58.50615557874024]
We propose a simple pretext task that provides an effective pre-training for the region proposal network (RPN).
In comparison with multi-stage detectors without RPN pre-training, our approach is able to consistently improve downstream task performance.
arXiv Detail & Related papers (2022-11-16T16:28:18Z)
- AdaMixer: A Fast-Converging Query-Based Object Detector [32.159871347459166]
We propose a fast-converging query-based object detector named AdaMixer.
AdaMixer has architectural simplicity without requiring explicit pyramid networks.
Our work sheds light on a simple, accurate, and fast-converging architecture for query-based object detectors.
arXiv Detail & Related papers (2022-03-30T17:45:02Z)
- Corner Proposal Network for Anchor-free, Two-stage Object Detection [174.59360147041673]
The goal of object detection is to determine the class and location of objects in an image.
This paper proposes a novel anchor-free, two-stage framework which first extracts a number of object proposals.
We demonstrate that these two stages are effective solutions for improving recall and precision.
arXiv Detail & Related papers (2020-07-27T19:04:57Z)
- AutoAssign: Differentiable Label Assignment for Dense Object Detection [94.24431503373884]
AutoAssign is an anchor-free detector for object detection.
It achieves appearance-aware label assignment through a fully differentiable weighting mechanism.
Our best model achieves 52.1% AP, outperforming all existing one-stage detectors.
arXiv Detail & Related papers (2020-07-07T14:32:21Z)
- UniT: Unified Knowledge Transfer for Any-shot Object Detection and Segmentation [52.487469544343305]
Methods for object detection and segmentation rely on large scale instance-level annotations for training.
We propose an intuitive and unified semi-supervised model that is applicable to a range of supervision.
arXiv Detail & Related papers (2020-06-12T22:45:47Z)
- EHSOD: CAM-Guided End-to-end Hybrid-Supervised Object Detection with Cascade Refinement [53.69674636044927]
We present EHSOD, an end-to-end hybrid-supervised object detection system.
It can be trained in one shot on both fully and weakly-annotated data.
It achieves comparable results on multiple object detection benchmarks with only 30% fully-annotated data.
arXiv Detail & Related papers (2020-02-18T08:04:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.