MimicDet: Bridging the Gap Between One-Stage and Two-Stage Object
Detection
- URL: http://arxiv.org/abs/2009.11528v1
- Date: Thu, 24 Sep 2020 07:36:58 GMT
- Title: MimicDet: Bridging the Gap Between One-Stage and Two-Stage Object
Detection
- Authors: Xin Lu, Quanquan Li, Buyu Li, Junjie Yan
- Abstract summary: One-stage detectors are more efficient owing to straightforward architectures, but the two-stage detectors still take the lead in accuracy.
We propose MimicDet, a novel framework to train a one-stage detector by directly mimicking the two-stage features.
MimicDet has a shared backbone for one-stage and two-stage detectors, which then branches into two heads designed to have compatible features for mimicking.
- Score: 65.74032877197844
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern object detection methods can be divided into one-stage approaches and
two-stage ones. One-stage detectors are more efficient owing to straightforward
architectures, but the two-stage detectors still take the lead in accuracy.
Although recent work tries to improve the one-stage detectors by imitating the
structural design of the two-stage ones, the accuracy gap is still significant.
In this paper, we propose MimicDet, a novel and efficient framework to train a
one-stage detector by directly mimicking the two-stage features, aiming to bridge
the accuracy gap between one-stage and two-stage detectors. Unlike conventional
mimic methods, MimicDet has a shared backbone for one-stage and two-stage
detectors, then it branches into two heads which are well designed to have
compatible features for mimicking. Thus MimicDet can be trained end-to-end
without pre-training the teacher network. Moreover, the cost does not increase
much, which makes it practical to adopt large networks as backbones. We also
make several specialized designs such as dual-path mimicking and staggered
feature pyramid to facilitate the mimicking process. Experiments on the
challenging COCO detection benchmark demonstrate the effectiveness of MimicDet.
It achieves 46.1 mAP with a ResNeXt-101 backbone on the COCO test-dev set, which
significantly surpasses current state-of-the-art methods.
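
The abstract describes a one-stage head that learns by mimicking features from a two-stage head on a shared backbone. The sketch below illustrates one way such a feature-mimicking term could be computed and combined with the two detection losses. The cosine-based distance, the function names (`cosine_similarity`, `mimic_loss`, `total_loss`), and the `mimic_weight` parameter are assumptions made for illustration; the abstract does not specify the loss, so this should not be read as the authors' implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def mimic_loss(student_feats, teacher_feats):
    """Mean (1 - cosine similarity) over paired feature vectors.

    student_feats: features from the one-stage head (hypothetical name).
    teacher_feats: features from the two-stage head (hypothetical name).
    The choice of a cosine distance is an assumption of this sketch.
    """
    if len(student_feats) != len(teacher_feats):
        raise ValueError("student and teacher must have the same number of features")
    if not student_feats:
        return 0.0
    total = sum(
        1.0 - cosine_similarity(s, t)
        for s, t in zip(student_feats, teacher_feats)
    )
    return total / len(student_feats)


def total_loss(one_stage_det_loss, two_stage_det_loss, mimic, mimic_weight=1.0):
    """Joint objective for end-to-end training: both heads' detection losses
    plus a weighted mimicking term (the weighting scheme is hypothetical)."""
    return one_stage_det_loss + two_stage_det_loss + mimic_weight * mimic
```

Because both heads sit on the same backbone, a single objective of this form can be minimized in one training run, which matches the abstract's point that no separately pre-trained teacher network is needed.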
Related papers
- Efficient Meta-Learning Enabled Lightweight Multiscale Few-Shot Object Detection in Remote Sensing Images [15.12889076965307]
The YOLOv7 one-stage detector is subjected to a novel meta-learning training framework.
This transformation allows the detector to adeptly address FSOD tasks while capitalizing on its inherent advantage of being lightweight.
To validate the effectiveness of our proposed detector, we conducted performance comparisons with current state-of-the-art detectors.
arXiv Detail & Related papers (2024-04-29T04:56:52Z)
- Can the Query-based Object Detector Be Designed with Fewer Stages? [15.726619371300558]
We propose a novel model called GOLO, which follows a two-stage decoding paradigm.
Compared to other mainstream query-based models with multi-stage decoders, our model employs fewer decoder stages while still achieving considerable performance.
arXiv Detail & Related papers (2023-09-28T09:58:52Z)
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
- Towards Discriminative and Transferable One-Stage Few-Shot Object Detectors [3.9189402702217344]
Few-shot object detection (FSOD) aims to address this problem by learning novel classes given only a few samples.
We observe that the large performance gap between two-stage and one-stage FSODs is mainly due to their weak discriminability.
To address these limitations, we propose the Few-shot RetinaNet (FSRN), which includes a multi-way support training strategy to augment the number of foreground samples for dense meta-detectors.
arXiv Detail & Related papers (2022-10-11T20:58:25Z)
- Road detection via a dual-task network based on cross-layer graph fusion modules [2.8197257696982287]
We propose a dual-task network (DTnet) for road detection and a cross-layer graph fusion module (CGM).
CGM improves the cross-layer fusion effect by a complex feature stream graph, and four graph patterns are evaluated.
arXiv Detail & Related papers (2022-08-17T07:16:55Z)
- Mitigating the Mutual Error Amplification for Semi-Supervised Object Detection [92.52505195585925]
We propose a Cross Teaching (CT) method, aiming to mitigate the mutual error amplification by introducing a rectification mechanism of pseudo labels.
In contrast to existing mutual teaching methods that directly treat predictions from other detectors as pseudo labels, we propose the Label Rectification Module (LRM).
arXiv Detail & Related papers (2022-01-26T03:34:57Z)
- Mining the Benefits of Two-stage and One-stage HOI Detection [26.919979955155664]
Two-stage methods have dominated Human-Object Interaction (HOI) detection for several years.
It is challenging for one-stage methods to make an appropriate trade-off in multi-task learning, i.e., object detection and interaction classification.
We propose a novel one-stage framework that disentangles human-object detection and interaction classification in a cascade manner.
arXiv Detail & Related papers (2021-08-11T07:38:09Z)
- Disentangle Your Dense Object Detector [82.22771433419727]
Deep learning-based dense object detectors have achieved great success in the past few years and have been applied to numerous multimedia applications such as video understanding.
However, the current training pipeline for dense detectors is compromised by many conjunctions that may not hold.
We propose Disentangled Dense Object Detector (DDOD), in which simple and effective disentanglement mechanisms are designed and integrated into the current state-of-the-art detectors.
arXiv Detail & Related papers (2021-07-07T00:52:16Z)
- Probabilistic two-stage detection [83.9604523643406]
We show how to build a probabilistic two-stage detector from any state-of-the-art one-stage detector.
The resulting detectors are faster and more accurate than both their one- and two-stage precursors.
arXiv Detail & Related papers (2021-03-12T18:56:17Z)
- MetricUNet: Synergistic Image- and Voxel-Level Learning for Precise CT Prostate Segmentation via Online Sampling [66.01558025094333]
We propose a two-stage framework, with the first stage to quickly localize the prostate region and the second stage to precisely segment the prostate.
We introduce a novel online metric learning module through voxel-wise sampling in the multi-task network.
Our method can effectively learn more representative voxel-level features compared with the conventional learning methods with cross-entropy or Dice loss.
arXiv Detail & Related papers (2020-05-15T10:37:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.