DEYO: DETR with YOLO for Step-by-Step Object Detection
- URL: http://arxiv.org/abs/2211.06588v3
- Date: Fri, 16 Jun 2023 03:49:48 GMT
- Title: DEYO: DETR with YOLO for Step-by-Step Object Detection
- Authors: Haodong Ouyang
- Abstract summary: This paper proposes a new two-stage object detection model, named DETR with YOLO (DEYO).
The first stage provides high-quality queries and anchors that are fed into the second stage, improving the performance and efficiency of the second stage compared with the original DETR model.
Experiments demonstrate that DEYO attains 50.6 AP and 52.1 AP in 12 and 36 epochs, respectively.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object detection is an important topic in computer vision, and
post-processing, an essential part of the typical object detection pipeline,
poses a significant bottleneck that limits the performance of traditional
object detection models. The detection transformer (DETR), as the first
end-to-end object detection model, discards hand-crafted components such as
anchors and non-maximum suppression (NMS), significantly simplifying the
detection process. However, compared with most traditional object detection
models, DETR converges very slowly, and the meaning of its queries is obscure.
Thus, inspired by the Step-by-Step concept, this paper proposes a new two-stage
object detection model, named DETR with YOLO (DEYO), which relies on
progressive inference to solve the above problems. DEYO is a two-stage
architecture comprising a classic object detector as the first stage and a
DETR-like model as the second stage. Specifically, the first stage provides
high-quality queries and anchors that are fed into the second stage, improving
the performance and efficiency of the second stage compared with the original
DETR model. Meanwhile, the second stage compensates for the performance
degradation caused by the limitations of the first-stage detector. Extensive
experiments demonstrate that DEYO attains 50.6 AP and 52.1 AP in 12 and 36
epochs, respectively, using a ResNet-50 backbone and multi-scale features on
the COCO dataset. Compared with DINO, a state-of-the-art DETR-like model, DEYO
affords significant performance improvements of 1.6 AP and 1.2 AP under the
two epoch settings.
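To make the query-and-anchor hand-off described above concrete, here is a minimal PyTorch-style sketch of the general idea: the top-scoring boxes from a first-stage (YOLO-style) detector are reused as anchors and embedded into content queries for a DETR-like second stage. All module and parameter names are hypothetical illustrations of the two-stage concept, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class TwoStageQueryInit(nn.Module):
    """Conceptual sketch: convert first-stage (YOLO-style) detections into
    anchors and content queries for a DETR-like second stage."""

    def __init__(self, hidden_dim: int = 256, num_queries: int = 300):
        super().__init__()
        self.num_queries = num_queries
        # Embed each first-stage box (cx, cy, w, h) plus its confidence
        # into a content query vector for the decoder.
        self.query_embed = nn.Linear(5, hidden_dim)

    def forward(self, boxes: torch.Tensor, scores: torch.Tensor):
        """
        boxes:  (B, N, 4) normalized (cx, cy, w, h) from the first stage
        scores: (B, N)    first-stage confidences
        Returns anchors (B, K, 4) and content queries (B, K, hidden_dim),
        where K = num_queries, taken from the top-K most confident boxes.
        """
        k = min(self.num_queries, scores.shape[1])
        topk_scores, topk_idx = scores.topk(k, dim=1)                # (B, K)
        anchors = torch.gather(
            boxes, 1, topk_idx.unsqueeze(-1).expand(-1, -1, 4)       # (B, K, 4)
        )
        queries = self.query_embed(
            torch.cat([anchors, topk_scores.unsqueeze(-1)], dim=-1)  # (B, K, 5)
        )
        return anchors, queries


if __name__ == "__main__":
    # Dummy first-stage output: 8400 candidate boxes per image, batch of 2.
    boxes = torch.rand(2, 8400, 4)
    scores = torch.rand(2, 8400)
    init = TwoStageQueryInit(hidden_dim=256, num_queries=300)
    anchors, queries = init(boxes, scores)
    print(anchors.shape, queries.shape)  # (2, 300, 4) and (2, 300, 256)
```

In the full model, a transformer decoder would then refine these anchors and queries, which is how the second stage can correct detections that the first-stage detector handles poorly.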
Related papers
- YOLO-ELA: Efficient Local Attention Modeling for High-Performance Real-Time Insulator Defect Detection [0.0]
Existing detection methods for insulator defect identification from unmanned aerial vehicles struggle with complex background scenes and small objects.
This paper proposes a new attention-based foundation architecture, YOLO-ELA, to address this issue.
Experimental results on high-resolution UAV images show that our method achieved a state-of-the-art performance of 96.9% mAP0.5 and a real-time detection speed of 74.63 frames per second.
arXiv Detail & Related papers (2024-10-15T16:00:01Z)
- Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
- DEYOv2: Rank Feature with Greedy Matching for End-to-End Object Detection [0.0]
This paper presents a novel object detector called DEYOv2, an improved version of the first-generation DEYO model.
It employs a progressive reasoning approach to accelerate model training and enhance performance.
To the best of our knowledge, DEYOv2 is the first fully end-to-end object detector that combines the respective strengths of classical detectors and query-based detectors.
arXiv Detail & Related papers (2023-06-15T14:42:26Z)
- D2Q-DETR: Decoupling and Dynamic Queries for Oriented Object Detection with Transformers [14.488821968433834]
We propose an end-to-end framework for oriented object detection.
Our framework is based on DETR, with the box regression head replaced with a points prediction head.
Experiments on the largest and challenging DOTA-v1.0 and DOTA-v1.5 datasets show that D2Q-DETR outperforms existing NMS-based and NMS-free oriented object detection methods.
arXiv Detail & Related papers (2023-03-01T14:36:19Z)
- Efficient Decoder-free Object Detection with Transformers [75.00499377197475]
Vision transformers (ViTs) are changing the landscape of object detection approaches.
We propose a decoder-free fully transformer-based (DFFT) object detector.
DFFT_SMALL achieves high efficiency in both training and inference stages.
arXiv Detail & Related papers (2022-06-14T13:22:19Z)
- Anchor Retouching via Model Interaction for Robust Object Detection in Aerial Images [15.404024559652534]
We present an effective Dynamic Enhancement Anchor (DEA) network to construct a novel training sample generator.
Our method achieves state-of-the-art performance in accuracy with moderate inference speed and computational overhead for training.
arXiv Detail & Related papers (2021-12-13T14:37:20Z)
- Recurrent Glimpse-based Decoder for Detection with Transformer [85.64521612986456]
We introduce a novel REcurrent Glimpse-based decOder (REGO) in this paper.
In particular, REGO employs a multi-stage recurrent processing structure to help the attention of DETR gradually focus on foreground objects.
REGO consistently boosts the performance of different DETR detectors by up to 7% relative gain at the same setting of 50 training epochs.
arXiv Detail & Related papers (2021-12-09T00:29:19Z)
- When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
In order to achieve a better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the queries of decoder from the inputs, enabling the model to achieve as good accuracy as the ones with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z)
- Condensing Two-stage Detection with Automatic Object Key Part Discovery [87.1034745775229]
Two-stage object detectors generally require excessively large models for their detection heads to achieve high accuracy.
We propose that the model parameters of two-stage detection heads can be condensed and reduced by concentrating on object key parts.
Our proposed technique consistently maintains original performance while waiving around 50% of the model parameters of common two-stage detection heads.
arXiv Detail & Related papers (2020-06-10T01:20:47Z)
- One-Shot Object Detection without Fine-Tuning [62.39210447209698]
We introduce a two-stage model consisting of a first stage Matching-FCOS network and a second stage Structure-Aware Relation Module.
We also propose novel training strategies that effectively improve detection performance.
Our method exceeds the state-of-the-art one-shot performance consistently on multiple datasets.
arXiv Detail & Related papers (2020-05-08T01:59:23Z)