DEYOv2: Rank Feature with Greedy Matching for End-to-End Object
Detection
- URL: http://arxiv.org/abs/2306.09165v2
- Date: Mon, 3 Jul 2023 01:52:45 GMT
- Title: DEYOv2: Rank Feature with Greedy Matching for End-to-End Object
Detection
- Authors: Haodong Ouyang
- Abstract summary: This paper presents a novel object detector called DEYOv2, an improved version of the first-generation DEYO model.
It employs a progressive reasoning approach to accelerate model training and enhance performance.
To the best of our knowledge, DEYOv2 is the first fully end-to-end object detector that combines the respective strengths of classical detectors and query-based detectors.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a novel object detector called DEYOv2, an improved
version of the first-generation DEYO (DETR with YOLO) model. Similar to
its predecessor, DEYOv2 employs a progressive reasoning approach to accelerate
model training and enhance performance. The study delves into the limitations
of one-to-one matching in optimization and proposes solutions to effectively
address the issue, such as Rank Feature and Greedy Matching. This approach
enables the third stage of DEYOv2 to maximize information acquisition from the
first and second stages without needing NMS, achieving end-to-end optimization.
By combining dense queries, sparse queries, one-to-many matching, and
one-to-one matching, DEYOv2 leverages the advantages of each method. It
outperforms all existing query-based end-to-end detectors under the same
settings. When using ResNet-50 as the backbone and multi-scale features on the
COCO dataset, DEYOv2 achieves 51.1 AP and 51.8 AP in 12 and 24 epochs,
respectively. Compared to the end-to-end model DINO, DEYOv2 provides
significant performance gains of 2.1 AP and 1.4 AP in the two epoch settings.
To the best of our knowledge, DEYOv2 is the first fully end-to-end object
detector that combines the respective strengths of classical detectors and
query-based detectors.
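
The abstract names Greedy Matching as an alternative to plain one-to-one matching but does not spell out the procedure. As a rough illustration only, the sketch below shows a generic greedy one-to-one assignment between scored predictions and ground-truth boxes. The function names, the IoU criterion, and the 0.5 threshold are all assumptions made for this example; it should not be read as the method used in DEYOv2.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(preds: List[Box], scores: List[float], gts: List[Box],
                 iou_thresh: float = 0.5) -> List[Tuple[int, int]]:
    """Hypothetical helper: visit predictions by descending score and give
    each one the best still-unassigned ground truth whose IoU reaches
    iou_thresh. Returns (pred_idx, gt_idx) pairs."""
    free = set(range(len(gts)))
    pairs = []
    for p in sorted(range(len(preds)), key=lambda i: scores[i], reverse=True):
        best, best_iou = None, iou_thresh
        for g in free:
            v = iou(preds[p], gts[g])
            if v >= best_iou:
                best, best_iou = g, v
        if best is not None:
            pairs.append((p, best))
            free.remove(best)  # each ground truth is assigned at most once
    return pairs


# Toy data (made up for illustration): prediction 1 is a near-duplicate of prediction 0.
preds = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
gts = [(0, 0, 10, 10), (50, 50, 60, 60)]
pairs = greedy_match(preds, scores, gts)
print(pairs)  # [(0, 0), (2, 1)]
```

In this toy case the lower-scored duplicate (prediction 1) stays unmatched because its ground truth was already claimed by prediction 0. Unlike Hungarian matching, a greedy pass of this kind does not minimize a global assignment cost.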
Related papers
- E2E-MFD: Towards End-to-End Synchronous Multimodal Fusion Detection [21.185032466325737]
We introduce E2E-MFD, a novel end-to-end algorithm for multimodal fusion detection.
E2E-MFD streamlines the process, achieving high performance with a single training phase.
Our extensive testing on multiple public datasets reveals E2E-MFD's superior capabilities.
arXiv Detail & Related papers (2024-03-14T12:12:17Z)
- DEYO: DETR with YOLO for Step-by-Step Object Detection [0.0]
This paper proposes a new two-stage object detection model, named DETR with YOLO (DEYO).
The first stage provides high-quality query and anchor feeding into the second stage, improving the performance and efficiency of the second stage compared to the original DETR model.
Experiments demonstrate that DEYO attains 50.6 AP and 52.1 AP in 12 and 36 epochs, respectively.
arXiv Detail & Related papers (2022-11-12T06:36:17Z)
- YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss [1.3381749415517017]
YOLO-pose is a novel heatmap-free approach for joint detection and 2D multi-person pose estimation.
Our framework allows us to train the model end-to-end and optimize the Object Keypoint Similarity (OKS) metric itself.
YOLO-pose achieves new state-of-the-art results on COCO validation (90.2% AP50) and test-dev set (90.3% AP50).
arXiv Detail & Related papers (2022-04-14T08:02:40Z)
- Benchmarking Deep Models for Salient Object Detection [67.07247772280212]
We construct a general SALient Object Detection (SALOD) benchmark to conduct a comprehensive comparison among several representative SOD methods.
In the above experiments, we find that existing loss functions are usually specialized for some metrics but report inferior results on the others.
We propose a novel Edge-Aware (EA) loss that promotes deep networks to learn more discriminative features by integrating both pixel- and image-level supervision signals.
arXiv Detail & Related papers (2022-02-07T03:43:16Z)
- AFDetV2: Rethinking the Necessity of the Second Stage for Object Detection from Point Clouds [15.72821609622122]
We develop a single-stage anchor-free network for 3D detection from point clouds.
We use a self-calibrated convolution block in the backbone, a keypoint auxiliary supervision, and an IoU prediction branch in the multi-task head.
We win the 1st place in the Real-Time 3D Challenge 2021.
arXiv Detail & Related papers (2021-12-16T21:22:17Z)
- Efficient Person Search: An Anchor-Free Approach [86.45858994806471]
Person search aims to simultaneously localize and identify a query person from realistic, uncropped images.
To achieve this goal, state-of-the-art models typically add a re-id branch upon two-stage detectors like Faster R-CNN.
In this work, we present an anchor-free approach to efficiently tackling this challenging task, by introducing the following dedicated designs.
arXiv Detail & Related papers (2021-09-01T07:01:33Z)
- Disentangle Your Dense Object Detector [82.22771433419727]
Deep learning-based dense object detectors have achieved great success in the past few years and have been applied to numerous multimedia applications such as video understanding.
However, the current training pipeline for dense detectors is compromised to lots of conjunctions that may not hold.
We propose Disentangled Dense Object Detector (DDOD), in which simple and effective disentanglement mechanisms are designed and integrated into the current state-of-the-art detectors.
arXiv Detail & Related papers (2021-07-07T00:52:16Z)
- When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
In order to achieve a better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the queries of decoder from the inputs, enabling the model to achieve as good accuracy as the ones with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z)
- Diverse Knowledge Distillation for End-to-End Person Search [81.4926655119318]
Person search aims to localize and identify a specific person from a gallery of images.
Recent methods can be categorized into two groups, i.e., two-step and end-to-end approaches.
We propose a simple yet strong end-to-end network with diverse knowledge distillation to break the bottleneck.
arXiv Detail & Related papers (2020-12-21T09:04:27Z)
- Corner Proposal Network for Anchor-free, Two-stage Object Detection [174.59360147041673]
The goal of object detection is to determine the class and location of objects in an image.
This paper proposes a novel anchor-free, two-stage framework which first extracts a number of object proposals.
We demonstrate that these two stages are effective solutions for improving recall and precision.
arXiv Detail & Related papers (2020-07-27T19:04:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.