RTMDet: An Empirical Study of Designing Real-Time Object Detectors
- URL: http://arxiv.org/abs/2212.07784v1
- Date: Wed, 14 Dec 2022 18:50:20 GMT
- Title: RTMDet: An Empirical Study of Designing Real-Time Object Detectors
- Authors: Chengqi Lyu, Wenwei Zhang, Haian Huang, Yue Zhou, Yudong Wang, Yanyi Liu, Shilong Zhang, Kai Chen
- Abstract summary: We develop an efficient real-time object detector that exceeds the YOLO series and is easily extensible for many object recognition tasks.
Together with better training techniques, the resulting object detector, named RTMDet, achieves 52.8% AP on COCO with 300+ FPS on an NVIDIA 3090 GPU.
We hope the experimental results can provide new insights into designing versatile real-time object detectors for many object recognition tasks.
- Score: 13.09100888887757
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we aim to design an efficient real-time object detector that
exceeds the YOLO series and is easily extensible for many object recognition
tasks such as instance segmentation and rotated object detection. To obtain a
more efficient model architecture, we explore an architecture that has
compatible capacities in the backbone and neck, constructed by a basic building
block that consists of large-kernel depth-wise convolutions. We further
introduce soft labels when calculating matching costs in the dynamic label
assignment to improve accuracy. Together with better training techniques, the
resulting object detector, named RTMDet, achieves 52.8% AP on COCO with 300+
FPS on an NVIDIA 3090 GPU, outperforming the current mainstream industrial
detectors. RTMDet achieves the best parameter-accuracy trade-off with
tiny/small/medium/large/extra-large model sizes for various application
scenarios, and obtains new state-of-the-art performance on real-time instance
segmentation and rotated object detection. We hope the experimental results can
provide new insights into designing versatile real-time object detectors for
many object recognition tasks. Code and models are released at
https://github.com/open-mmlab/mmdetection/tree/3.x/configs/rtmdet.
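The abstract names large-kernel depth-wise convolutions as the basic building block but gives no further detail here. As a minimal sketch in plain Python (not the mmdetection implementation), the code below shows what "depth-wise" means, namely that each channel is filtered only by its own kernel, and why a large kernel stays cheap: parameters grow with C·k² rather than C_in·C_out·k². The function names, the 256-channel width, and the 5x5 kernel size are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def depthwise_conv2d(x: List[Matrix], kernels: List[Matrix]) -> List[Matrix]:
    """Stride-1, zero-padded depth-wise conv: channel c is filtered only by kernels[c].

    Like deep-learning frameworks, this computes cross-correlation (no kernel flip).
    Hypothetical helper for illustration only.
    """
    if len(x) != len(kernels):
        raise ValueError("depth-wise conv needs exactly one kernel per channel")
    out: List[Matrix] = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)            # assumes a square, odd-sized kernel
        pad = k // 2               # keeps the output the same size as the input
        h, w = len(channel), len(channel[0])
        result = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < h and 0 <= jj < w:  # zero padding outside
                            acc += channel[ii][jj] * kernel[di][dj]
                result[i][j] = acc
        out.append(result)
    return out


def dense_conv_params(c_in: int, c_out: int, k: int) -> int:
    # Ordinary k x k convolution (bias ignored): every output channel sees every input channel.
    return c_in * c_out * k * k


def depthwise_params(c: int, k: int) -> int:
    # Depth-wise k x k convolution (bias ignored): one k x k filter per channel.
    return c * k * k


if __name__ == "__main__":
    # Illustrative sizes (assumptions): 256 channels, 5 x 5 kernel.
    print(dense_conv_params(256, 256, 5))  # 1638400
    print(depthwise_params(256, 5))        # 6400
```

Because a depth-wise layer never mixes channels, such blocks are usually paired with a 1x1 point-wise convolution; even with that extra 256·256 = 65,536 parameters, the pair stays far below the dense 5x5 layer.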
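The abstract says only that soft labels are introduced when calculating matching costs in the dynamic label assignment. The sketch below illustrates one way this can work, under loudly stated assumptions: each candidate's classification target is its IoU with the ground-truth box instead of a hard 1, the binary cross-entropy against that soft target is re-weighted by the squared score gap, a -log(IoU) regression term is added, the weights (1.0 and 3.0) are illustrative, and the number of positives k is chosen SimOTA-style from the sum of the top IoUs. The function names are hypothetical and are not the mmdetection API.

```python
import math
from typing import List, Sequence


def soft_cls_cost(score: float, iou: float, eps: float = 1e-7) -> float:
    # Soft label (assumption): the target is the candidate's IoU, not a hard 1.0.
    # BCE against that soft target is scaled by the squared gap (iou - score)^2,
    # so a score that already matches the IoU costs (almost) nothing.
    p = min(max(score, eps), 1.0 - eps)
    bce = -(iou * math.log(p) + (1.0 - iou) * math.log(1.0 - p))
    return bce * (iou - p) ** 2


def reg_cost(iou: float, eps: float = 1e-7) -> float:
    # Regression cost (assumption): low IoU with the ground truth -> high cost.
    return -math.log(max(iou, eps))


def assign(scores: Sequence[float], ious: Sequence[float],
           cls_weight: float = 1.0, reg_weight: float = 3.0,
           top_q: int = 3) -> List[int]:
    """Return indices of candidates assigned as positives for one ground-truth box."""
    costs = [cls_weight * soft_cls_cost(s, u) + reg_weight * reg_cost(u)
             for s, u in zip(scores, ious)]
    # Dynamic k (SimOTA-style assumption): sum of the top-q IoUs, at least 1.
    k = max(1, int(sum(sorted(ious, reverse=True)[:top_q])))
    # The k lowest-cost candidates become positives (stable sort breaks ties by index).
    return sorted(range(len(costs)), key=lambda i: costs[i])[:k]
```

For example, `assign([0.9, 0.2, 0.6, 0.1], [0.8, 0.7, 0.3, 0.05])` uses k = int(0.8 + 0.7 + 0.3) = 1 and returns `[0]`: candidate 0 has the best IoU and a score close to it, so both of its cost terms are the smallest.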
Related papers
- What is YOLOv9: An In-Depth Exploration of the Internal Features of the Next-Generation Object Detector [0.0]
This study examines the YOLOv9 object detection model, focusing on its architectural innovations, training methodologies, and performance improvements.
Key advancements, such as the Generalized Efficient Layer Aggregation Network (GELAN) and Programmable Gradient Information (PGI), significantly enhance feature extraction and gradient flow.
This paper provides the first in-depth exploration of YOLOv9's internal features and their real-world applicability, establishing it as a state-of-the-art solution for real-time object detection.
arXiv Detail & Related papers (2024-09-12T07:46:58Z)
- Cross-Cluster Shifting for Efficient and Effective 3D Object Detection in Autonomous Driving [69.20604395205248]
We present a new 3D point-based detector model, named Shift-SSD, for precise 3D object detection in autonomous driving.
We introduce an intriguing Cross-Cluster Shifting operation to unleash the representation capacity of the point-based detector.
We conduct extensive experiments on the KITTI, Waymo, and nuScenes datasets, and the results demonstrate the state-of-the-art performance of Shift-SSD in both detection accuracy and runtime efficiency.
arXiv Detail & Related papers (2024-03-10T10:36:32Z)
- The Impact of Different Backbone Architecture on Autonomous Vehicle Dataset [120.08736654413637]
The quality of the features extracted by the backbone architecture can have a significant impact on the overall detection performance.
Our study evaluates three well-known autonomous vehicle datasets, namely KITTI, NuScenes, and BDD, to compare the performance of different backbone architectures on object detection tasks.
arXiv Detail & Related papers (2023-09-15T17:32:15Z)
- PillarNeXt: Rethinking Network Designs for 3D Object Detection in LiDAR Point Clouds [29.15589024703907]
In this paper, we revisit the local point aggregators from the perspective of allocating computational resources.
We find that the simplest pillar based models perform surprisingly well considering both accuracy and latency.
Our results challenge the common intuition that the detailed geometry modeling is essential to achieve high performance for 3D object detection.
arXiv Detail & Related papers (2023-05-08T17:59:14Z)
- 3D Small Object Detection with Dynamic Spatial Pruning [62.72638845817799]
We propose an efficient feature pruning strategy for 3D small object detection.
We present a multi-level 3D detector named DSPDet3D which benefits from high spatial resolution.
It takes less than 2 s to directly process a whole building consisting of more than 4500k points while detecting almost all objects.
arXiv Detail & Related papers (2023-05-05T17:57:04Z)
- Adaptive Rotated Convolution for Rotated Object Detection [96.94590550217718]
We present Adaptive Rotated Convolution (ARC) module to handle rotated object detection problem.
In our ARC module, the convolution kernels rotate adaptively to extract object features with varying orientations in different images.
The proposed approach achieves state-of-the-art performance on the DOTA dataset with 81.77% mAP.
arXiv Detail & Related papers (2023-03-14T11:53:12Z)
- Analysis of voxel-based 3D object detection methods efficiency for real-time embedded systems [93.73198973454944]
Two popular voxel-based 3D object detection methods are studied in this paper.
Our experiments show that these methods mostly fail to detect distant small objects due to the sparsity of the input point clouds at large distances.
Our findings suggest that a considerable part of the computations of existing methods is focused on locations of the scene that do not contribute to successful detection.
arXiv Detail & Related papers (2021-05-21T12:40:59Z)
- Robust Object Detection via Instance-Level Temporal Cycle Confusion [89.1027433760578]
We study the effectiveness of auxiliary self-supervised tasks to improve the out-of-distribution generalization of object detectors.
Inspired by the principle of maximum entropy, we introduce a novel self-supervised task, instance-level temporal cycle confusion (CycConf).
For each object, the task is to find the most different object proposals in the adjacent frame in a video and then cycle back to itself for self-supervision.
arXiv Detail & Related papers (2021-04-16T21:35:08Z)