Real-Time Anchor-Free Single-Stage 3D Detection with IoU-Awareness
- URL: http://arxiv.org/abs/2107.14342v1
- Date: Thu, 29 Jul 2021 21:47:34 GMT
- Title: Real-Time Anchor-Free Single-Stage 3D Detection with IoU-Awareness
- Authors: Runzhou Ge, Zhuangzhuang Ding, Yihan Hu, Wenxin Shao, Li Huang, Kun
Li, Qiang Liu
- Abstract summary: We introduce our winning solution to the Real-time 3D Detection challenge, together with the "Most Efficient Model" award winner, in the Waymo Open Dataset Challenges at CVPR 2021.
- Score: 15.72821609622122
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this report, we introduce our winning solution to the Real-time 3D
Detection and also the "Most Efficient Model" in the Waymo Open Dataset
Challenges at CVPR 2021. Extended from our last year's award-winning model
AFDet, we have made a handful of modifications to the base model to improve
the accuracy and, at the same time, to greatly reduce the latency. The modified
model, named AFDetV2, features a lite 3D Feature Extractor, an improved RPN
with an extended receptive field, and an added sub-head that produces
an IoU-aware confidence score. These model enhancements, together with enriched
data augmentation, stochastic weights averaging, and a GPU-based implementation
of voxelization, lead to a winning accuracy of 73.12 mAPH/L2 for our AFDetV2
with a latency of 60.06 ms, and an accuracy of 72.57 mAPH/L2 for our
AFDetV2-base, named the "Most Efficient Model" by the challenge sponsor,
with a winning latency of 55.86 ms.
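The IoU-aware confidence score mentioned in the abstract is typically produced by geometrically blending the classification score with a predicted IoU. The sketch below assumes the common formulation score^(1-α) · iou^α with a hypothetical blend factor α; it is an illustration of the rescoring idea, not necessarily AFDetV2's exact head or hyper-parameters.

```python
import numpy as np

def iou_aware_rescore(cls_score, pred_iou, alpha=0.68):
    """Blend a classification score with a predicted IoU.

    alpha is a hypothetical blend factor (0 = pure classification,
    1 = pure IoU); detectors usually tune it per class.
    """
    # Predicted IoU is often regressed in [-1, 1]; map it to [0, 1].
    iou = np.clip((pred_iou + 1.0) / 2.0, 0.0, 1.0)
    return cls_score ** (1.0 - alpha) * iou ** alpha
```

A box with a high classification score but a poorly localized (low predicted IoU) geometry gets demoted before NMS, which is what aligns the confidence ranking with localization quality.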
Related papers
- DSFEC: Efficient and Deployable Deep Radar Object Detection [0.0]
This work explores the efficiency of Depthwise Separable Convolutions in radar object detection networks.
We introduce a novel Feature Enhancement and Compression (FEC) module to the PointPillars feature encoder to further improve the model performance.
Our deployable model achieves an impressive 74.5% reduction in runtime on the Raspberry Pi compared to the baseline.
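As background for the efficiency claim above, a depthwise separable convolution factors a standard convolution into a per-channel spatial filter plus a 1x1 pointwise mix. The arithmetic below uses illustrative layer sizes, not DSFEC's actual configuration:

```python
def conv_params(c_in, c_out, k):
    """Parameters in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k (one filter per input channel) + 1x1 pointwise mix."""
    return c_in * k * k + c_in * c_out

# Example: a 64 -> 128 channel layer with 3x3 kernels.
standard = conv_params(64, 128, 3)                   # 73,728
separable = depthwise_separable_params(64, 128, 3)   # 8,768
print(separable / standard)                          # ~0.119, ~8.4x fewer parameters
```

The parameter (and multiply-accumulate) reduction grows with the number of output channels, which is why the factorization pays off on constrained hardware such as a Raspberry Pi.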
arXiv Detail & Related papers (2024-12-10T11:03:51Z)
- IoT-Based 3D Pose Estimation and Motion Optimization for Athletes: Application of C3D and OpenPose [2.3114861820870924]
We propose the IoT-Enhanced Pose Optimization Network (IEPO-Net) for high-precision 3D pose estimation and motion optimization of track and field athletes.
IEPO-Net integrates C3D for feature extraction, enabling real-time keypoint detection and hypertemporal performance tuning.
This study demonstrates superior performance, with AP(p50) scores of 90.5 and 91.0 and mAP scores of 74.3 and 74.0, respectively.
Future work will focus on further model optimization, multimodal data integration, and developing real-time feedback mechanisms to enhance practical applications.
arXiv Detail & Related papers (2024-11-19T17:29:59Z)
- Rethinking Voxelization and Classification for 3D Object Detection [68.8204255655161]
The main challenge in 3D object detection from LiDAR point clouds is achieving real-time performance without affecting the reliability of the network.
We present a solution to improve network inference speed and precision at the same time by implementing a fast dynamic voxelizer.
In addition, we propose a lightweight detection sub-head model for classifying predicted objects and filtering out false detections.
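The fast dynamic voxelizer mentioned above can be understood from the point-to-voxel mapping it computes. This is a minimal NumPy sketch of dynamic voxelization (every point keeps its voxel index, with no per-voxel point cap or padding); the paper's voxelizer runs on GPU, and the ranges here are illustrative:

```python
import numpy as np

def voxelize(points, voxel_size, pc_range):
    """Map each 3D point to an integer voxel index.

    points:     (N, 3+) array, first three columns are x, y, z
    voxel_size: scalar edge length of a cubic voxel
    pc_range:   [x_min, y_min, z_min, x_max, y_max, z_max]
    """
    origin = np.asarray(pc_range[:3], dtype=np.float64)
    coords = np.floor((points[:, :3] - origin) / voxel_size).astype(np.int64)
    # Grid extent in voxels; drop points that fall outside the range.
    grid = np.floor((np.asarray(pc_range[3:]) - origin) / voxel_size).astype(np.int64)
    mask = np.all((coords >= 0) & (coords < grid), axis=1)
    return coords[mask], points[mask]
```

Because every surviving point maps to exactly one voxel index, per-voxel features can then be built with a scatter/segment reduction instead of the fixed-size buffers used by hard voxelization.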
arXiv Detail & Related papers (2023-01-10T16:22:04Z)
- Optimizing Anchor-based Detectors for Autonomous Driving Scenes [22.946814647030667]
This paper summarizes model improvements and inference-time optimizations for the popular anchor-based detectors in autonomous driving scenes.
Based on the high-performing RCNN-RS and RetinaNet-RS detection frameworks, we study a set of framework improvements to adapt the detectors to better detect small objects in crowd scenes.
arXiv Detail & Related papers (2022-08-11T22:44:59Z)
- Rethinking IoU-based Optimization for Single-stage 3D Object Detection [103.83141677242871]
We propose a Rotation-Decoupled IoU (RDIoU) method that can mitigate the rotation-sensitivity issue.
Our RDIoU simplifies the complex interactions of regression parameters by decoupling the rotation variable as an independent term.
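To make the decoupling idea concrete: one hypothetical way to treat rotation as an independent term is to combine an axis-aligned 3D IoU with a separate rotation-similarity factor, so the rotation variable no longer interacts with the center and size terms. This is an illustration of the concept only, not RDIoU's actual formulation:

```python
import math

def rotation_decoupled_score(iou_3d_axis_aligned, theta_pred, theta_gt):
    """Hypothetical decoupled objective: axis-aligned 3D IoU multiplied
    by an independent rotation-similarity term in [0, 1]
    (1 when headings align, 0 when they are opposite)."""
    rot_sim = 0.5 * (1.0 + math.cos(theta_pred - theta_gt))
    return iou_3d_axis_aligned * rot_sim
```

Because the rotation term is its own smooth factor, its gradient does not flip sign with the box-size parameters, which is the kind of rotation-sensitivity the summary refers to.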
arXiv Detail & Related papers (2022-07-19T15:35:23Z)
- YOLOSA: Object detection based on 2D local feature superimposed self-attention [13.307581544820248]
We propose a novel self-attention module, called 2D local feature superimposed self-attention, for the feature concatenation stage of the neck network.
Average precisions of 49.0% (66.2 FPS), 46.1% (80.6 FPS), and 39.1% (100 FPS) were obtained for large, medium, and small-scale models built using our proposed improvements.
arXiv Detail & Related papers (2022-06-23T16:49:21Z)
- Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation [29.430404703883084]
We present a novel Distribution-Aware Single-stage (DAS) model for tackling the challenging multi-person 3D pose estimation problem.
The proposed DAS model simultaneously localizes person positions and their corresponding body joints in the 3D camera space in a one-pass manner.
Comprehensive experiments on benchmarks CMU Panoptic and MuPoTS-3D demonstrate the superior efficiency of the proposed DAS model.
arXiv Detail & Related papers (2022-03-15T07:30:27Z)
- When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
In order to achieve a better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the decoder queries from the inputs, enabling the model to achieve accuracy as good as models with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z)
- Towards Fast, Accurate and Stable 3D Dense Face Alignment [73.01620081047336]
We propose a novel regression framework named 3DDFA-V2 which makes a balance among speed, accuracy and stability.
We present a virtual synthesis method that transforms a still image into a short video incorporating in-plane and out-of-plane face movement.
arXiv Detail & Related papers (2020-09-21T15:37:37Z)
- Improving 3D Object Detection through Progressive Population Based Augmentation [91.56261177665762]
We present the first attempt to automate the design of data augmentation policies for 3D object detection.
We introduce the Progressive Population Based Augmentation (PPBA) algorithm, which learns to optimize augmentation strategies by narrowing down the search space and adopting the best parameters discovered in previous iterations.
We find that PPBA may be up to 10x more data efficient than baseline 3D detection models without augmentation, highlighting that 3D detection models may achieve competitive accuracy with far fewer labeled examples.
arXiv Detail & Related papers (2020-04-02T05:57:02Z)
- Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference [119.19779637025444]
Deep networks were recently suggested to face a trade-off between accuracy (on clean natural images) and robustness (on adversarially perturbed images).
This paper studies multi-exit networks with input-adaptive inference, showing their strong promise in reaching a "sweet point" that co-optimizes model accuracy, robustness, and efficiency.
arXiv Detail & Related papers (2020-02-24T00:40:22Z)
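The input-adaptive inference surveyed in the last entry is usually realized with early exits: easy inputs leave the network at a shallow head, hard inputs run the full depth. The sketch below uses a hypothetical confidence threshold and toy stage/head callables; it shows the control flow only, not the paper's architecture:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit_predict(x, stages, exit_heads, threshold=0.9):
    """Run the backbone stage by stage; return the first exit whose
    top-class confidence clears the threshold. The final exit is the
    fallback when no intermediate exit is confident enough."""
    h = x
    for stage, head in zip(stages, exit_heads):
        h = stage(h)
        probs = softmax(head(h))
        if probs.max() >= threshold:
            return probs.argmax(), probs.max()
    return probs.argmax(), probs.max()  # last computed exit
```

Average compute then scales with input difficulty, which is how these networks buy efficiency without giving up accuracy on hard examples.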
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.