AFDetV2: Rethinking the Necessity of the Second Stage for Object
Detection from Point Clouds
- URL: http://arxiv.org/abs/2112.09205v1
- Date: Thu, 16 Dec 2021 21:22:17 GMT
- Title: AFDetV2: Rethinking the Necessity of the Second Stage for Object
Detection from Point Clouds
- Authors: Yihan Hu, Zhuangzhuang Ding, Runzhou Ge, Wenxin Shao, Li Huang, Kun
Li, Qiang Liu
- Abstract summary: We develop a single-stage anchor-free network for 3D detection from point clouds.
We use a self-calibrated convolution block in the backbone, a keypoint auxiliary supervision, and an IoU prediction branch in the multi-task head.
We won 1st place in the Real-Time 3D Detection of the Waymo Open Dataset Challenge 2021.
- Score: 15.72821609622122
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There have been two streams in the 3D detection from point clouds:
single-stage methods and two-stage methods. While the former is more
computationally efficient, the latter usually provides better detection
accuracy. By carefully examining the two-stage approaches, we have found that
if appropriately designed, the first stage can produce accurate box regression.
In this scenario, the second stage mainly rescores the boxes such that the
boxes with better localization get selected. From this observation, we have
devised a single-stage anchor-free network that can fulfill these requirements.
This network, named AFDetV2, extends the previous work by incorporating a
self-calibrated convolution block in the backbone, a keypoint auxiliary
supervision, and an IoU prediction branch in the multi-task head. As a result,
the detection accuracy is drastically boosted in the single stage. To evaluate
our approach, we have conducted extensive experiments on the Waymo Open Dataset
and the nuScenes Dataset. We have observed that our AFDetV2 achieves the
state-of-the-art results on these two datasets, superior to all the prior arts,
including both the single-stage and the two-stage 3D detectors. AFDetV2 won
the 1st place in the Real-Time 3D Detection of the Waymo Open Dataset Challenge
2021. In addition, a variant of our model, AFDetV2-Base, was named the "Most
Efficient Model" by the Challenge Sponsor, showing superior computational
efficiency. To demonstrate the generality of this single-stage method, we have
also applied it to the first stage of the two-stage networks. Without
exception, the results show that with the strengthened backbone and the
rescoring approach, the second stage refinement is no longer needed.
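For illustration, the rescoring idea at the core of the abstract can be sketched as a blend of the classification score and the predicted IoU; the exponent `alpha` and the mapping of the raw IoU output to [0, 1] below are assumptions, not the paper's exact formulation.
```python
# Minimal sketch of IoU-aware confidence rescoring in a single-stage detector.
# The blending exponent `alpha` and the rescaling of the raw IoU prediction
# are illustrative assumptions, not the exact formulation from the paper.
import numpy as np

def rescore_boxes(cls_scores: np.ndarray, iou_preds: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend classification scores with predicted IoU so that better-localized
    boxes rank higher, taking over the rescoring role of a second stage."""
    # Map raw IoU predictions from [-1, 1] (a common training target) to [0, 1].
    iou = np.clip((iou_preds + 1.0) * 0.5, 0.0, 1.0)
    # Geometric blend: alpha = 0 keeps the raw classification score,
    # alpha = 1 ranks purely by predicted localization quality.
    return cls_scores ** (1.0 - alpha) * iou ** alpha

# Example: the second box has a lower class score but better predicted IoU,
# so rescoring promotes it for NMS / final ranking.
scores = np.array([0.90, 0.80])
ious   = np.array([0.20, 0.90])   # raw network outputs in [-1, 1]
print(rescore_boxes(scores, ious))
```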
Related papers
- Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets.
We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
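As a rough illustration of the adaptation idea (not LSFA's actual architecture), a backbone pre-trained on large-scale visual data can be kept frozen while only a lightweight adaptor is fine-tuned toward the anomaly-detection task; the MLP adaptor below is a hypothetical stand-in.
```python
# Minimal sketch of task-oriented feature adaptation on top of a frozen,
# pre-trained backbone. The adaptor design (a small MLP) is an illustrative
# assumption, not the architecture used in the paper.
import torch
import torch.nn as nn

class FeatureAdaptor(nn.Module):
    def __init__(self, feat_dim: int = 768, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def adapt_features(frozen_backbone: nn.Module, adaptor: FeatureAdaptor,
                   batch: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():          # the pre-trained backbone stays frozen
        feats = frozen_backbone(batch)
    return adaptor(feats)          # only the adaptor receives gradients
```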
arXiv Detail & Related papers (2024-01-06T07:30:41Z) - Diffusion-based 3D Object Detection with Random Boxes [58.43022365393569]
Existing anchor-based 3D detection methods rely on empirical settings of anchors, which makes the algorithms less elegant.
Our proposed Diff3Det migrates the diffusion model to proposal generation for 3D object detection by considering the detection boxes as generative targets.
In the inference stage, the model progressively refines a set of random boxes to the prediction results.
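The inference loop below is a minimal sketch of refining random boxes step by step; the denoiser interface, the 7-DoF box parameterization, and the update schedule are assumptions rather than Diff3Det's actual sampler.
```python
# Minimal sketch of diffusion-style inference: start from random boxes and
# progressively refine them toward predictions. The denoiser interface and
# the linear update schedule are illustrative assumptions.
import torch

def refine_random_boxes(denoiser, point_features: torch.Tensor,
                        num_boxes: int = 128, num_steps: int = 4) -> torch.Tensor:
    """Iteratively denoise a set of random 7-DoF boxes (x, y, z, l, w, h, yaw)."""
    boxes = torch.randn(num_boxes, 7)                    # boxes drawn from a Gaussian prior
    for step in reversed(range(1, num_steps + 1)):
        t = torch.full((num_boxes,), step / num_steps)   # normalized timestep
        pred = denoiser(boxes, point_features, t)        # predicted clean boxes
        boxes = boxes + (pred - boxes) / step            # move part-way toward the prediction
    return boxes

# Dummy denoiser standing in for the learned network (placeholder only).
dummy = lambda boxes, feats, t: boxes * 0.9
refined = refine_random_boxes(dummy, torch.zeros(1, 64, 64))
print(refined.shape)  # torch.Size([128, 7])
```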
arXiv Detail & Related papers (2023-09-05T08:49:53Z) - DQS3D: Densely-matched Quantization-aware Semi-supervised 3D Detection [6.096961718434965]
We study the problem of semi-supervised 3D object detection, which is of great importance considering the high annotation cost for cluttered 3D indoor scenes.
We resort to the robust and principled framework of self-teaching, which has recently triggered notable progress in semi-supervised learning.
We propose the first semi-supervised 3D detection algorithm that works in a single-stage manner and allows spatially dense training signals.
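A minimal sketch of a self-teaching loop, assuming an EMA teacher that pseudo-labels unlabeled scenes; the momentum value, confidence threshold, and detector interface are illustrative, and DQS3D's densely-matched, quantization-aware components are not modeled here.
```python
# Minimal sketch of teacher-student self-teaching for semi-supervised detection.
import copy
import torch
import torch.nn as nn

def make_teacher(student: nn.Module) -> nn.Module:
    """The teacher starts as a frozen copy of the student."""
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher

@torch.no_grad()
def update_teacher(student: nn.Module, teacher: nn.Module, momentum: float = 0.999) -> None:
    """Exponential moving average of student weights into the teacher."""
    for p_s, p_t in zip(student.parameters(), teacher.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)

@torch.no_grad()
def pseudo_labels(teacher: nn.Module, unlabeled_points: torch.Tensor, score_thresh: float = 0.7):
    """Keep only confident teacher detections as training targets."""
    boxes, scores = teacher(unlabeled_points)   # assumed detector interface: (boxes, scores)
    keep = scores > score_thresh
    return boxes[keep], scores[keep]
```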
arXiv Detail & Related papers (2023-04-25T17:59:54Z) - Occlusion-Robust Object Pose Estimation with Holistic Representation [42.27081423489484]
State-of-the-art (SOTA) object pose estimators take a two-stage approach.
We develop a novel occlude-and-blackout batch augmentation technique.
We also develop a multi-precision supervision architecture to encourage holistic pose representation learning.
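A rough sketch of an occlude-and-blackout style batch augmentation; patch sizes, placement, and the use of external occluder crops are assumptions, not the paper's exact recipe.
```python
# Minimal sketch: paste a random occluder crop over part of each image and
# black out a second region, so the network must rely on holistic evidence.
import torch

def occlude_and_blackout(images: torch.Tensor, occluders: torch.Tensor,
                         patch: int = 32) -> torch.Tensor:
    """images, occluders: (B, C, H, W) and (N, C, H, W); returns augmented images."""
    b, _, h, w = images.shape
    out = images.clone()
    for i in range(b):
        # Paste a random occluder patch at a random location.
        y = torch.randint(0, h - patch, (1,)).item()
        x = torch.randint(0, w - patch, (1,)).item()
        src = occluders[torch.randint(0, occluders.shape[0], (1,)).item()]
        out[i, :, y:y + patch, x:x + patch] = src[:, :patch, :patch]
        # Black out a second random region to simulate missing evidence.
        y = torch.randint(0, h - patch, (1,)).item()
        x = torch.randint(0, w - patch, (1,)).item()
        out[i, :, y:y + patch, x:x + patch] = 0.0
    return out
```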
arXiv Detail & Related papers (2021-10-22T08:00:26Z) - MBDF-Net: Multi-Branch Deep Fusion Network for 3D Object Detection [17.295359521427073]
We propose a Multi-Branch Deep Fusion Network (MBDF-Net) for 3D object detection.
In the first stage, our multi-branch feature extraction network utilizes Adaptive Attention Fusion modules to produce cross-modal fusion features from single-modal semantic features.
In the second stage, we use a region-of-interest (RoI)-pooled fusion module to generate enhanced local features for refinement.
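A minimal sketch of attention-gated fusion of image and point-cloud feature maps, in the spirit of an Adaptive Attention Fusion module; the per-channel sigmoid gate is an assumed design, not the paper's exact module.
```python
# Minimal sketch of cross-modal feature fusion with a learned attention gate.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Predict per-channel fusion weights from the concatenated modalities.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, img_feat: torch.Tensor, pts_feat: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([img_feat, pts_feat], dim=1))
        return w * img_feat + (1.0 - w) * pts_feat   # convex cross-modal blend

fused = AttentionFusion(64)(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
print(fused.shape)  # torch.Size([2, 64, 32, 32])
```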
arXiv Detail & Related papers (2021-08-29T15:40:15Z) - PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector
Representation for 3D Object Detection [100.60209139039472]
We propose the Point-Voxel Region-based Convolutional Neural Networks (PV-RCNNs) for accurate 3D detection from point clouds.
Our proposed PV-RCNNs significantly outperform previous state-of-the-art 3D detection methods on both the Waymo Open Dataset and the highly competitive KITTI benchmark.
arXiv Detail & Related papers (2021-01-31T14:51:49Z) - Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z) - 2nd Place Scheme on Action Recognition Track of ECCV 2020 VIPriors
Challenges: An Efficient Optical Flow Stream Guided Framework [57.847010327319964]
We propose a data-efficient framework that can train the model from scratch on small datasets.
Specifically, by introducing a 3D central difference convolution operation, we propose a novel C3D neural network-based two-stream framework.
We show that our method achieves promising results even without a model pre-trained on large-scale datasets.
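A minimal sketch of a 3D central difference convolution, assuming the common formulation that blends a vanilla convolution with a term built from differences against the kernel center; the blending factor `theta` is illustrative.
```python
# Minimal sketch of a 3D central difference convolution (CDC).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CentralDifferenceConv3d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, theta: float = 0.7):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.theta = theta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        vanilla = self.conv(x)
        # Equivalent form of the central-difference term: a 1x1x1 convolution
        # whose weights are the kernel weights summed over the spatial dims.
        kernel_sum = self.conv.weight.sum(dim=(2, 3, 4), keepdim=True)
        diff = F.conv3d(x, kernel_sum)
        return vanilla - self.theta * diff

out = CentralDifferenceConv3d(3, 8)(torch.randn(1, 3, 8, 16, 16))
print(out.shape)  # torch.Size([1, 8, 8, 16, 16])
```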
arXiv Detail & Related papers (2020-08-10T09:50:28Z) - 3DSSD: Point-based 3D Single Stage Object Detector [61.67928229961813]
We present a point-based 3D single stage object detector, named 3DSSD, achieving a good balance between accuracy and efficiency.
Our method outperforms all state-of-the-art voxel-based single-stage methods by a large margin, and has comparable performance to two-stage point-based methods as well.
arXiv Detail & Related papers (2020-02-24T12:01:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.