MonoPCNS: Monocular 3D Object Detection via Point Cloud Network
Simulation
- URL: http://arxiv.org/abs/2208.09446v1
- Date: Fri, 19 Aug 2022 16:57:11 GMT
- Title: MonoPCNS: Monocular 3D Object Detection via Point Cloud Network
Simulation
- Authors: Han Sun, Zhaoxin Fan, Zhenbo Song, Zhicheng Wang, Kejian Wu, Jianfeng
Lu
- Abstract summary: Existing leading methods tend to estimate the depth of the input image first, and detect the 3D object based on point cloud.
MonoPCNS is proposed to simulate the feature learning behavior of a point cloud based detector for monocular detector during the training period.
Our method consistently improves the performance of different monocular detectors for a large margin without changing their network architectures.
- Score: 16.237400933896886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular 3D object detection is a fundamental but very important task to
many applications including autonomous driving, robotic grasping and augmented
reality. Existing leading methods tend to estimate the depth of the input image
first, and detect the 3D object based on point cloud. This routine suffers from
the inherent gap between depth estimation and object detection. Besides, the
prediction error accumulation would also affect the performance. In this paper,
a novel method named MonoPCNS is proposed. The insight behind introducing
MonoPCNS is that we propose to simulate the feature learning behavior of a
point cloud based detector for monocular detector during the training period.
Hence, during inference period, the learned features and prediction would be
similar to the point cloud based detector as possible. To achieve it, we
propose one scene-level simulation module, one RoI-level simulation module and
one response-level simulation module, which are progressively used for the
detector's full feature learning and prediction pipeline. We apply our method
to the famous M3D-RPN detector and CaDDN detector, conducting extensive
experiments on KITTI and Waymo Open dataset. Results show that our method
consistently improves the performance of different monocular detectors for a
large margin without changing their network architectures. Our method finally
achieves state-of-the-art performance.
Related papers
- VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection [80.62052650370416]
monocular 3D object detection holds significant importance across various applications, including autonomous driving and robotics.
In this paper, we present VFMM3D, an innovative framework that leverages the capabilities of Vision Foundation Models (VFMs) to accurately transform single-view images into LiDAR point cloud representations.
arXiv Detail & Related papers (2024-04-15T03:12:12Z) - Towards Unified 3D Object Detection via Algorithm and Data Unification [70.27631528933482]
We build the first unified multi-modal 3D object detection benchmark MM- Omni3D and extend the aforementioned monocular detector to its multi-modal version.
We name the designed monocular and multi-modal detectors as UniMODE and MM-UniMODE, respectively.
arXiv Detail & Related papers (2024-02-28T18:59:31Z) - Weakly Supervised Monocular 3D Object Detection using Multi-View
Projection and Direction Consistency [78.76508318592552]
Monocular 3D object detection has become a mainstream approach in automatic driving for its easy application.
Most current methods still rely on 3D point cloud data for labeling the ground truths used in the training phase.
We propose a new weakly supervised monocular 3D objection detection method, which can train the model with only 2D labels marked on images.
arXiv Detail & Related papers (2023-03-15T15:14:00Z) - ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
arXiv Detail & Related papers (2022-12-12T13:10:19Z) - Boosting 3D Object Detection by Simulating Multimodality on Point Clouds [51.87740119160152]
This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate features and responses that follow a multi-modality (LiDAR-image) detector.
The approach needs LiDAR-image data only when training the single-modality detector, and once well-trained, it only needs LiDAR data at inference.
Experimental results on the nuScenes dataset show that our approach outperforms all SOTA LiDAR-only 3D detectors.
arXiv Detail & Related papers (2022-06-30T01:44:30Z) - MDS-Net: A Multi-scale Depth Stratification Based Monocular 3D Object
Detection Algorithm [4.958840734249869]
This paper proposes a one-stage monocular 3D object detection algorithm based on multi-scale depth stratification.
Experiments on the KITTI benchmark show that the MDS-Net outperforms the existing monocular 3D detection methods in 3D detection and BEV detection tasks.
arXiv Detail & Related papers (2022-01-12T07:11:18Z) - The Devil is in the Task: Exploiting Reciprocal Appearance-Localization
Features for Monocular 3D Object Detection [62.1185839286255]
Low-cost monocular 3D object detection plays a fundamental role in autonomous driving.
We introduce a Dynamic Feature Reflecting Network, named DFR-Net.
We rank 1st among all the monocular 3D object detectors in the KITTI test set.
arXiv Detail & Related papers (2021-12-28T07:31:18Z) - Object DGCNN: 3D Object Detection using Dynamic Graphs [32.090268859180334]
3D object detection often involves complicated training and testing pipelines.
Inspired by recent non-maximum suppression-free 2D object detection models, we propose a 3D object detection architecture on point clouds.
arXiv Detail & Related papers (2021-10-13T17:59:38Z) - IAFA: Instance-aware Feature Aggregation for 3D Object Detection from a
Single Image [37.83574424518901]
3D object detection from a single image is an important task in Autonomous Driving.
We propose an instance-aware approach to aggregate useful information for improving the accuracy of 3D object detection.
arXiv Detail & Related papers (2021-03-05T05:47:52Z) - SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection [9.924083358178239]
We propose two variants of self-attention for contextual modeling in 3D object detection.
We first incorporate the pairwise self-attention mechanism into the current state-of-the-art BEV, voxel and point-based detectors.
Next, we propose a self-attention variant that samples a subset of the most representative features by learning deformations over randomly sampled locations.
arXiv Detail & Related papers (2021-01-07T18:30:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.