Is Pseudo-Lidar needed for Monocular 3D Object detection?
- URL: http://arxiv.org/abs/2108.06417v1
- Date: Fri, 13 Aug 2021 22:22:51 GMT
- Title: Is Pseudo-Lidar needed for Monocular 3D Object detection?
- Authors: Dennis Park, Rares Ambrus, Vitor Guizilini, Jie Li, Adrien Gaidon
- Abstract summary: We propose an end-to-end, single stage, monocular 3D object detector, DD3D, that can benefit from depth pre-training like pseudo-lidar methods, but without their limitations.
Our architecture is designed for effective information transfer between depth estimation and 3D detection, allowing us to scale with the amount of unlabeled pre-training data.
- Score: 32.772699246216774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent progress in 3D object detection from single images leverages monocular
depth estimation as a way to produce 3D pointclouds, turning cameras into
pseudo-lidar sensors. These two-stage detectors improve with the accuracy of
the intermediate depth estimation network, which can itself be improved without
manual labels via large-scale self-supervised learning. However, they tend to
suffer from overfitting more than end-to-end methods, are more complex, and the
gap with similar lidar-based detectors remains significant. In this work, we
propose an end-to-end, single stage, monocular 3D object detector, DD3D, that
can benefit from depth pre-training like pseudo-lidar methods, but without
their limitations. Our architecture is designed for effective information
transfer between depth estimation and 3D detection, allowing us to scale with
the amount of unlabeled pre-training data. Our method achieves state-of-the-art
results on two challenging benchmarks, with 16.34% and 9.28% AP for Cars and
Pedestrians (respectively) on the KITTI-3D benchmark, and 41.5% mAP on
NuScenes.
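For readers unfamiliar with the pseudo-lidar step the abstract refers to, the following is a minimal sketch of how a predicted depth map can be lifted into a 3D point cloud under a pinhole camera model. It is not DD3D's code (DD3D is proposed precisely to avoid this intermediate step); the function name `backproject_depth` and the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative assumptions, and the toy depth values are made up.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Lift a dense depth map to 3D points in the camera frame.

    Assumes a pinhole camera: pixel (u, v) with depth z maps to
    x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:
                # Skip pixels without a valid depth estimate.
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x2 depth map in meters (hypothetical values); 0.0 marks an invalid pixel.
depth_map = [[10.0, 10.0], [0.0, 20.0]]
cloud = backproject_depth(depth_map, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

In a two-stage pseudo-lidar pipeline, a point cloud such as `cloud` would then be passed to a lidar-style 3D detector, which is why detection quality depends on the accuracy of the depth network.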
Related papers
- Toward Accurate Camera-based 3D Object Detection via Cascade Depth
Estimation and Calibration [20.82054596017465]
Recent camera-based 3D object detection is limited by the precision of transforming from image to 3D feature spaces.
This paper aims to address such a fundamental problem of camera-based 3D object detection: How to effectively learn depth information for accurate feature lifting and object localization.
arXiv Detail & Related papers (2024-02-07T14:21:26Z)
- Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed Homography Loss, is proposed to achieve the goal, which exploits both 2D and 3D information.
Our method outperforms other state-of-the-art methods by a large margin on the KITTI 3D dataset.
arXiv Detail & Related papers (2022-04-02T03:48:03Z)
- SM3D: Simultaneous Monocular Mapping and 3D Detection [1.2183405753834562]
We present an innovative and efficient multi-task deep learning framework (SM3D) for Simultaneous Mapping and 3D Detection.
By end-to-end training of both modules, the proposed mapping and 3D detection method outperforms the state-of-the-art baseline by 10.0% and 13.2% in accuracy.
Our monocular multi-task SM3D is more than 2 times faster than a pure stereo 3D detector, and 18.3% faster than using the two modules separately.
arXiv Detail & Related papers (2021-11-24T17:23:37Z)
- Anchor-free 3D Single Stage Detector with Mask-Guided Attention for
Point Cloud [79.39041453836793]
We develop a novel single-stage 3D detector for point clouds in an anchor-free manner.
We overcome this by converting the voxel-based sparse 3D feature volumes into the sparse 2D feature maps.
We propose an IoU-based detection confidence re-calibration scheme to improve the correlation between the detection confidence score and the accuracy of the bounding box regression.
arXiv Detail & Related papers (2021-08-08T13:42:13Z)
- Geometry-aware data augmentation for monocular 3D object detection [18.67567745336633]
This paper focuses on monocular 3D object detection, one of the essential modules in autonomous driving systems.
A key challenge is that the depth recovery problem is ill-posed in monocular data.
We conduct a thorough analysis to reveal how existing methods fail to robustly estimate depth when different geometry shifts occur.
We convert the aforementioned manipulations into four corresponding 3D-aware data augmentation techniques.
arXiv Detail & Related papers (2021-04-12T23:12:48Z)
- M3DSSD: Monocular 3D Single Stage Object Detector [82.25793227026443]
We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention.
The proposed M3DSSD achieves significantly better performance than other monocular 3D object detection methods on the KITTI dataset.
arXiv Detail & Related papers (2021-03-24T13:09:11Z)
- PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first depth estimation is performed, a pseudo LiDAR point cloud representation is computed from the depth estimates, and then object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
- Deep Continuous Fusion for Multi-Sensor 3D Object Detection [103.5060007382646]
We propose a novel 3D object detector that can exploit both LIDAR and cameras to perform very accurate localization.
We design an end-to-end learnable architecture that exploits continuous convolutions to fuse image and LIDAR feature maps at different levels of resolution.
arXiv Detail & Related papers (2020-12-20T18:43:41Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
- Boundary-Aware Dense Feature Indicator for Single-Stage 3D Object
Detection from Point Clouds [32.916690488130506]
We propose a universal module that helps 3D detectors focus on the densest region of the point clouds in a boundary-aware manner.
Experiments on the KITTI dataset show that DENFI remarkably improves the performance of the baseline single-stage detector.
arXiv Detail & Related papers (2020-04-01T01:21:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.