PLUME: Efficient 3D Object Detection from Stereo Images
- URL: http://arxiv.org/abs/2101.06594v2
- Date: Thu, 11 Mar 2021 19:32:16 GMT
- Title: PLUME: Efficient 3D Object Detection from Stereo Images
- Authors: Yan Wang, Bin Yang, Rui Hu, Ming Liang, Raquel Urtasun
- Abstract summary: Existing methods tackle the problem in two steps: first depth estimation is performed, a pseudo LiDAR point cloud representation is computed from the depth estimates, and then object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
- Score: 95.31278688164646
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: 3D object detection plays a significant role in various robotic applications
including self-driving. While many approaches rely on expensive 3D sensors like
LiDAR to produce accurate 3D estimates, stereo-based methods have recently
shown promising results at a lower cost. Existing methods tackle the problem in
two steps: first depth estimation is performed, a pseudo LiDAR point cloud
representation is computed from the depth estimates, and then object detection
is performed in 3D space. However, because the two separate tasks are optimized
in different metric spaces, the depth estimation is biased towards nearby
objects and may cause sub-optimal performance of 3D detection. In this paper we
propose a model that unifies these two tasks in the same metric space.
Specifically, our model directly constructs a pseudo LiDAR feature volume
(PLUME) in 3D space, which is used to solve both occupancy estimation and
object detection tasks. Our approach achieves state-of-the-art performance on
the challenging KITTI benchmark, with significantly reduced inference time
compared with existing methods.
Related papers
- SOGDet: Semantic-Occupancy Guided Multi-view 3D Object Detection [19.75965521357068]
We propose a novel approach called SOGDet (Semantic-Occupancy Guided Multi-view 3D Object Detection) to improve the accuracy of 3D object detection.
Our results show that SOGDet consistently enhance the performance of three baseline methods in terms of nuScenes Detection Score (NDS) and mean Average Precision (mAP)
This indicates that the combination of 3D object detection and 3D semantic occupancy leads to a more comprehensive perception of the 3D environment, thereby aiding build more robust autonomous driving systems.
arXiv Detail & Related papers (2023-08-26T07:38:21Z) - 3D Small Object Detection with Dynamic Spatial Pruning [62.72638845817799]
We propose an efficient feature pruning strategy for 3D small object detection.
We present a multi-level 3D detector named DSPDet3D which benefits from high spatial resolution.
It takes less than 2s to directly process a whole building consisting of more than 4500k points while detecting out almost all objects.
arXiv Detail & Related papers (2023-05-05T17:57:04Z) - Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed as Homography Loss, is proposed to achieve the goal, which exploits both 2D and 3D information.
Our method yields the best performance compared with the other state-of-the-arts by a large margin on KITTI 3D datasets.
arXiv Detail & Related papers (2022-04-02T03:48:03Z) - SM3D: Simultaneous Monocular Mapping and 3D Detection [1.2183405753834562]
We present an innovative and efficient multi-task deep learning framework (SM3D) for Simultaneous Mapping and 3D Detection.
By end-to-end training of both modules, the proposed mapping and 3D detection method outperforms the state-of-the-art baseline by 10.0% and 13.2% in accuracy.
Our monocular multi-task SM3D is more than 2 times faster than pure stereo 3D detector, and 18.3% faster than using two modules separately.
arXiv Detail & Related papers (2021-11-24T17:23:37Z) - DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries [43.02373021724797]
We introduce a framework for multi-camera 3D object detection.
Our method manipulates predictions directly in 3D space.
We achieve state-of-the-art performance on the nuScenes autonomous driving benchmark.
arXiv Detail & Related papers (2021-10-13T17:59:35Z) - FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle
Detection [81.79171905308827]
We propose frustum-aware geometric reasoning (FGR) to detect vehicles in point clouds without any 3D annotations.
Our method consists of two stages: coarse 3D segmentation and 3D bounding box estimation.
It is able to accurately detect objects in 3D space with only 2D bounding boxes and sparse point clouds.
arXiv Detail & Related papers (2021-05-17T07:29:55Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them, however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3d parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z) - SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint
Estimation [3.1542695050861544]
Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving.
We propose a novel 3D object detection method, named SMOKE, that combines a single keypoint estimate with regressed 3D variables.
Despite of its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset.
arXiv Detail & Related papers (2020-02-24T08:15:36Z) - DSGN: Deep Stereo Geometry Network for 3D Object Detection [79.16397166985706]
There is a large performance gap between image-based and LiDAR-based 3D object detectors.
Our method, called Deep Stereo Geometry Network (DSGN), significantly reduces this gap.
For the first time, we provide a simple and effective one-stage stereo-based 3D detection pipeline.
arXiv Detail & Related papers (2020-01-10T11:44:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.