End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection
- URL: http://arxiv.org/abs/2004.03080v2
- Date: Thu, 14 May 2020 14:39:42 GMT
- Title: End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection
- Authors: Rui Qian, Divyansh Garg, Yan Wang, Yurong You, Serge Belongie, Bharath
Hariharan, Mark Campbell, Kilian Q. Weinberger, Wei-Lun Chao
- Abstract summary: Pseudo-LiDAR (PL) has led to a drastic reduction in the accuracy gap between methods based on LiDAR sensors and those based on cheap stereo cameras.
PL combines state-of-the-art deep neural networks for 3D depth estimation with those for 3D object detection by converting 2D depth map outputs to 3D point cloud inputs.
We introduce a new framework based on differentiable Change of Representation (CoR) modules that allow the entire PL pipeline to be trained end-to-end.
- Score: 62.34374949726333
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reliable and accurate 3D object detection is a necessity for safe autonomous
driving. Although LiDAR sensors can provide accurate 3D point cloud estimates
of the environment, they are also prohibitively expensive for many settings.
Recently, the introduction of pseudo-LiDAR (PL) has led to a drastic reduction
in the accuracy gap between methods based on LiDAR sensors and those based on
cheap stereo cameras. PL combines state-of-the-art deep neural networks for 3D
depth estimation with those for 3D object detection by converting 2D depth map
outputs to 3D point cloud inputs. However, so far these two networks have to be
trained separately. In this paper, we introduce a new framework based on
differentiable Change of Representation (CoR) modules that allow the entire PL
pipeline to be trained end-to-end. The resulting framework is compatible with
most state-of-the-art networks for both tasks and in combination with PointRCNN
improves over PL consistently across all benchmarks -- yielding the highest
entry on the KITTI image-based 3D object detection leaderboard at the time of
submission. Our code will be made available at
https://github.com/mileyan/pseudo-LiDAR_e2e.
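The depth-map-to-point-cloud conversion at the heart of pseudo-LiDAR is a back-projection of each pixel through the camera intrinsics. The following is a minimal illustrative sketch, not the authors' released code; the function name and the pinhole parameters `fx`, `fy`, `cx`, `cy` are assumptions for the example:

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project a dense depth map (H x W, metres) into an N x 3
    point cloud in the camera frame, as pseudo-LiDAR pipelines do."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx  # right, via the pinhole model
    y = (v - cy) * z / fy  # down
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop invalid (zero-depth) pixels

# Toy example: a 2 x 2 depth map, all pixels at 5 m, unit intrinsics
pts = depth_to_pseudo_lidar(np.full((2, 2), 5.0), fx=1.0, fy=1.0, cx=1.0, cy=1.0)
print(pts.shape)  # (4, 3)
```

Because every operation above is differentiable in the depth values, a CoR module in this spirit lets detection-loss gradients flow back into the depth estimator, which is what enables the end-to-end training the paper proposes.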
Related papers
- SpotNet: An Image Centric, Lidar Anchored Approach To Long Range Perception [3.627834388176496]
SpotNet is a fast, single stage, image-centric but LiDAR anchored approach for long range 3D object detection.
We demonstrate that our approach to LiDAR/image sensor fusion, combined with the joint learning of 2D and 3D detection tasks, can lead to accurate 3D object detection with very sparse LiDAR support.
arXiv Detail & Related papers (2024-05-24T17:25:48Z)
- Fully Sparse Fusion for 3D Object Detection [69.32694845027927]
Currently prevalent multimodal 3D detection methods are built upon LiDAR-based detectors that usually use dense Bird's-Eye-View feature maps.
Fully sparse architectures are gaining attention as they are highly efficient for long-range perception.
In this paper, we study how to effectively leverage image modality in the emerging fully sparse architecture.
arXiv Detail & Related papers (2023-04-24T17:57:43Z)
- SM3D: Simultaneous Monocular Mapping and 3D Detection [1.2183405753834562]
We present an innovative and efficient multi-task deep learning framework (SM3D) for Simultaneous Mapping and 3D Detection.
By end-to-end training of both modules, the proposed mapping and 3D detection method outperforms the state-of-the-art baseline by 10.0% and 13.2% in accuracy.
Our monocular multi-task SM3D is more than 2 times faster than a pure stereo 3D detector, and 18.3% faster than running the two modules separately.
arXiv Detail & Related papers (2021-11-24T17:23:37Z)
- Anchor-free 3D Single Stage Detector with Mask-Guided Attention for Point Cloud [79.39041453836793]
We develop a novel single-stage 3D detector for point clouds in an anchor-free manner.
To keep computation tractable, we convert the voxel-based sparse 3D feature volumes into sparse 2D feature maps.
We propose an IoU-based detection confidence re-calibration scheme to improve the correlation between the detection confidence score and the accuracy of the bounding box regression.
arXiv Detail & Related papers (2021-08-08T13:42:13Z)
- Multi-Modality Task Cascade for 3D Object Detection [22.131228757850373]
Many methods train two models in isolation and use simple feature concatenation to represent 3D sensor data.
We propose a novel Multi-Modality Task Cascade network (MTC-RCNN) that leverages 3D box proposals to improve 2D segmentation predictions.
We show that including a 2D network between two stages of 3D modules significantly improves both 2D and 3D task performance.
arXiv Detail & Related papers (2021-07-08T17:55:01Z)
- PC-DAN: Point Cloud based Deep Affinity Network for 3D Multi-Object Tracking (Accepted as an extended abstract in JRDB-ACT Workshop at CVPR21) [68.12101204123422]
A point cloud is a dense compilation of spatial data in 3D coordinates.
We propose a PointNet-based approach for 3D Multi-Object Tracking (MOT).
arXiv Detail & Related papers (2021-06-03T05:36:39Z)
- Ground-aware Monocular 3D Object Detection for Autonomous Driving [6.5702792909006735]
Estimating the 3D position and orientation of objects in the environment with a single RGB camera is a challenging task for low-cost urban autonomous driving and mobile robots.
Most of the existing algorithms are based on the geometric constraints in 2D-3D correspondence, which stems from generic 6D object pose estimation.
We introduce a novel neural network module to fully utilize such application-specific priors in the framework of deep learning.
arXiv Detail & Related papers (2021-02-01T08:18:24Z)
- Deep Continuous Fusion for Multi-Sensor 3D Object Detection [103.5060007382646]
We propose a novel 3D object detector that can exploit both LIDAR as well as cameras to perform very accurate localization.
We design an end-to-end learnable architecture that exploits continuous convolutions to fuse image and LIDAR feature maps at different levels of resolution.
arXiv Detail & Related papers (2020-12-20T18:43:41Z)
- RoIFusion: 3D Object Detection from LiDAR and Vision [7.878027048763662]
We propose a novel fusion algorithm that projects a set of 3D Regions of Interest (RoIs) from the point clouds onto the 2D RoIs of the corresponding images.
Our approach achieves state-of-the-art performance on the challenging KITTI 3D object detection benchmark.
arXiv Detail & Related papers (2020-09-09T20:23:27Z)
- Rethinking Pseudo-LiDAR Representation [70.29791705160203]
We propose an image-based CNN detector named PatchNet, which is more general and can be instantiated as a pseudo-LiDAR based 3D detector.
We conduct extensive experiments on the challenging KITTI dataset, where the proposed PatchNet outperforms all existing pseudo-LiDAR based counterparts.
arXiv Detail & Related papers (2020-08-11T08:44:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.