Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance
Disparity Estimation
- URL: http://arxiv.org/abs/2004.03572v1
- Date: Tue, 7 Apr 2020 17:48:45 GMT
- Title: Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance
Disparity Estimation
- Authors: Jiaming Sun, Linghao Chen, Yiming Xie, Siyu Zhang, Qinhong Jiang,
Xiaowei Zhou, Hujun Bao
- Abstract summary: We propose a novel system named Disp R-CNN for 3D object detection from stereo images.
We use a statistical shape model to generate dense disparity pseudo-ground-truth without the need for LiDAR point clouds.
Experiments on the KITTI dataset show that, even when LiDAR ground-truth is not available at training time, Disp R-CNN achieves competitive performance and outperforms previous state-of-the-art methods by 20% in terms of average precision.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel system named Disp R-CNN for 3D object
detection from stereo images. Many recent works solve this problem by first
recovering a point cloud with disparity estimation and then applying a 3D
detector. The disparity map is computed for the entire image, which is costly
and fails to leverage category-specific priors. In contrast, we design an
instance disparity estimation network (iDispNet) that predicts disparity only
for pixels on objects of interest and learns a category-specific shape prior
for more accurate disparity estimation. To address the scarcity of disparity
annotations in training, we propose to use a statistical shape model to
generate dense disparity pseudo-ground-truth without the need for LiDAR point
clouds, which makes our system more widely applicable. Experiments on the
KITTI dataset show that, even when LiDAR ground-truth is not available at
training time, Disp R-CNN achieves competitive performance and outperforms
previous state-of-the-art methods by 20% in terms of average precision.
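To make the geometry concrete, below is a minimal sketch, not the authors'
released code, of the two conversions the abstract relies on: lifting a
predicted instance disparity map to an object point cloud for the downstream
3D detector, and turning a dense depth rendering of a fitted shape model back
into disparity pseudo-ground-truth. Both rest on the standard stereo relation
Z = f*B/d; all function and parameter names here are illustrative, and a
single focal length (fx = fy) is assumed for brevity.

```python
import numpy as np

def instance_disparity_to_points(disparity, mask, fx, baseline, cx, cy):
    """Lift per-instance disparity to an object point cloud (illustrative).

    disparity: (H, W) predicted disparity map; mask: (H, W) boolean
    instance mask. Only pixels on the object are back-projected, via
    the standard stereo relation Z = fx * baseline / d.
    """
    v, u = np.nonzero(mask)                # pixel coordinates on the object
    d = disparity[v, u]
    keep = d > 1e-3                        # guard against near-zero disparity
    v, u, d = v[keep], u[keep], d[keep]
    z = fx * baseline / d                  # depth from disparity
    x = (u - cx) * z / fx                  # back-project with pinhole model
    y = (v - cy) * z / fx                  # assumes fx == fy for brevity
    return np.stack([x, y, z], axis=1)     # (N, 3) object point cloud

def depth_to_disparity_pseudo_gt(depth, mask, fx, baseline):
    """Hypothetical stand-in for shape-prior supervision: convert a dense
    depth rendering of a fitted shape model into disparity
    pseudo-ground-truth on the instance mask, so no LiDAR is needed."""
    disparity = np.zeros_like(depth)
    valid = mask & (depth > 1e-3)          # defined only on the object
    disparity[valid] = fx * baseline / depth[valid]
    return disparity
```

As a rough sanity check of why instance-focused disparity accuracy matters:
with KITTI's focal length of roughly 721 px and stereo baseline of roughly
0.54 m, a one-pixel disparity error at 30 m depth already translates to about
Z^2/(f*B) ≈ 2.3 m of depth error.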
Related papers
- DQS3D: Densely-matched Quantization-aware Semi-supervised 3D Detection [6.096961718434965]
We study the problem of semi-supervised 3D object detection, which is of great importance considering the high annotation cost for cluttered 3D indoor scenes.
We resort to the robust and principled framework of self-teaching, which has triggered notable progress for semi-supervised learning recently.
We propose the first semi-supervised 3D detection algorithm that works in a single-stage manner and allows spatially dense training signals.
arXiv Detail & Related papers (2023-04-25T17:59:54Z) - Bridging Precision and Confidence: A Train-Time Loss for Calibrating
Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accuracy of predictions.
Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error in both in-domain and out-of-domain scenarios.
arXiv Detail & Related papers (2023-03-25T08:56:21Z) - A Closer Look at Invariances in Self-supervised Pre-training for 3D
Vision [0.0]
Self-supervised pre-training for 3D vision has drawn increasing research interest in recent years.
We present a unified framework under which various pre-training methods can be investigated.
We propose a simple but effective method that jointly pre-trains a 3D encoder and a depth map encoder using contrastive learning.
arXiv Detail & Related papers (2022-07-11T16:44:15Z) - Uncertainty-Aware Camera Pose Estimation from Points and Lines [101.03675842534415]
Perspective-n-Point-and-Line (PnPL) aims at fast, accurate, and robust camera localization with respect to a 3D model from 2D-3D feature coordinates.
arXiv Detail & Related papers (2021-07-08T15:19:36Z) - Soft Expectation and Deep Maximization for Image Feature Detection [68.8204255655161]
We propose SEDM, an iterative semi-supervised learning process that flips the question and first looks for repeatable 3D points, then trains a detector to localize them in image space.
Our results show that this new model trained using SEDM is able to better localize the underlying 3D points in a scene.
arXiv Detail & Related papers (2021-04-21T00:35:32Z) - Delving into Localization Errors for Monocular 3D Object Detection [85.77319416168362]
Estimating 3D bounding boxes from monocular images is an essential component in autonomous driving.
In this work, we quantify the impact introduced by each sub-task and find that localization error is the vital factor restricting monocular 3D detection.
arXiv Detail & Related papers (2021-03-30T10:38:01Z) - PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View
Depth Estimation with Neural Positional Encoding and Distilled Matting Loss [49.66736599668501]
We propose a self-supervised single-view pixel-level accurate depth estimation network, called PLADE-Net.
Our method shows unprecedented accuracy levels, exceeding 95% in terms of the $\delta^1$ metric on the KITTI dataset.
arXiv Detail & Related papers (2021-03-12T15:54:46Z) - Probabilistic Vehicle Reconstruction Using a Multi-Task CNN [0.0]
We present a probabilistic approach for shape-aware 3D vehicle reconstruction from stereo images.
Specifically, we train a CNN that outputs probability distributions for the vehicle's orientation and for both vehicle keypoints and wireframe edges.
Evaluating on the challenging KITTI benchmark, we show that our method achieves state-of-the-art results.
arXiv Detail & Related papers (2021-02-21T20:45:44Z) - Learning to Predict the 3D Layout of a Scene [0.3867363075280544]
We propose a method that only uses a single RGB image, thus enabling applications in devices or vehicles that do not have LiDAR sensors.
We use the KITTI dataset for training, which consists of street traffic scenes with class labels, 2D bounding boxes and 3D annotations with seven degrees of freedom.
We achieve a mean average precision of 47.3% for moderately difficult data, measured at a 3D intersection over union threshold of 70%, as required by the official KITTI benchmark; outperforming previous state-of-the-art single RGB only methods by a large margin.
arXiv Detail & Related papers (2020-11-19T17:23:30Z) - Deep Learning on Point Clouds for False Positive Reduction at Nodule
Detection in Chest CT Scans [0.0]
This paper focuses on a novel approach for false-positive reduction (FPR) of nodule candidates in CADe systems.
The proposed approach considers input data not as a 2D or 3D image, but as a point cloud, and uses deep learning models for point clouds.
We show that the proposed approach outperforms baseline 3D CNN models, achieving an FROC of 85.98 versus 77.26 for the baselines.
arXiv Detail & Related papers (2020-05-07T17:59:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.