Related papers: VR3Dense: Voxel Representation Learning for 3D Object Detection and Monocular Dense Depth Reconstruction

VR3Dense: Voxel Representation Learning for 3D Object Detection and Monocular Dense Depth Reconstruction

URL: http://arxiv.org/abs/2104.05932v1
Date: Tue, 13 Apr 2021 04:25:54 GMT
Title: VR3Dense: Voxel Representation Learning for 3D Object Detection and Monocular Dense Depth Reconstruction
Authors: Shubham Shrivastava
Abstract summary: We introduce a method for jointly training 3D object detection and monocular dense depth reconstruction neural networks. It takes as inputs, a LiDAR point-cloud, and a single RGB image during inference and produces object pose predictions as well as a densely reconstructed depth map. While our object detection is trained in a supervised manner, the depth prediction network is trained with both self-supervised and supervised loss functions.
Score: 0.951828574518325
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: 3D object detection and dense depth estimation are one of the most vital tasks in autonomous driving. Multiple sensor modalities can jointly attribute towards better robot perception, and to that end, we introduce a method for jointly training 3D object detection and monocular dense depth reconstruction neural networks. It takes as inputs, a LiDAR point-cloud, and a single RGB image during inference and produces object pose predictions as well as a densely reconstructed depth map. LiDAR point-cloud is converted into a set of voxels, and its features are extracted using 3D convolution layers, from which we regress object pose parameters. Corresponding RGB image features are extracted using another 2D convolutional neural network. We further use these combined features to predict a dense depth map. While our object detection is trained in a supervised manner, the depth prediction network is trained with both self-supervised and supervised loss functions. We also introduce a loss function, edge-preserving smooth loss, and show that this results in better depth estimation compared to the edge-aware smooth loss function, frequently used in depth prediction works.

Related papers

NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth Supervision for Indoor Multi-View 3D Detection [72.0098999512727]
NeRF-Det has achieved impressive performance in indoor multi-view 3D detection by utilizing NeRF to enhance representation learning. We present three corresponding solutions, including semantic enhancement, perspective-aware sampling, and ordinal depth supervision. The resulting algorithm, NeRF-Det++, has exhibited appealing performance in the ScanNetV2 and AR KITScenes datasets.
arXiv Detail & Related papers (2024-02-22T11:48:06Z)
Perspective-aware Convolution for Monocular 3D Object Detection [2.33877878310217]
We propose a novel perspective-aware convolutional layer that captures long-range dependencies in images. By enforcing convolutional kernels to extract features along the depth axis of every image pixel, we incorporates perspective information into network architecture. We demonstrate improved performance on the KITTI3D dataset, achieving a 23.9% average precision in the easy benchmark.
arXiv Detail & Related papers (2023-08-24T17:25:36Z)
Boosting Monocular 3D Object Detection with Object-Centric Auxiliary Depth Supervision [13.593246617391266]
We propose a method to boost the RGB image-based 3D detector by jointly training the detection network with a depth prediction loss analogous to the depth estimation task. Our novel object-centric depth prediction loss focuses on depth around foreground objects, which is important for 3D object detection. Our depth regression model is further trained to predict the uncertainty of depth to represent the 3D confidence of objects.
arXiv Detail & Related papers (2022-10-29T11:32:28Z)
MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection [10.377424252002792]
monocular 3D object detection lacks accurate depth recovery ability. Deep neural network (DNN) enables monocular depth-sensing from high-level learned features. We propose a joint semantic and geometric cost volume to model the depth error.
arXiv Detail & Related papers (2022-03-16T11:54:10Z)
Sparse Depth Completion with Semantic Mesh Deformation Optimization [4.03103540543081]
We propose a neural network with post-optimization, which takes an RGB image and sparse depth samples as input and predicts the complete depth map. Our evaluation results outperform the existing work consistently on both indoor and outdoor datasets.
arXiv Detail & Related papers (2021-12-10T13:01:06Z)
Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust Depth Prediction [87.08227378010874]
We show the importance of the high-order 3D geometric constraints for depth prediction. By designing a loss term that enforces a simple geometric constraint, we significantly improve the accuracy and robustness of monocular depth estimation. We show state-of-the-art results of learning metric depth on NYU Depth-V2 and KITTI.
arXiv Detail & Related papers (2021-03-07T00:08:21Z)
Ground-aware Monocular 3D Object Detection for Autonomous Driving [6.5702792909006735]
Estimating the 3D position and orientation of objects in the environment with a single RGB camera is a challenging task for low-cost urban autonomous driving and mobile robots. Most of the existing algorithms are based on the geometric constraints in 2D-3D correspondence, which stems from generic 6D object pose estimation. We introduce a novel neural network module to fully utilize such application-specific priors in the framework of deep learning.
arXiv Detail & Related papers (2021-02-01T08:18:24Z)
Learning to Recover 3D Scene Shape from a Single Image [98.20106822614392]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image. We then use 3D point cloud encoders to predict the missing depth shift and focal length that allow us to recover a realistic 3D scene shape.
arXiv Detail & Related papers (2020-12-17T02:35:13Z)
Expandable YOLO: 3D Object Detection from RGB-D Images [64.14512458954344]
This paper aims at constructing a light-weight object detector that inputs a depth and a color image from a stereo camera. By extending the network architecture of YOLOv3 to 3D in the middle, it is possible to output in the depth direction. Intersection over Uninon (IoU) in 3D space is introduced to confirm the accuracy of region extraction results.
arXiv Detail & Related papers (2020-06-26T07:32:30Z)
D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features [51.04841465193678]
We leverage a 3D fully convolutional network for 3D point clouds. We propose a novel and practical learning mechanism that densely predicts both a detection score and a description feature for each 3D point. Our method achieves state-of-the-art results in both indoor and outdoor scenarios.
arXiv Detail & Related papers (2020-03-06T12:51:09Z)
Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion [53.885984328273686]
Implicit Feature Networks (IF-Nets) deliver continuous outputs, can handle multiple topologies, and complete shapes for missing or sparse input data. IF-Nets clearly outperform prior work in 3D object reconstruction in ShapeNet, and obtain significantly more accurate 3D human reconstructions.
arXiv Detail & Related papers (2020-03-03T11:14:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.