Monocular Depth Prediction through Continuous 3D Loss
- URL: http://arxiv.org/abs/2003.09763v2
- Date: Sat, 8 Aug 2020 20:36:41 GMT
- Title: Monocular Depth Prediction through Continuous 3D Loss
- Authors: Minghan Zhu, Maani Ghaffari, Yuanxin Zhong, Pingping Lu, Zhong Cao,
Ryan M. Eustice and Huei Peng
- Abstract summary: This paper reports a new continuous 3D loss function for learning depth from monocular images.
The dense depth prediction from a monocular image is supervised using sparse LIDAR points.
Experimental evaluation shows that the proposed loss improves the depth prediction accuracy and produces point-clouds with more consistent 3D geometric structures.
- Score: 16.617016980396865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper reports a new continuous 3D loss function for learning depth from
monocular images. The dense depth prediction from a monocular image is
supervised using sparse LIDAR points, which enables us to leverage available
open source datasets with camera-LIDAR sensor suites during training.
Currently, accurate and affordable range sensors are not readily available:
stereo cameras measure depth inaccurately, while LIDARs measure it sparsely and at high cost.
In contrast to the current point-to-point loss evaluation approach, the
proposed 3D loss treats point clouds as continuous objects; therefore, it
compensates for the lack of dense ground-truth depth caused by the LIDAR's
sparse measurements. We applied the proposed loss in three state-of-the-art
monocular depth prediction approaches: DORN, BTS, and Monodepth2. Experimental evaluation
shows that the proposed loss improves the depth prediction accuracy and
produces point-clouds with more consistent 3D geometric structures compared
with all tested baselines, implying the benefit of the proposed loss on general
depth prediction networks. A video demo of this work is available at
https://youtu.be/5HL8BjSAY4Y.
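To illustrate the idea of treating point clouds as continuous objects rather than comparing them point-to-point, the following is a minimal sketch of one possible kernel-based formulation: each cloud is represented as a sum of Gaussian kernels, and the loss is the negative inner product of the two resulting functions. This is a simplified illustration under that assumption, not the authors' exact loss; the function name and parameters are hypothetical.

```python
import numpy as np

def continuous_3d_loss(pred_points, lidar_points, sigma=0.5):
    """Kernel-based alignment between a dense predicted point cloud and
    sparse LiDAR points. Each cloud is viewed as a continuous function
    (a mixture of Gaussian kernels centered at its points); the loss is
    the negative mean inner product of the two functions, so it
    decreases as the clouds overlap more closely.

    pred_points: (N, 3) array, lidar_points: (M, 3) array.
    """
    # Pairwise differences between every predicted and LiDAR point.
    diff = pred_points[:, None, :] - lidar_points[None, :, :]  # (N, M, 3)
    sq_dist = np.sum(diff ** 2, axis=-1)                       # (N, M)
    # Gaussian kernel: nearby point pairs contribute the most, so every
    # predicted point is pulled toward nearby sparse measurements even
    # without a one-to-one correspondence.
    k = np.exp(-sq_dist / (2.0 * sigma ** 2))
    # Negative mean kernel value: maximizing overlap minimizes the loss.
    return -np.mean(k)
```

Because the kernel is smooth, the loss yields gradients for predicted points that have no exact LiDAR counterpart, which is how a continuous formulation can compensate for the sparsity of the supervision.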
Related papers
- A Simple yet Effective Test-Time Adaptation for Zero-Shot Monocular Metric Depth Estimation [46.037640130193566]
We propose a new method to rescale Depth Anything predictions using 3D points provided by sensors or techniques such as low-resolution LiDAR or structure-from-motion with poses given by an IMU.
Our experiments highlight enhancements relative to zero-shot monocular metric depth estimation methods, competitive results compared to fine-tuned approaches and a better robustness than depth completion approaches.
arXiv Detail & Related papers (2024-12-18T17:50:15Z) - Uncertainty-guided Optimal Transport in Depth Supervised Sparse-View 3D Gaussian [49.21866794516328]
3D Gaussian splatting has demonstrated impressive performance in real-time novel view synthesis.
Previous approaches have incorporated depth supervision into the training of 3D Gaussians to mitigate overfitting.
We introduce a novel method to supervise the depth distribution of 3D Gaussians, utilizing depth priors with integrated uncertainty estimates.
arXiv Detail & Related papers (2024-05-30T03:18:30Z) - Depth-guided NeRF Training via Earth Mover's Distance [0.6749750044497732]
We propose a novel approach to uncertainty in depth priors for NeRF supervision.
We use off-the-shelf pretrained diffusion models to predict depth and capture uncertainty during the denoising process.
Our depth-guided NeRF outperforms all baselines on standard depth metrics by a large margin.
arXiv Detail & Related papers (2024-03-19T23:54:07Z) - NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth
Supervision for Indoor Multi-View 3D Detection [72.0098999512727]
NeRF-Det has achieved impressive performance in indoor multi-view 3D detection by utilizing NeRF to enhance representation learning.
We present three corresponding solutions, including semantic enhancement, perspective-aware sampling, and ordinal depth supervision.
The resulting algorithm, NeRF-Det++, has exhibited appealing performance on the ScanNetV2 and ARKitScenes datasets.
arXiv Detail & Related papers (2024-02-22T11:48:06Z) - DevNet: Self-supervised Monocular Depth Learning via Density Volume
Construction [51.96971077984869]
Self-supervised depth learning from monocular images normally relies on the 2D pixel-wise photometric relation between temporally adjacent image frames.
This work proposes Density Volume Construction Network (DevNet), a novel self-supervised monocular depth learning framework.
arXiv Detail & Related papers (2022-09-14T00:08:44Z) - Towards Accurate Reconstruction of 3D Scene Shape from A Single
Monocular Image [91.71077190961688]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then exploit 3D point cloud data to predict the depth shift and the camera's focal length, which allow us to recover 3D scene shapes.
We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation.
arXiv Detail & Related papers (2022-08-28T16:20:14Z) - Probabilistic and Geometric Depth: Detecting Objects in Perspective [78.00922683083776]
3D object detection is an important capability needed in various practical applications such as driver assistance systems.
Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results.
This paper first presents a systematic study on this problem and observes that the current monocular 3D detection problem can be simplified as an instance depth estimation problem.
arXiv Detail & Related papers (2021-07-29T16:30:33Z) - VR3Dense: Voxel Representation Learning for 3D Object Detection and
Monocular Dense Depth Reconstruction [0.951828574518325]
We introduce a method for jointly training 3D object detection and monocular dense depth reconstruction neural networks.
It takes as inputs, a LiDAR point-cloud, and a single RGB image during inference and produces object pose predictions as well as a densely reconstructed depth map.
While our object detection is trained in a supervised manner, the depth prediction network is trained with both self-supervised and supervised loss functions.
arXiv Detail & Related papers (2021-04-13T04:25:54Z) - Learning to Recover 3D Scene Shape from a Single Image [98.20106822614392]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then use 3D point cloud encoders to predict the missing depth shift and focal length that allow us to recover a realistic 3D scene shape.
arXiv Detail & Related papers (2020-12-17T02:35:13Z) - Self-Attention Dense Depth Estimation Network for Unrectified Video
Sequences [6.821598757786515]
LiDAR and radar sensors are the typical hardware solutions for real-time depth estimation.
Deep learning based self-supervised depth estimation methods have shown promising results.
We propose a self-attention based depth and ego-motion network for unrectified images.
arXiv Detail & Related papers (2020-05-28T21:53:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.