Unsupervised Depth Completion with Calibrated Backprojection Layers
- URL: http://arxiv.org/abs/2108.10531v1
- Date: Tue, 24 Aug 2021 05:41:59 GMT
- Title: Unsupervised Depth Completion with Calibrated Backprojection Layers
- Authors: Alex Wong and Stefano Soatto
- Abstract summary: We propose a deep neural network architecture to infer dense depth from an image and a sparse point cloud.
It is trained using a video stream and corresponding synchronized sparse point cloud, as obtained from a LIDAR or other range sensor, along with the intrinsic calibration parameters of the camera.
At inference time, the calibration of the camera, which can be different from the one used for training, is fed as an input to the network along with the sparse point cloud and a single image.
- Score: 79.35651668390496
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose a deep neural network architecture to infer dense depth from an
image and a sparse point cloud. It is trained using a video stream and
corresponding synchronized sparse point cloud, as obtained from a LIDAR or
other range sensor, along with the intrinsic calibration parameters of the
camera. At inference time, the calibration of the camera, which can be
different from the one used for training, is fed as an input to the network
along with the sparse point cloud and a single image. A Calibrated
Backprojection Layer backprojects each pixel in the image to three-dimensional
space using the calibration matrix and a depth feature descriptor. The
resulting 3D positional encoding is concatenated with the image descriptor and
the previous layer output to yield the input to the next layer of the encoder.
A decoder, exploiting skip-connections, produces a dense depth map. The
resulting Calibrated Backprojection Network, or KBNet, is trained without
supervision by minimizing the photometric reprojection error. KBNet imputes
missing depth values based on the training set, rather than on generic
regularization. We test KBNet on public depth completion benchmarks, where it
outperforms the state of the art by 30% indoors and 8% outdoors when the same
camera is used for training and testing. When the test camera is different, the
improvement reaches 62%. Code available at:
https://github.com/alexklwong/calibrated-backprojection-network.
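The core operation described above, lifting each pixel to 3D with the calibration matrix and fusing the resulting positional encoding with the image and depth features, can be sketched roughly as follows. This is not the authors' implementation (see the linked repository for that); the module name, layer widths, and the single fusion convolution are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CalibratedBackprojectionBlock(nn.Module):
    """Sketch of one calibrated backprojection layer: backproject pixels to 3D
    using the intrinsics K and a depth estimate, then fuse the resulting 3D
    positional encoding with image and depth feature maps. Layer widths and
    the fusion convolution are illustrative assumptions, not the paper's."""

    def __init__(self, image_channels, depth_channels, out_channels):
        super().__init__()
        # 3 channels of 3D coordinates + image features + depth features
        self.fuse = nn.Conv2d(3 + image_channels + depth_channels,
                              out_channels, kernel_size=3, padding=1)
        self.activation = nn.ReLU(inplace=True)

    def forward(self, image_feat, depth_feat, depth, K):
        # image_feat: [B, Ci, H, W], depth_feat: [B, Cd, H, W]
        # depth:      [B, 1, H, W]  current (sparse or estimated) depth
        # K:          [B, 3, 3]     camera intrinsic matrix
        b, _, h, w = depth.shape
        device, dtype = depth.device, depth.dtype

        # Homogeneous pixel coordinates [u, v, 1] for every pixel
        v, u = torch.meshgrid(
            torch.arange(h, device=device, dtype=dtype),
            torch.arange(w, device=device, dtype=dtype),
            indexing="ij")
        pixels = torch.stack([u, v, torch.ones_like(u)], dim=0).view(1, 3, -1)

        # Backproject: X = depth * K^{-1} [u, v, 1]^T
        rays = torch.inverse(K) @ pixels.expand(b, -1, -1)   # [B, 3, H*W]
        points = rays.view(b, 3, h, w) * depth                # [B, 3, H, W]

        # Concatenate the 3D positional encoding with the feature maps
        fused = torch.cat([points, image_feat, depth_feat], dim=1)
        return self.activation(self.fuse(fused))
```

Because the intrinsics K enter as an explicit input rather than being absorbed into the learned weights, changing the test-time camera only changes K, which is consistent with the abstract's claim that a calibration different from the training one can simply be fed to the network.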
Related papers
- Time and Cost-Efficient Bathymetric Mapping System using Sparse Point Cloud Generation and Automatic Object Detection [0.0]
Side-scan sonar sensors are available at low cost, especially in fish-finders.
Extracting 3D information from side-scan sonar imagery is a difficult task because of its low signal-to-noise ratio.
This paper introduces an efficient algorithm that generates a sparse 3D point cloud from side-scan sonar images.
arXiv Detail & Related papers (2022-10-19T02:58:08Z)
- DevNet: Self-supervised Monocular Depth Learning via Density Volume Construction [51.96971077984869]
Self-supervised depth learning from monocular images normally relies on the 2D pixel-wise photometric relation between temporally adjacent image frames; a sketch of such a loss appears after this entry.
This work proposes Density Volume Construction Network (DevNet), a novel self-supervised monocular depth learning framework.
arXiv Detail & Related papers (2022-09-14T00:08:44Z)
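The pixel-wise photometric relation mentioned above, which is also what KBNet's unsupervised photometric reprojection error minimizes, is usually implemented by warping one frame into the other through the predicted depth, the relative camera pose, and the intrinsics, and then penalizing appearance differences. Below is a minimal sketch assuming a plain L1 penalty and illustrative tensor names; it is not the implementation of either paper.

```python
import torch
import torch.nn.functional as F

def photometric_reprojection_loss(target, source, depth, pose, K):
    """Warp `source` into the `target` frame via depth, pose, and intrinsics,
    then penalize per-pixel appearance differences.
    target, source: [B, 3, H, W] temporally adjacent images
    depth:          [B, 1, H, W] predicted depth for the target frame
    pose:           [B, 4, 4]    relative camera motion (target -> source)
    K:              [B, 3, 3]    camera intrinsics"""
    b, _, h, w = target.shape
    device, dtype = target.device, target.dtype

    # Backproject target pixels to 3D using the predicted depth.
    v, u = torch.meshgrid(
        torch.arange(h, device=device, dtype=dtype),
        torch.arange(w, device=device, dtype=dtype),
        indexing="ij")
    pixels = torch.stack([u, v, torch.ones_like(u)], dim=0).view(1, 3, -1)
    points = torch.inverse(K) @ pixels.expand(b, -1, -1) * depth.view(b, 1, -1)

    # Transform the 3D points into the source camera and project with K.
    ones = torch.ones(b, 1, h * w, device=device, dtype=dtype)
    points_src = (pose @ torch.cat([points, ones], dim=1))[:, :3]
    proj = K @ points_src
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)

    # Normalize to [-1, 1] and sample the source image at projected locations.
    grid = torch.stack([2 * uv[:, 0] / (w - 1) - 1,
                        2 * uv[:, 1] / (h - 1) - 1], dim=-1).view(b, h, w, 2)
    warped = F.grid_sample(source, grid, align_corners=True)

    # Per-pixel photometric (L1) reprojection error.
    return (warped - target).abs().mean()
```

Published methods typically add an SSIM term and mask occluded or static pixels; those refinements are omitted in this sketch.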
- A Low Memory Footprint Quantized Neural Network for Depth Completion of Very Sparse Time-of-Flight Depth Maps [14.885472968649937]
We simulate ToF datasets for indoor 3D perception with challenging sparsity levels.
Our model achieves optimal depth map quality by means of input pre-processing and carefully tuned training.
We also achieve low memory footprint for weights and activations by means of mixed precision quantization-at-training techniques.
arXiv Detail & Related papers (2022-05-25T17:11:31Z)
- Graph-Based Depth Denoising & Dequantization for Point Cloud Enhancement [47.61748619439693]
A 3D point cloud is typically constructed from depth measurements acquired by sensors at one or more viewpoints.
Previous works denoise a point cloud a posteriori, after projecting the imperfect depth data onto 3D space.
We enhance depth measurements directly on the sensed images a priori, before synthesizing a 3D point cloud.
arXiv Detail & Related papers (2021-11-09T04:17:35Z)
- Single image deep defocus estimation and its applications [82.93345261434943]
We train a deep neural network to classify image patches into one of the 20 levels of blurriness.
The trained model is used to determine the patch blurriness which is then refined by applying an iterative weighted guided filter.
The result is a defocus map that carries the information of the degree of blurriness for each pixel.
arXiv Detail & Related papers (2021-07-30T06:18:16Z)
- DeepI2P: Image-to-Point Cloud Registration via Deep Classification [71.3121124994105]
DeepI2P is a novel approach for cross-modality registration between an image and a point cloud.
Our method estimates the relative rigid transformation between the coordinate frames of the camera and Lidar.
We circumvent the difficulty by converting the registration problem into a classification and inverse camera projection optimization problem.
arXiv Detail & Related papers (2021-04-08T04:27:32Z)
- Towards Dense People Detection with Deep Learning and Depth images [9.376814409561726]
This paper proposes a DNN-based system that detects multiple people from a single depth image.
Our neural network processes a depth image and outputs a likelihood map in image coordinates.
We show this strategy to be effective, producing networks that generalize to work with scenes different from those used during training.
arXiv Detail & Related papers (2020-07-14T16:43:02Z)
- ODE-CNN: Omnidirectional Depth Extension Networks [43.40308168978984]
We propose a low-cost 3D sensing system that combines an omnidirectional camera with a calibrated projective depth camera.
To accurately recover the missing depths, we design an omnidirectional depth extension convolutional neural network.
ODE-CNN significantly outperforms other state-of-the-art (SoTA) methods, with a relative 33% reduction in depth error.
arXiv Detail & Related papers (2020-07-03T03:14:09Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.