Unsupervised Depth Completion with Calibrated Backprojection Layers
- URL: http://arxiv.org/abs/2108.10531v1
- Date: Tue, 24 Aug 2021 05:41:59 GMT
- Title: Unsupervised Depth Completion with Calibrated Backprojection Layers
- Authors: Alex Wong and Stefano Soatto
- Abstract summary: We propose a deep neural network architecture to infer dense depth from an image and a sparse point cloud.
It is trained using a video stream and corresponding synchronized sparse point cloud, as obtained from a LIDAR or other range sensor, along with the intrinsic calibration parameters of the camera.
At inference time, the calibration of the camera, which can be different from the one used for training, is fed as an input to the network along with the sparse point cloud and a single image.
- Score: 79.35651668390496
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose a deep neural network architecture to infer dense depth from an
image and a sparse point cloud. It is trained using a video stream and
corresponding synchronized sparse point cloud, as obtained from a LIDAR or
other range sensor, along with the intrinsic calibration parameters of the
camera. At inference time, the calibration of the camera, which can be
different from the one used for training, is fed as an input to the network
along with the sparse point cloud and a single image. A Calibrated
Backprojection Layer backprojects each pixel in the image to three-dimensional
space using the calibration matrix and a depth feature descriptor. The
resulting 3D positional encoding is concatenated with the image descriptor and
the previous layer output to yield the input to the next layer of the encoder.
A decoder, exploiting skip-connections, produces a dense depth map. The
resulting Calibrated Backprojection Network, or KBNet, is trained without
supervision by minimizing the photometric reprojection error. KBNet imputes
missing depth values based on the training set, rather than on generic
regularization. We test KBNet on public depth completion benchmarks, where it
outperforms the state of the art by 30% indoors and 8% outdoors when the same
camera is used for training and testing. When the test camera is different, the
improvement reaches 62%. Code available at:
https://github.com/alexklwong/calibrated-backprojection-network.
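The core operation described above, lifting each pixel to 3D with the calibration matrix and fusing the resulting positional encoding with the image and depth features, can be sketched roughly as follows. This is not the authors' implementation (see the linked repository for that); the module name, layer widths, and the single fusion convolution are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CalibratedBackprojectionBlock(nn.Module):
    """Sketch of one calibrated backprojection layer: backproject pixels to 3D
    using the intrinsics K and a depth estimate, then fuse the resulting 3D
    positional encoding with image and depth feature maps. Layer widths and
    the fusion convolution are illustrative assumptions, not the paper's."""

    def __init__(self, image_channels, depth_channels, out_channels):
        super().__init__()
        # 3 channels of 3D coordinates + image features + depth features
        self.fuse = nn.Conv2d(3 + image_channels + depth_channels,
                              out_channels, kernel_size=3, padding=1)
        self.activation = nn.ReLU(inplace=True)

    def forward(self, image_feat, depth_feat, depth, K):
        # image_feat: [B, Ci, H, W], depth_feat: [B, Cd, H, W]
        # depth:      [B, 1, H, W]  current (sparse or estimated) depth
        # K:          [B, 3, 3]     camera intrinsic matrix
        b, _, h, w = depth.shape
        device, dtype = depth.device, depth.dtype

        # Homogeneous pixel coordinates [u, v, 1] for every pixel
        v, u = torch.meshgrid(
            torch.arange(h, device=device, dtype=dtype),
            torch.arange(w, device=device, dtype=dtype),
            indexing="ij")
        pixels = torch.stack([u, v, torch.ones_like(u)], dim=0).view(1, 3, -1)

        # Backproject: X = depth * K^{-1} [u, v, 1]^T
        rays = torch.inverse(K) @ pixels.expand(b, -1, -1)   # [B, 3, H*W]
        points = rays.view(b, 3, h, w) * depth                # [B, 3, H, W]

        # Concatenate the 3D positional encoding with the feature maps
        fused = torch.cat([points, image_feat, depth_feat], dim=1)
        return self.activation(self.fuse(fused))
```

Because the intrinsics K enter as an explicit input rather than being absorbed into the learned weights, changing the test-time camera only changes K, which is consistent with the abstract's claim that a calibration different from the training one can simply be fed to the network.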
Related papers
- Time and Cost-Efficient Bathymetric Mapping System using Sparse Point Cloud Generation and Automatic Object Detection [0.0]
Side-scan sonar sensors are available at low cost, especially in fish-finders.
Extracting 3D information from side-scan sonar imagery is a difficult task because of its low signal-to-noise ratio.
This paper introduces an efficient algorithm that generates a sparse 3D point cloud from side-scan sonar images.
arXiv Detail & Related papers (2022-10-19T02:58:08Z)
- DevNet: Self-supervised Monocular Depth Learning via Density Volume Construction [51.96971077984869]
Self-supervised depth learning from monocular images normally relies on the 2D pixel-wise photometric relation between temporally adjacent image frames; a sketch of such a loss appears after this entry.
This work proposes Density Volume Construction Network (DevNet), a novel self-supervised monocular depth learning framework.
arXiv Detail & Related papers (2022-09-14T00:08:44Z)
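The pixel-wise photometric relation mentioned above, which is also what KBNet's unsupervised photometric reprojection error minimizes, is usually implemented by warping one frame into the other through the predicted depth, the relative camera pose, and the intrinsics, and then penalizing appearance differences. Below is a minimal sketch assuming a plain L1 penalty and illustrative tensor names; it is not the implementation of either paper.

```python
import torch
import torch.nn.functional as F

def photometric_reprojection_loss(target, source, depth, pose, K):
    """Warp `source` into the `target` frame via depth, pose, and intrinsics,
    then penalize per-pixel appearance differences.
    target, source: [B, 3, H, W] temporally adjacent images
    depth:          [B, 1, H, W] predicted depth for the target frame
    pose:           [B, 4, 4]    relative camera motion (target -> source)
    K:              [B, 3, 3]    camera intrinsics"""
    b, _, h, w = target.shape
    device, dtype = target.device, target.dtype

    # Backproject target pixels to 3D using the predicted depth.
    v, u = torch.meshgrid(
        torch.arange(h, device=device, dtype=dtype),
        torch.arange(w, device=device, dtype=dtype),
        indexing="ij")
    pixels = torch.stack([u, v, torch.ones_like(u)], dim=0).view(1, 3, -1)
    points = torch.inverse(K) @ pixels.expand(b, -1, -1) * depth.view(b, 1, -1)

    # Transform the 3D points into the source camera and project with K.
    ones = torch.ones(b, 1, h * w, device=device, dtype=dtype)
    points_src = (pose @ torch.cat([points, ones], dim=1))[:, :3]
    proj = K @ points_src
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)

    # Normalize to [-1, 1] and sample the source image at projected locations.
    grid = torch.stack([2 * uv[:, 0] / (w - 1) - 1,
                        2 * uv[:, 1] / (h - 1) - 1], dim=-1).view(b, h, w, 2)
    warped = F.grid_sample(source, grid, align_corners=True)

    # Per-pixel photometric (L1) reprojection error.
    return (warped - target).abs().mean()
```

Published methods typically add an SSIM term and mask occluded or static pixels; those refinements are omitted in this sketch.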
- A Low Memory Footprint Quantized Neural Network for Depth Completion of Very Sparse Time-of-Flight Depth Maps [14.885472968649937]
We simulate ToF datasets for indoor 3D perception with challenging sparsity levels.
Our model achieves optimal depth map quality by means of input pre-processing and carefully tuned training.
We also achieve low memory footprint for weights and activations by means of mixed precision quantization-at-training techniques.
arXiv Detail & Related papers (2022-05-25T17:11:31Z)
- Graph-Based Depth Denoising & Dequantization for Point Cloud Enhancement [47.61748619439693]
A 3D point cloud is typically constructed from depth measurements acquired by sensors at one or more viewpoints.
Previous works denoise a point cloud a posteriori, after projecting the imperfect depth data onto 3D space.
We enhance depth measurements directly on the sensed images a priori, before synthesizing a 3D point cloud.
arXiv Detail & Related papers (2021-11-09T04:17:35Z)
- Single image deep defocus estimation and its applications [82.93345261434943]
We train a deep neural network to classify image patches into one of the 20 levels of blurriness.
The trained model is used to determine the patch blurriness which is then refined by applying an iterative weighted guided filter.
The result is a defocus map that carries the information of the degree of blurriness for each pixel.
arXiv Detail & Related papers (2021-07-30T06:18:16Z)
- DeepI2P: Image-to-Point Cloud Registration via Deep Classification [71.3121124994105]
DeepI2P is a novel approach for cross-modality registration between an image and a point cloud.
Our method estimates the relative rigid transformation between the coordinate frames of the camera and Lidar.
We circumvent the difficulty by converting the registration problem into a classification and inverse camera projection optimization problem.
arXiv Detail & Related papers (2021-04-08T04:27:32Z)
- Towards Dense People Detection with Deep Learning and Depth images [9.376814409561726]
This paper proposes a DNN-based system that detects multiple people from a single depth image.
Our neural network processes a depth image and outputs a likelihood map in image coordinates.
We show this strategy to be effective, producing networks that generalize to work with scenes different from those used during training.
arXiv Detail & Related papers (2020-07-14T16:43:02Z)
- ODE-CNN: Omnidirectional Depth Extension Networks [43.40308168978984]
We propose a low-cost 3D sensing system that combines an omnidirectional camera with a calibrated projective depth camera.
To accurately recover the missing depths, we design an omnidirectional depth extension convolutional neural network.
ODE-CNN significantly outperforms other state-of-the-art (SoTA) methods, with a relative 33% reduction in depth error.
arXiv Detail & Related papers (2020-07-03T03:14:09Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.