SparseFormer: Attention-based Depth Completion Network
- URL: http://arxiv.org/abs/2206.04557v1
- Date: Thu, 9 Jun 2022 15:08:24 GMT
- Title: SparseFormer: Attention-based Depth Completion Network
- Authors: Frederik Warburg and Michael Ramamonjisoa and Manuel López-Antequera
- Abstract summary: We introduce a transformer block, SparseFormer, that fuses 3D landmarks with deep visual features to produce dense depth.
The SparseFormer has a global receptive field, making the module especially effective for depth completion with low-density and non-uniform landmarks.
To address the issue of depth outliers among the 3D landmarks, we introduce a trainable refinement module that filters outliers through attention between the sparse landmarks.
- Score: 2.9434930072968584
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most pipelines for Augmented and Virtual Reality estimate the ego-motion of the camera by creating a map of sparse 3D landmarks. In this paper, we tackle the problem of depth completion, that is, densifying this sparse 3D map using RGB images as guidance. This remains a challenging problem due to the low-density, non-uniform, and outlier-prone 3D landmarks produced by SfM and SLAM pipelines. We introduce a transformer block, SparseFormer, that fuses 3D landmarks with deep visual features to produce dense depth. The SparseFormer has a global receptive field, making the module especially effective for depth completion with low-density and non-uniform landmarks. To address the issue of depth outliers among the 3D landmarks, we introduce a trainable refinement module that filters outliers through attention between the sparse landmarks.
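The abstract describes two attention components: a fusion block in which dense image features attend to the sparse landmarks (giving every pixel a global receptive field over them), and a refinement module that filters outlier landmarks via attention among the landmarks themselves. Below is a minimal PyTorch sketch of both ideas; the class names, dimensions, (x, y, depth) landmark encoding, and sigmoid inlier weighting are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch, assuming standard multi-head attention primitives.
# LandmarkRefiner: self-attention among landmarks predicts per-landmark
# inlier weights, a stand-in for the paper's trainable outlier filtering.
# SparseFusionBlock: every pixel query attends to all landmarks, so the
# module's receptive field is global over the sparse, non-uniform points.
import torch
import torch.nn as nn

class LandmarkRefiner(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.embed = nn.Linear(3, dim)           # (x, y, depth) -> feature
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.inlier_head = nn.Linear(dim, 1)     # per-landmark inlier score

    def forward(self, landmarks):
        # landmarks: (B, N, 3) sparse points as image coordinates + depth
        x = self.embed(landmarks)
        x, _ = self.self_attn(x, x, x)           # landmarks compare each other
        weights = torch.sigmoid(self.inlier_head(x))  # (B, N, 1) in [0, 1]
        return x, weights

class SparseFusionBlock(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, image_feats, landmark_feats, inlier_weights):
        # image_feats: (B, C, H, W) deep visual features, C == dim assumed
        B, C, H, W = image_feats.shape
        queries = image_feats.flatten(2).transpose(1, 2)  # (B, H*W, C)
        kv = landmark_feats * inlier_weights              # down-weight outliers
        fused, _ = self.cross_attn(queries, kv, kv)       # global receptive field
        fused = self.norm(queries + fused)                # residual + norm
        return fused.transpose(1, 2).reshape(B, C, H, W)  # back to a dense map
```

A small convolutional head could then decode the fused feature map into per-pixel depth; the paper's actual decoder and training losses are not described in this summary.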
Related papers
- Refinement of Monocular Depth Maps via Multi-View Differentiable Rendering [4.717325308876748]
We present a novel approach to generate view consistent and detailed depth maps from a number of posed images.
We leverage advances in monocular depth estimation, which generate topologically complete, but metrically inaccurate depth maps.
Our method is able to generate dense, detailed, high-quality depth maps, also in challenging indoor scenarios, and outperforms state-of-the-art depth reconstruction approaches.
arXiv Detail & Related papers (2024-10-04T18:50:28Z)
- A Two-Stage Masked Autoencoder Based Network for Indoor Depth Completion [10.519644854849098]
We propose a two-step Transformer-based network for indoor depth completion.
Our proposed network achieves the state-of-the-art performance on the Matterport3D dataset.
In addition, to validate the importance of the depth completion task, we apply our methods to indoor 3D reconstruction.
arXiv Detail & Related papers (2024-06-14T07:42:27Z)
- MonoCD: Monocular 3D Object Detection with Complementary Depths [9.186673054867866]
Depth estimation is an essential but challenging subtask of monocular 3D object detection.
We propose to increase the complementarity of depths with two novel designs.
Experiments on the KITTI benchmark demonstrate that our method achieves state-of-the-art performance without introducing extra data.
arXiv Detail & Related papers (2024-04-04T03:30:49Z)
- SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis [93.46963803030935]
We present a new Sparse-view NeRF (SparseNeRF) framework that exploits depth priors from inaccurate real-world observations.
We propose a simple yet effective local depth ranking constraint on NeRFs, encouraging the expected depth ranking of the NeRF to be consistent with that of the coarse depth maps in local patches.
We also collect a new dataset NVS-RGBD that contains real-world depth maps from Azure Kinect, ZED 2, and iPhone 13 Pro.
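The local depth ranking constraint summarized above lends itself to a short sketch: sample pixel pairs within a local patch and penalize the NeRF only when its rendered depth ordering disagrees with the coarse depth map. This is a hypothetical hinge-style formulation; the pair sampling, margin, and exact loss form are assumptions, not SparseNeRF's published loss.

```python
# Hypothetical pairwise ranking loss: only the depth ORDERING from the
# coarse (noisy) depth map is enforced, not its absolute values.
import torch

def depth_ranking_loss(nerf_depth, coarse_depth, idx_a, idx_b, margin=1e-4):
    # nerf_depth, coarse_depth: (P,) depths at P sampled pixels
    # idx_a, idx_b: (K,) indices of pixel pairs drawn from the same patch
    order = torch.sign(coarse_depth[idx_a] - coarse_depth[idx_b])  # +1 / -1 / 0
    diff = nerf_depth[idx_a] - nerf_depth[idx_b]
    # Hinge is zero whenever the NeRF agrees with the coarse ordering by
    # at least the margin; ties (order == 0) contribute no gradient.
    return torch.clamp(margin - order * diff, min=0.0).mean()
```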
arXiv Detail & Related papers (2023-03-28T17:58:05Z)
- Sparse SPN: Depth Completion from Sparse Keypoints [17.26885039864854]
The long-term goal is to use image-based depth completion to create 3D models from sparse point clouds.
We extend CSPN with multiscale prediction and a dilated kernel, leading to better completion of keypoint-sampled depth.
We also show that a model trained on NYUv2 creates surprisingly good point clouds on ETH3D by completing sparse SfM points.
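As a rough illustration of the dilated-kernel extension of CSPN mentioned above, one propagation step can be written as an affinity-weighted average over neighbors gathered with a dilation, which enlarges the propagation radius per step. The softmax normalization and unfold-based gather below are simplifying assumptions, not the paper's exact formulation.

```python
# One dilated spatial-propagation step in the spirit of CSPN: each pixel's
# depth is re-estimated from a 3x3 neighborhood sampled at a dilation.
import torch
import torch.nn.functional as F

def dilated_cspn_step(depth, affinity, dilation=2):
    # depth:    (B, 1, H, W) current dense depth estimate
    # affinity: (B, 9, H, W) learned weights for the 3x3 neighborhood
    affinity = F.softmax(affinity, dim=1)          # weights sum to 1 per pixel
    neighbors = F.unfold(depth, kernel_size=3,
                         dilation=dilation, padding=dilation)  # (B, 9, H*W)
    B, _, _ = neighbors.shape
    H, W = depth.shape[-2:]
    neighbors = neighbors.view(B, 9, H, W)         # dilated 3x3 samples
    return (affinity * neighbors).sum(dim=1, keepdim=True)
```

CSPN-style methods typically also re-inject the known sparse depths after each step so the propagation stays anchored to the measurements; that detail is omitted here for brevity.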
arXiv Detail & Related papers (2022-12-02T05:45:04Z)
- Sparse2Dense: Learning to Densify 3D Features for 3D Object Detection [85.08249413137558]
LiDAR-produced point clouds are the major source for most state-of-the-art 3D object detectors.
Small, distant, and incomplete objects with sparse or few points are often hard to detect.
We present Sparse2Dense, a new framework to efficiently boost 3D detection performance by learning to densify point clouds in latent space.
arXiv Detail & Related papers (2022-11-23T16:01:06Z)
- MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection [61.89277940084792]
We introduce the first DETR framework for Monocular DEtection with a depth-guided TRansformer, named MonoDETR.
We formulate 3D object candidates as learnable queries and propose a depth-guided decoder to conduct object-scene depth interactions.
On KITTI benchmark with monocular images as input, MonoDETR achieves state-of-the-art performance and requires no extra dense depth annotations.
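The query-based formulation summarized above can be sketched as a decoder layer in which a fixed set of learnable object queries cross-attends first to depth features and then to visual features. The layer composition, dimensions, and two-step attention ordering are assumptions for illustration, not MonoDETR's exact architecture.

```python
# Illustrative DETR-style decoder layer: object queries interact with the
# scene's depth features before attending to appearance features.
import torch
import torch.nn as nn

class DepthGuidedDecoderLayer(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.depth_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.visual_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, queries, depth_feats, visual_feats):
        # queries:      (B, Q, C) learnable 3D object candidates
        # depth_feats, visual_feats: (B, H*W, C) flattened encoder features
        q = self.norm1(queries +
                       self.depth_attn(queries, depth_feats, depth_feats)[0])
        q = self.norm2(q +
                       self.visual_attn(q, visual_feats, visual_feats)[0])
        return q
```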
arXiv Detail & Related papers (2022-03-24T19:28:54Z)
- DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes [68.38952377590499]
We present a novel approach for estimating depth from a monocular camera as it moves through complex indoor environments.
Our approach predicts absolute scale depth maps over the entire scene consisting of a static background and multiple moving people.
arXiv Detail & Related papers (2021-08-12T09:12:39Z)
- Learning Joint 2D-3D Representations for Depth Completion [90.62843376586216]
We design a simple yet effective neural network block that learns to extract joint 2D and 3D features.
Specifically, the block consists of two domain-specific sub-networks that apply 2D convolution on image pixels and continuous convolution on 3D points.
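A simplified sketch of such a two-branch block follows: a 2D convolution on image features, plus a point branch that approximates a continuous convolution with a shared MLP over each point's nearest neighbors. The brute-force kNN, the MLP design, and the omitted 2D-3D fusion step are all assumptions for illustration.

```python
# Two-branch block: standard 2D convolution on pixels, and a kNN + shared-MLP
# approximation of a continuous convolution on unordered 3D points.
import torch
import torch.nn as nn

class Joint2D3DBlock(nn.Module):
    def __init__(self, dim=64, k=8):
        super().__init__()
        self.k = k
        self.conv2d = nn.Conv2d(dim, dim, kernel_size=3, padding=1)
        # Shared MLP over (neighbor feature, relative offset) pairs.
        self.point_mlp = nn.Sequential(
            nn.Linear(dim + 3, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, image_feats, points, point_feats):
        # image_feats: (B, C, H, W); points: (B, N, 3); point_feats: (B, N, C)
        img_out = self.conv2d(image_feats)                   # 2D branch
        B, N, C = point_feats.shape
        # Brute-force kNN in Euclidean space (clarity over speed).
        idx = torch.cdist(points, points).topk(self.k, largest=False).indices
        flat = idx.reshape(B, N * self.k)
        nbr_feats = torch.gather(point_feats, 1,
                                 flat.unsqueeze(-1).expand(-1, -1, C))
        nbr_xyz = torch.gather(points, 1,
                               flat.unsqueeze(-1).expand(-1, -1, 3))
        offsets = nbr_xyz.view(B, N, self.k, 3) - points.unsqueeze(2)
        msg = torch.cat([nbr_feats.view(B, N, self.k, C), offsets], dim=-1)
        point_out = self.point_mlp(msg).max(dim=2).values    # (B, N, C)
        return img_out, point_out
```

In the depth-completion setting the 3D points project onto pixels, so the two branches can presumably exchange features through that projection; the fusion step is left out of this sketch.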
arXiv Detail & Related papers (2020-12-22T22:58:29Z)
- Occlusion-Aware Depth Estimation with Adaptive Normal Constraints [85.44842683936471]
We present a new learning-based method for multi-frame depth estimation from a color video.
Our method outperforms the state-of-the-art in terms of depth estimation accuracy.
arXiv Detail & Related papers (2020-04-02T07:10:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.