SparseFormer: Attention-based Depth Completion Network
- URL: http://arxiv.org/abs/2206.04557v1
- Date: Thu, 9 Jun 2022 15:08:24 GMT
- Title: SparseFormer: Attention-based Depth Completion Network
- Authors: Frederik Warburg and Michael Ramamonjisoa and Manuel López-Antequera
- Abstract summary: We introduce a transformer block, SparseFormer, that fuses 3D landmarks with deep visual features to produce dense depth.
The SparseFormer has a global receptive field, making the module especially effective for depth completion with low-density and non-uniform landmarks.
To address the issue of depth outliers among the 3D landmarks, we introduce a trainable refinement module that filters outliers through attention between the sparse landmarks.
- Score: 2.9434930072968584
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most pipelines for Augmented and Virtual Reality estimate the ego-motion of
the camera by creating a map of sparse 3D landmarks. In this paper, we tackle
the problem of depth completion, that is, densifying this sparse 3D map using
RGB images as guidance. This remains a challenging problem due to the
low-density, non-uniform, and outlier-prone 3D landmarks produced by SfM and SLAM
pipelines. We introduce a transformer block, SparseFormer, that fuses 3D
landmarks with deep visual features to produce dense depth. The SparseFormer
has a global receptive field, making the module especially effective for depth
completion with low-density and non-uniform landmarks. To address the issue of
depth outliers among the 3D landmarks, we introduce a trainable refinement
module that filters outliers through attention between the sparse landmarks.
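The abstract gives the architecture only at a high level. Purely as a hedged illustration, the PyTorch sketch below shows one way such a block could look: landmarks become tokens, a self-attention pass over the landmark tokens plays the role of the outlier-refinement module, and every pixel feature cross-attends to all landmark tokens, which is what gives the fusion a global receptive field. All class names, shapes, and hyperparameters here are assumptions, not the authors' code.

```python
# Minimal sketch (not the authors' implementation) of cross-attention fusion
# between dense visual features and sparse 3D landmark tokens.
import torch
import torch.nn as nn


class LandmarkRefiner(nn.Module):
    """Self-attention among sparse landmark tokens; an outlier landmark can be
    down-weighted or corrected from the consensus of the others (illustrative)."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, lm: torch.Tensor) -> torch.Tensor:
        # lm: (B, N, C) embeddings of N sparse landmarks
        refined, _ = self.attn(lm, lm, lm)
        return self.norm(lm + refined)


class SparseFusionBlock(nn.Module):
    """Every pixel feature queries every landmark token, so the receptive
    field is global regardless of landmark density or distribution."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.refine = LandmarkRefiner(dim, heads)
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.to_depth = nn.Linear(dim, 1)

    def forward(self, img_feat: torch.Tensor, lm: torch.Tensor) -> torch.Tensor:
        # img_feat: (B, C, H, W) deep visual features; lm: (B, N, C)
        B, C, H, W = img_feat.shape
        queries = img_feat.flatten(2).transpose(1, 2)   # (B, H*W, C)
        lm = self.refine(lm)                            # filter depth outliers
        fused, _ = self.cross(queries, lm, lm)          # pixels attend to landmarks
        fused = self.norm(queries + fused)
        return self.to_depth(fused).transpose(1, 2).reshape(B, 1, H, W)


# Toy usage: 2 images with 64-channel 32x40 feature maps and 100 landmarks each.
depth = SparseFusionBlock(dim=64)(torch.randn(2, 64, 32, 40), torch.randn(2, 100, 64))
```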
Related papers
- GAC-Net_Geometric and attention-based Network for Depth Completion [10.64600095082433]
This paper proposes a depth completion network, CGA-Net, that combines a channel attention mechanism with 3D global feature perception.
Experiments on the KITTI depth completion dataset show that CGA-Net can significantly improve the prediction accuracy of dense depth maps.
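The summary only names channel attention without specifying the design. A squeeze-and-excitation style block is one common instantiation; the sketch below is an assumption of what such a module could look like, not CGA-Net's actual architecture.

```python
# Illustrative channel-attention block (squeeze-and-excitation style); the
# paper's actual module may differ.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global spatial context
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.mlp(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # excite: reweight channels
```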
arXiv Detail & Related papers (2025-01-14T10:24:20Z)
- Revisiting Monocular 3D Object Detection from Scene-Level Depth Retargeting to Instance-Level Spatial Refinement [44.4805861813093]
Monocular 3D object detection is challenging due to the lack of accurate depth.
Existing depth-assisted solutions still exhibit inferior performance.
We propose a novel depth-adapted monocular 3D object detection network.
arXiv Detail & Related papers (2024-12-26T10:51:50Z)
- DepthLab: From Partial to Complete [80.58276388743306]
Missing values remain a common challenge for depth data across its wide range of applications.
This work bridges this gap with DepthLab, a foundation depth inpainting model powered by image diffusion priors.
Our approach proves its worth in various downstream tasks, including 3D scene inpainting, text-to-3D scene generation, sparse-view reconstruction with DUST3R, and LiDAR depth completion.
arXiv Detail & Related papers (2024-12-24T04:16:38Z)
- A Two-Stage Masked Autoencoder Based Network for Indoor Depth Completion [10.519644854849098]
We propose a two-stage Transformer-based network for indoor depth completion.
Our proposed network achieves the state-of-the-art performance on the Matterport3D dataset.
In addition, to validate the importance of the depth completion task, we apply our methods to indoor 3D reconstruction.
arXiv Detail & Related papers (2024-06-14T07:42:27Z)
- SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis [93.46963803030935]
We present a new Sparse-view NeRF (SparseNeRF) framework that exploits depth priors from real-world inaccurate observations.
We propose a simple yet effective local depth ranking constraint on NeRFs, encouraging the expected depth ranking of the NeRF to be consistent with that of the coarse depth maps in local patches.
We also collect a new dataset NVS-RGBD that contains real-world depth maps from Azure Kinect, ZED 2, and iPhone 13 Pro.
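The summary describes the constraint but not its form. One plausible realization is a pairwise hinge loss on pixel pairs sampled within a patch; the sketch below is a hedged illustration of that idea, not SparseNeRF's exact formulation, and the margin and sampling scheme are assumptions.

```python
# Illustrative local depth-ranking loss: pixel pairs whose coarse depths are
# ordered one way should keep the same order in the rendered NeRF depth.
import torch


def local_depth_ranking_loss(pred: torch.Tensor, coarse: torch.Tensor,
                             n_pairs: int = 1024, margin: float = 1e-4) -> torch.Tensor:
    # pred, coarse: (P,) NeRF expected depths and coarse sensor depths for
    # pixels sampled from the same local patch
    i = torch.randint(0, pred.numel(), (n_pairs,))
    j = torch.randint(0, pred.numel(), (n_pairs,))
    sign = torch.sign(coarse[i] - coarse[j])     # ordering given by coarse depth
    # hinge penalty whenever a predicted pair contradicts the coarse ordering
    return torch.clamp(margin - sign * (pred[i] - pred[j]), min=0).mean()
```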
arXiv Detail & Related papers (2023-03-28T17:58:05Z)
- Sparse2Dense: Learning to Densify 3D Features for 3D Object Detection [85.08249413137558]
LiDAR-produced point clouds are the major source for most state-of-the-art 3D object detectors.
Small, distant, and incomplete objects with sparse or few points are often hard to detect.
We present Sparse2Dense, a new framework to efficiently boost 3D detection performance by learning to densify point clouds in latent space.
arXiv Detail & Related papers (2022-11-23T16:01:06Z)
- MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection [57.969536140562674]
We introduce the first DETR framework for Monocular DEtection with a depth-guided TRansformer, named MonoDETR.
We formulate 3D object candidates as learnable queries and propose a depth-guided decoder to conduct object-scene depth interactions.
On KITTI benchmark with monocular images as input, MonoDETR achieves state-of-the-art performance and requires no extra dense depth annotations.
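As a rough DETR-style sketch of the stated idea, learnable object queries can cross-attend first to depth-aware features and then to visual features. Everything below (dimensions, number of queries, layer arrangement) is an assumption for illustration, not MonoDETR's published architecture.

```python
# Hedged sketch: learnable queries with a depth-guided decoder step.
import torch
import torch.nn as nn


class DepthGuidedDecoder(nn.Module):
    def __init__(self, dim: int = 256, n_queries: int = 50, heads: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, dim))  # 3D object candidates
        self.depth_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.visual_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, visual_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # visual_feat, depth_feat: (B, L, C) flattened feature maps
        q = self.queries.unsqueeze(0).expand(visual_feat.size(0), -1, -1)
        d, _ = self.depth_attn(q, depth_feat, depth_feat)  # object-scene depth interaction
        q = self.norm1(q + d)
        v, _ = self.visual_attn(q, visual_feat, visual_feat)
        return self.norm2(q + v)   # per-query embeddings for 3D box prediction heads
```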
arXiv Detail & Related papers (2022-03-24T19:28:54Z)
- DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes [68.38952377590499]
We present a novel approach for estimating depth from a monocular camera as it moves through complex indoor environments.
Our approach predicts absolute scale depth maps over the entire scene consisting of a static background and multiple moving people.
arXiv Detail & Related papers (2021-08-12T09:12:39Z)
- Learning Joint 2D-3D Representations for Depth Completion [90.62843376586216]
We design a simple yet effective neural network block that learns to extract joint 2D and 3D features.
Specifically, the block consists of two domain-specific sub-networks that apply 2D convolution on image pixels and continuous convolution on 3D points.
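The two-branch structure is concrete enough to sketch. Below, the continuous convolution is approximated by an MLP over the relative offsets of k nearest neighbours, a common simplification; names, the kNN scheme, and the aggregation are assumptions, not the paper's code.

```python
# Illustrative joint 2D-3D block: 2D convolution over image features plus a
# simplified continuous convolution over 3D points.
import torch
import torch.nn as nn


class ContinuousConv(nn.Module):
    def __init__(self, dim: int, k: int = 8):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(nn.Linear(dim + 3, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self, xyz: torch.Tensor, feat: torch.Tensor) -> torch.Tensor:
        # xyz: (N, 3) point coordinates; feat: (N, C) point features
        dist = torch.cdist(xyz, xyz)                    # (N, N) pairwise distances
        idx = dist.topk(self.k, largest=False).indices  # k nearest neighbours
        rel = xyz[idx] - xyz.unsqueeze(1)               # (N, k, 3) relative offsets
        msg = self.mlp(torch.cat([feat[idx], rel], dim=-1))
        return msg.mean(dim=1)                          # aggregate neighbourhood


class Joint2D3DBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.conv2d = nn.Conv2d(dim, dim, 3, padding=1)  # 2D branch on pixels
        self.conv3d = ContinuousConv(dim)                # 3D branch on points

    def forward(self, img_feat, xyz, pt_feat):
        # img_feat: (B, C, H, W); xyz: (N, 3); pt_feat: (N, C)
        return self.conv2d(img_feat), self.conv3d(xyz, pt_feat)
```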
arXiv Detail & Related papers (2020-12-22T22:58:29Z)