SparseFormer: Attention-based Depth Completion Network
- URL: http://arxiv.org/abs/2206.04557v1
- Date: Thu, 9 Jun 2022 15:08:24 GMT
- Title: SparseFormer: Attention-based Depth Completion Network
- Authors: Frederik Warburg and Michael Ramamonjisoa and Manuel López-Antequera
- Abstract summary: We introduce a transformer block, SparseFormer, that fuses 3D landmarks with deep visual features to produce dense depth.
The SparseFormer has a global receptive field, making the module especially effective for depth completion with low-density and non-uniform landmarks.
To address the issue of depth outliers among the 3D landmarks, we introduce a trainable refinement module that filters outliers through attention between the sparse landmarks.
- Score: 2.9434930072968584
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most pipelines for Augmented and Virtual Reality estimate the ego-motion of
the camera by creating a map of sparse 3D landmarks. In this paper, we tackle
the problem of depth completion, that is, densifying this sparse 3D map using
RGB images as guidance. This remains a challenging problem due to the
low-density, non-uniform, and outlier-prone 3D landmarks produced by SfM and SLAM
pipelines. We introduce a transformer block, SparseFormer, that fuses 3D
landmarks with deep visual features to produce dense depth. The SparseFormer
has a global receptive field, making the module especially effective for depth
completion with low-density and non-uniform landmarks. To address the issue of
depth outliers among the 3D landmarks, we introduce a trainable refinement
module that filters outliers through attention between the sparse landmarks.
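The abstract gives the architecture only at a high level. Purely as a hedged illustration, the PyTorch sketch below shows one way such a block could look: landmarks become tokens, a self-attention pass over the landmark tokens plays the role of the outlier-refinement module, and every pixel feature cross-attends to all landmark tokens, which is what gives the fusion a global receptive field. All class names, shapes, and hyperparameters here are assumptions, not the authors' code.

```python
# Minimal sketch (not the authors' implementation) of cross-attention fusion
# between dense visual features and sparse 3D landmark tokens.
import torch
import torch.nn as nn


class LandmarkRefiner(nn.Module):
    """Self-attention among sparse landmark tokens; an outlier landmark can be
    down-weighted or corrected from the consensus of the others (illustrative)."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, lm: torch.Tensor) -> torch.Tensor:
        # lm: (B, N, C) embeddings of N sparse landmarks
        refined, _ = self.attn(lm, lm, lm)
        return self.norm(lm + refined)


class SparseFusionBlock(nn.Module):
    """Every pixel feature queries every landmark token, so the receptive
    field is global regardless of landmark density or distribution."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.refine = LandmarkRefiner(dim, heads)
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.to_depth = nn.Linear(dim, 1)

    def forward(self, img_feat: torch.Tensor, lm: torch.Tensor) -> torch.Tensor:
        # img_feat: (B, C, H, W) deep visual features; lm: (B, N, C)
        B, C, H, W = img_feat.shape
        queries = img_feat.flatten(2).transpose(1, 2)   # (B, H*W, C)
        lm = self.refine(lm)                            # filter depth outliers
        fused, _ = self.cross(queries, lm, lm)          # pixels attend to landmarks
        fused = self.norm(queries + fused)
        return self.to_depth(fused).transpose(1, 2).reshape(B, 1, H, W)


# Toy usage: 2 images with 64-channel 32x40 feature maps and 100 landmarks each.
depth = SparseFusionBlock(dim=64)(torch.randn(2, 64, 32, 40), torch.randn(2, 100, 64))
```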
Related papers
- GAC-Net_Geometric and attention-based Network for Depth Completion [10.64600095082433]
This paper proposes a depth completion network, CGA-Net, that combines a channel attention mechanism with 3D global feature perception.
Experiments on the KITTI depth completion dataset show that CGA-Net can significantly improve the prediction accuracy of dense depth maps.
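The summary only names channel attention without specifying the design. A squeeze-and-excitation style block is one common instantiation; the sketch below is an assumption of what such a module could look like, not CGA-Net's actual architecture.

```python
# Illustrative channel-attention block (squeeze-and-excitation style); the
# paper's actual module may differ.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global spatial context
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.mlp(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # excite: reweight channels
```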
arXiv Detail & Related papers (2025-01-14T10:24:20Z)
- Revisiting Monocular 3D Object Detection from Scene-Level Depth Retargeting to Instance-Level Spatial Refinement [44.4805861813093]
Monocular 3D object detection is challenging due to the lack of accurate depth.
Existing depth-assisted solutions still exhibit inferior performance.
We propose a novel depth-adapted monocular 3D object detection network.
arXiv Detail & Related papers (2024-12-26T10:51:50Z)
- DepthLab: From Partial to Complete [80.58276388743306]
Missing values remain a common challenge for depth data across its wide range of applications.
This work bridges this gap with DepthLab, a foundation depth inpainting model powered by image diffusion priors.
Our approach proves its worth in various downstream tasks, including 3D scene inpainting, text-to-3D scene generation, sparse-view reconstruction with DUST3R, and LiDAR depth completion.
arXiv Detail & Related papers (2024-12-24T04:16:38Z)
- A Two-Stage Masked Autoencoder Based Network for Indoor Depth Completion [10.519644854849098]
We propose a two-stage Transformer-based network for indoor depth completion.
Our proposed network achieves the state-of-the-art performance on the Matterport3D dataset.
In addition, to validate the importance of the depth completion task, we apply our methods to indoor 3D reconstruction.
arXiv Detail & Related papers (2024-06-14T07:42:27Z)
- SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis [93.46963803030935]
We present a new Sparse-view NeRF (SparseNeRF) framework that exploits depth priors from real-world inaccurate observations.
We propose a simple yet effective local depth ranking constraint on NeRFs, encouraging the expected depth ranking of the NeRF to be consistent with that of the coarse depth maps in local patches.
We also collect a new dataset NVS-RGBD that contains real-world depth maps from Azure Kinect, ZED 2, and iPhone 13 Pro.
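The summary describes the constraint but not its form. One plausible realization is a pairwise hinge loss on pixel pairs sampled within a patch; the sketch below is a hedged illustration of that idea, not SparseNeRF's exact formulation, and the margin and sampling scheme are assumptions.

```python
# Illustrative local depth-ranking loss: pixel pairs whose coarse depths are
# ordered one way should keep the same order in the rendered NeRF depth.
import torch


def local_depth_ranking_loss(pred: torch.Tensor, coarse: torch.Tensor,
                             n_pairs: int = 1024, margin: float = 1e-4) -> torch.Tensor:
    # pred, coarse: (P,) NeRF expected depths and coarse sensor depths for
    # pixels sampled from the same local patch
    i = torch.randint(0, pred.numel(), (n_pairs,))
    j = torch.randint(0, pred.numel(), (n_pairs,))
    sign = torch.sign(coarse[i] - coarse[j])     # ordering given by coarse depth
    # hinge penalty whenever a predicted pair contradicts the coarse ordering
    return torch.clamp(margin - sign * (pred[i] - pred[j]), min=0).mean()
```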
arXiv Detail & Related papers (2023-03-28T17:58:05Z)
- Sparse2Dense: Learning to Densify 3D Features for 3D Object Detection [85.08249413137558]
LiDAR-produced point clouds are the major source for most state-of-the-art 3D object detectors.
Small, distant, and incomplete objects with sparse or few points are often hard to detect.
We present Sparse2Dense, a new framework to efficiently boost 3D detection performance by learning to densify point clouds in latent space.
arXiv Detail & Related papers (2022-11-23T16:01:06Z)
- MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection [57.969536140562674]
We introduce the first DETR framework for Monocular DEtection with a depth-guided TRansformer, named MonoDETR.
We formulate 3D object candidates as learnable queries and propose a depth-guided decoder to conduct object-scene depth interactions.
On KITTI benchmark with monocular images as input, MonoDETR achieves state-of-the-art performance and requires no extra dense depth annotations.
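As a rough DETR-style sketch of the stated idea, learnable object queries can cross-attend first to depth-aware features and then to visual features. Everything below (dimensions, number of queries, layer arrangement) is an assumption for illustration, not MonoDETR's published architecture.

```python
# Hedged sketch: learnable queries with a depth-guided decoder step.
import torch
import torch.nn as nn


class DepthGuidedDecoder(nn.Module):
    def __init__(self, dim: int = 256, n_queries: int = 50, heads: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, dim))  # 3D object candidates
        self.depth_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.visual_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, visual_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # visual_feat, depth_feat: (B, L, C) flattened feature maps
        q = self.queries.unsqueeze(0).expand(visual_feat.size(0), -1, -1)
        d, _ = self.depth_attn(q, depth_feat, depth_feat)  # object-scene depth interaction
        q = self.norm1(q + d)
        v, _ = self.visual_attn(q, visual_feat, visual_feat)
        return self.norm2(q + v)   # per-query embeddings for 3D box prediction heads
```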
arXiv Detail & Related papers (2022-03-24T19:28:54Z)
- DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes [68.38952377590499]
We present a novel approach for estimating depth from a monocular camera as it moves through complex indoor environments.
Our approach predicts absolute scale depth maps over the entire scene consisting of a static background and multiple moving people.
arXiv Detail & Related papers (2021-08-12T09:12:39Z)
- Learning Joint 2D-3D Representations for Depth Completion [90.62843376586216]
We design a simple yet effective neural network block that learns to extract joint 2D and 3D features.
Specifically, the block consists of two domain-specific sub-networks that apply 2D convolution on image pixels and continuous convolution on 3D points.
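The two-branch structure is concrete enough to sketch. Below, the continuous convolution is approximated by an MLP over the relative offsets of k nearest neighbours, a common simplification; names, the kNN scheme, and the aggregation are assumptions, not the paper's code.

```python
# Illustrative joint 2D-3D block: 2D convolution over image features plus a
# simplified continuous convolution over 3D points.
import torch
import torch.nn as nn


class ContinuousConv(nn.Module):
    def __init__(self, dim: int, k: int = 8):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(nn.Linear(dim + 3, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self, xyz: torch.Tensor, feat: torch.Tensor) -> torch.Tensor:
        # xyz: (N, 3) point coordinates; feat: (N, C) point features
        dist = torch.cdist(xyz, xyz)                    # (N, N) pairwise distances
        idx = dist.topk(self.k, largest=False).indices  # k nearest neighbours
        rel = xyz[idx] - xyz.unsqueeze(1)               # (N, k, 3) relative offsets
        msg = self.mlp(torch.cat([feat[idx], rel], dim=-1))
        return msg.mean(dim=1)                          # aggregate neighbourhood


class Joint2D3DBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.conv2d = nn.Conv2d(dim, dim, 3, padding=1)  # 2D branch on pixels
        self.conv3d = ContinuousConv(dim)                # 3D branch on points

    def forward(self, img_feat, xyz, pt_feat):
        # img_feat: (B, C, H, W); xyz: (N, 3); pt_feat: (N, C)
        return self.conv2d(img_feat), self.conv3d(xyz, pt_feat)
```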
arXiv Detail & Related papers (2020-12-22T22:58:29Z)