SemAttNet: Towards Attention-based Semantic Aware Guided Depth Completion
- URL: http://arxiv.org/abs/2204.13635v1
- Date: Thu, 28 Apr 2022 16:53:25 GMT
- Title: SemAttNet: Towards Attention-based Semantic Aware Guided Depth Completion
- Authors: Danish Nazir, Marcus Liwicki, Didier Stricker, Muhammad Zeshan Afzal
- Abstract summary: We propose a novel three-branch backbone comprising color-guided, semantic-guided, and depth-guided branches.
The predicted dense depth map of the color-guided branch, along with the semantic image and the sparse depth map, is passed as input to the semantic-guided branch.
The depth-guided branch takes sparse, color, and semantic depths to generate the dense depth map.
- Score: 12.724769241831396
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Depth completion involves recovering a dense depth map from a sparse map and
an RGB image. Recent approaches focus on utilizing color images as guidance
images to recover depth at invalid pixels. However, color images alone are not
enough to provide the necessary semantic understanding of the scene.
Consequently, the depth completion task suffers from sudden illumination
changes in RGB images (e.g., shadows). In this paper, we propose a novel
three-branch backbone comprising color-guided, semantic-guided, and
depth-guided branches. Specifically, the color-guided branch takes a sparse
depth map and an RGB image as input and generates a color depth map that
incorporates color cues (e.g., object boundaries) of the scene. The predicted
dense depth map of the color-guided branch, along with the semantic image and
the sparse depth map, is passed as input to the semantic-guided branch for
estimating semantic depth. The
depth-guided branch takes sparse, color, and semantic depths to generate the
dense depth map. The color depth, semantic depth, and guided depth are
adaptively fused to produce the output of our proposed three-branch backbone.
In addition, we also propose a semantic-aware multi-modal attention-based
fusion block (SAMMAFB) to fuse features across all three branches. We further
use CSPN++ with atrous convolutions to refine the dense depth map produced by
our three-branch backbone. Extensive experiments show that our model achieves
state-of-the-art performance on the KITTI depth completion benchmark at the
time of submission.
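To make the three-branch pipeline concrete, below is a minimal PyTorch sketch of the cascade and a confidence-weighted adaptive fusion. The tiny Branch module, the channel counts, and the softmax-over-confidence fusion rule are illustrative assumptions rather than the authors' architecture (which additionally exchanges features between branches via SAMMAFB).

```python
# Minimal sketch of the three-branch cascade; all module sizes and the
# confidence-based fusion rule are assumptions for illustration only.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                         nn.ReLU(inplace=True))

class Branch(nn.Module):
    """Tiny stand-in for one encoder-decoder branch: predicts a depth
    map plus a per-pixel confidence from its stacked inputs."""
    def __init__(self, in_ch, feat=32):
        super().__init__()
        self.body = nn.Sequential(conv_block(in_ch, feat), conv_block(feat, feat))
        self.head = nn.Conv2d(feat, 2, 3, padding=1)  # [depth, confidence logit]

    def forward(self, x):
        out = self.head(self.body(x))
        return out[:, :1], out[:, 1:]

class ThreeBranchBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.color_branch = Branch(3 + 1)         # RGB + sparse depth
        self.semantic_branch = Branch(3 + 1 + 1)  # semantic image + sparse + color depth
        self.depth_branch = Branch(1 + 1 + 1)     # sparse + color + semantic depths

    def forward(self, rgb, semantic, sparse):
        d_col, c_col = self.color_branch(torch.cat([rgb, sparse], 1))
        d_sem, c_sem = self.semantic_branch(torch.cat([semantic, sparse, d_col], 1))
        d_dep, c_dep = self.depth_branch(torch.cat([sparse, d_col, d_sem], 1))
        # Adaptive fusion: per-pixel softmax over the three confidences,
        # then a weighted sum of the three depth predictions.
        w = torch.softmax(torch.cat([c_col, c_sem, c_dep], 1), dim=1)
        d = torch.stack([d_col, d_sem, d_dep], 1).squeeze(2)
        return (w * d).sum(1, keepdim=True)

backbone = ThreeBranchBackbone()
dense = backbone(torch.rand(1, 3, 64, 64),   # RGB image
                 torch.rand(1, 3, 64, 64),   # semantic image (assumed 3-channel)
                 torch.rand(1, 1, 64, 64))   # sparse depth -> (1, 1, 64, 64) out
```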
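The refinement stage can be sketched in the same spirit: one schematic CSPN-style propagation step over a dilated (atrous) 3x3 neighbourhood. CSPN++ itself additionally learns kernel sizes and iteration counts per pixel; the fixed dilation and iteration count here are simplifying assumptions.

```python
# Schematic CSPN-style refinement with a dilated neighbourhood; a
# simplification of CSPN++, not the paper's exact module.
import torch
import torch.nn.functional as F

def cspn_step(depth, affinity, sparse, dilation=2):
    """Each pixel becomes an affinity-weighted mix of its 3x3 dilated
    neighbourhood, then the valid sparse measurements are re-imposed."""
    b, _, h, w = depth.shape
    a = torch.softmax(affinity, dim=1)  # (B, 9, H, W) normalized weights
    nbrs = F.unfold(depth, kernel_size=3, dilation=dilation, padding=dilation)
    nbrs = nbrs.view(b, 9, h, w)
    refined = (a * nbrs).sum(dim=1, keepdim=True)
    valid = (sparse > 0).float()
    return valid * sparse + (1 - valid) * refined  # keep measured depths fixed

depth = torch.rand(1, 1, 64, 64)      # coarse prediction from the backbone
affinity = torch.randn(1, 9, 64, 64)  # would be predicted by the network
sparse = torch.rand(1, 1, 64, 64) * (torch.rand(1, 1, 64, 64) > 0.9).float()
for _ in range(6):                    # a few propagation iterations
    depth = cspn_step(depth, affinity, sparse)
```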
Related papers
- Depth-guided Texture Diffusion for Image Semantic Segmentation [47.46257473475867]
We introduce a Depth-guided Texture Diffusion approach that effectively tackles the challenge of bridging the gap between depth maps and RGB images for semantic segmentation.
Our method extracts low-level features from edges and textures to create a texture image, which is then used to enrich the depth map.
By integrating this enriched depth map with the original RGB image into a joint feature embedding, our method effectively bridges the disparity between the depth map and the image.
arXiv Detail & Related papers (2024-08-17T04:55:03Z)
- RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth Completion [31.70022495622075]
We explore a repetitive design in our image guided network to gradually and sufficiently recover depth values.
In the image branch, we design a dense repetitive hourglass network (DRHN) to extract discriminative image features of complex environments.
In the depth branch, we present a repetitive guidance (RG) module based on dynamic convolution, in which an efficient convolution factorization is proposed to reduce complexity.
In addition, we propose a region-aware spatial propagation network (RASPN) for further depth refinement based on the semantic prior constraint.
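As a rough, hypothetical illustration of guided dynamic convolution with a channel-wise factorization, the sketch below predicts per-pixel depthwise kernels from the guidance features and mixes channels with a cheap 1x1 convolution; the actual RG module's factorization may differ.

```python
# Hypothetical guided dynamic convolution with a depthwise-plus-pointwise
# factorization; an assumption for exposition, not RigNet++'s RG module.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedDynamicConv(nn.Module):
    def __init__(self, ch, k=3):
        super().__init__()
        self.k = k
        self.kernel_gen = nn.Conv2d(ch, ch * k * k, 3, padding=1)  # per-pixel kernels
        self.pointwise = nn.Conv2d(ch, ch, 1)                      # cross-channel mix

    def forward(self, depth_feat, guide_feat):
        b, c, h, w = depth_feat.shape
        k2 = self.k * self.k
        kernels = torch.softmax(
            self.kernel_gen(guide_feat).view(b, c, k2, h, w), dim=2)
        patches = F.unfold(depth_feat, self.k, padding=self.k // 2)
        patches = patches.view(b, c, k2, h, w)
        out = (kernels * patches).sum(dim=2)  # spatially-varying depthwise conv
        return self.pointwise(out)

layer = GuidedDynamicConv(ch=16)
out = layer(torch.rand(1, 16, 32, 32), torch.rand(1, 16, 32, 32))
```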
arXiv Detail & Related papers (2023-09-01T09:11:20Z)
- RDFC-GAN: RGB-Depth Fusion CycleGAN for Indoor Depth Completion [28.634851863097953]
We propose a novel two-branch end-to-end fusion network named RDFC-GAN.
It takes a pair of RGB and incomplete depth images as input to predict a dense and completed depth map.
The first branch employs an encoder-decoder structure that adheres to the Manhattan world assumption.
The other branch applies an RGB-depth fusion CycleGAN, adept at translating RGB imagery into detailed, textured depth maps.
arXiv Detail & Related papers (2023-06-06T11:03:05Z)
- RGB-Depth Fusion GAN for Indoor Depth Completion [29.938869342958125]
In this paper, we design a novel two-branch end-to-end fusion network, which takes a pair of RGB and incomplete depth images as input to predict a dense and completed depth map.
In one branch, we propose an RGB-depth fusion GAN to transfer the RGB image to the fine-grained textured depth map.
In the other branch, we adopt adaptive fusion modules named W-AdaIN to propagate the features across the two branches.
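A minimal sketch of what AdaIN-style cross-branch fusion with a learned per-channel blend weight could look like follows; this AdaINFusion module is a guessed approximation of the flavour of W-AdaIN, whose exact formulation may differ.

```python
# AdaIN-style fusion with a learned per-channel blend weight; a guessed
# approximation of W-AdaIN, not the paper's exact module.
import torch
import torch.nn as nn

class AdaINFusion(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1, ch, 1, 1), 0.5))  # blend weight

    def forward(self, x, y, eps=1e-5):
        # Re-normalize branch x with the channel statistics of branch y.
        mu_x = x.mean(dim=(2, 3), keepdim=True)
        sd_x = x.std(dim=(2, 3), keepdim=True) + eps
        mu_y = y.mean(dim=(2, 3), keepdim=True)
        sd_y = y.std(dim=(2, 3), keepdim=True) + eps
        adain = (x - mu_x) / sd_x * sd_y + mu_y
        return self.alpha * adain + (1 - self.alpha) * x

fuse = AdaINFusion(ch=16)
out = fuse(torch.rand(2, 16, 32, 32), torch.rand(2, 16, 32, 32))
```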
arXiv Detail & Related papers (2022-03-21T10:26:38Z)
- Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient object detection, and contour estimation. The multi-task mechanism encourages the model to learn task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
arXiv Detail & Related papers (2022-03-09T17:20:18Z)
- DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes [68.38952377590499]
We present a novel approach for estimating depth from a monocular camera as it moves through complex indoor environments.
Our approach predicts absolute-scale depth maps over the entire scene, consisting of a static background and multiple moving people.
arXiv Detail & Related papers (2021-08-12T09:12:39Z)
- Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion [56.85837052421469]
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars.
In this paper, we study the problem of predicting dense depth from a single RGB image with optional sparse measurements from low-cost active depth sensors.
We introduce Sparse Networks (SANs), a new module enabling monodepth networks to perform both the tasks of depth prediction and completion.
arXiv Detail & Related papers (2021-03-30T21:22:26Z)
- PENet: Towards Precise and Efficient Image Guided Depth Completion [11.162415111320625]
How to fuse the color and depth modalities plays an important role in achieving good performance.
This paper proposes a two-branch backbone that consists of a color-dominant branch and a depth-dominant branch.
The proposed full model ranks 1st in the KITTI depth completion online leaderboard at the time of submission.
arXiv Detail & Related papers (2021-03-01T06:09:23Z)
- Efficient Depth Completion Using Learned Bases [94.0808155168311]
We propose a new global geometry constraint for depth completion.
By assuming that depth maps often lie on low-dimensional subspaces, a dense depth map can be approximated by a weighted sum of full-resolution principal depth bases.
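The mechanics of this are easy to demonstrate: given a basis matrix, the per-image weights can be fit to the sparse measurements by least squares. In the sketch below the bases are random stand-ins (the paper learns them from data), so it only shows the reconstruction step.

```python
# Toy illustration: a dense depth map as a weighted sum of depth bases,
# with weights fit to sparse measurements; random bases stand in for the
# learned ones.
import numpy as np

h, w, k = 48, 64, 16
bases = np.random.randn(h * w, k)              # k full-resolution bases
dense_gt = bases @ np.random.randn(k)          # synthetic dense depth

mask = np.random.rand(h * w) < 0.05            # ~5% sparse measurements
coef, *_ = np.linalg.lstsq(bases[mask], dense_gt[mask], rcond=None)
dense_pred = (bases @ coef).reshape(h, w)      # approximated dense map
```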
arXiv Detail & Related papers (2020-12-02T11:57:37Z)
- Depth Completion Using a View-constrained Deep Prior [73.21559000917554]
Recent work has shown that the structure of convolutional neural networks (CNNs) induces a strong prior that favors natural images.
This prior, known as a deep image prior (DIP), is an effective regularizer in inverse problems such as image denoising and inpainting.
We extend the concept of the DIP to depth images. Given color images and noisy, incomplete target depth maps, we reconstruct a restored depth map by using the CNN structure itself as a prior.
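A toy deep-image-prior loop for depth completion looks like the following: a small CNN fed a fixed random input is optimized to match the observed depth only at valid pixels, so the network structure regularizes the unobserved regions. Network size, mask density, and iteration count are arbitrary assumptions.

```python
# Toy DIP loop for depth: fit observed pixels only; the CNN's structure
# (plus early stopping) acts as the prior for the missing regions.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(8, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)
z = torch.randn(1, 8, 64, 64)                    # fixed random code
target = torch.rand(1, 1, 64, 64)                # noisy, incomplete depth
mask = (torch.rand(1, 1, 64, 64) > 0.8).float()  # where depth was observed

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(200):                             # stop early; overfitting hurts
    opt.zero_grad()
    loss = ((net(z) - target) ** 2 * mask).mean()
    loss.backward()
    opt.step()
completed = net(z).detach()                      # dense restored depth
```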
arXiv Detail & Related papers (2020-01-21T21:56:01Z)