Related papers: RDFC-GAN: RGB-Depth Fusion CycleGAN for Indoor Depth Completion

RDFC-GAN: RGB-Depth Fusion CycleGAN for Indoor Depth Completion

URL: http://arxiv.org/abs/2306.03584v2
Date: Fri, 12 Apr 2024 00:52:35 GMT
Title: RDFC-GAN: RGB-Depth Fusion CycleGAN for Indoor Depth Completion
Authors: Haowen Wang, Zhengping Che, Yufan Yang, Mingyuan Wang, Zhiyuan Xu, Xiuquan Qiao, Mengshi Qi, Feifei Feng, Jian Tang,
Abstract summary: We propose a novel two-branch end-to-end fusion network named RDFC-GAN. It takes a pair of RGB and incomplete depth images as input to predict a dense and completed depth map. The first branch employs an encoder-decoder structure, by adhering to the Manhattan world assumption. The other branch applies an RGB-depth fusion CycleGAN, adept at translating RGB imagery into detailed, textured depth maps.
Score: 28.634851863097953
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Raw depth images captured in indoor scenarios frequently exhibit extensive missing values due to the inherent limitations of the sensors and environments. For example, transparent materials frequently elude detection by depth sensors; surfaces may introduce measurement inaccuracies due to their polished textures, extended distances, and oblique incidence angles from the sensor. The presence of incomplete depth maps imposes significant challenges for subsequent vision applications, prompting the development of numerous depth completion techniques to mitigate this problem. Numerous methods excel at reconstructing dense depth maps from sparse samples, but they often falter when faced with extensive contiguous regions of missing depth values, a prevalent and critical challenge in indoor environments. To overcome these challenges, we design a novel two-branch end-to-end fusion network named RDFC-GAN, which takes a pair of RGB and incomplete depth images as input to predict a dense and completed depth map. The first branch employs an encoder-decoder structure, by adhering to the Manhattan world assumption and utilizing normal maps from RGB-D information as guidance, to regress the local dense depth values from the raw depth map. The other branch applies an RGB-depth fusion CycleGAN, adept at translating RGB imagery into detailed, textured depth maps while ensuring high fidelity through cycle consistency. We fuse the two branches via adaptive fusion modules named W-AdaIN and train the model with the help of pseudo depth maps. Comprehensive evaluations on NYU-Depth V2 and SUN RGB-D datasets show that our method significantly enhances depth completion performance particularly in realistic indoor settings.

Related papers

Geometry-Aware Sparse Depth Sampling for High-Fidelity RGB-D Depth Completion in Robotic Systems [0.20999222360659608]
RGB-D and stereo vision sensors are widely used for industrial robotic systems that perform manipulation, inspection, and navigation tasks.<n>Current depth completion methods produce noisy, incomplete, or biased depth maps due to sensor limitations and environmental conditions.<n>We propose a normal-guided sparse depth sampling strategy that leverages PCA-based surface normal estimation on the RGB-D point cloud to compute a per-pixel depth reliability measure.<n> Experiments show that our geometry-aware sparse depth improves accuracy, reduces artifacts near edges and discontinuities, and produces more realistic training conditions.
arXiv Detail & Related papers (2025-12-09T04:14:05Z)
IGAF: Incremental Guided Attention Fusion for Depth Super-Resolution [13.04760414998408]
We propose a novel sensor fusion methodology for guided depth super-resolution (GDSR) GDSR combines LR depth maps with HR images to estimate detailed HR depth maps. Our model achieves state-of-the-art results compared to all baseline models on the NYU v2 dataset.
arXiv Detail & Related papers (2025-01-03T09:27:51Z)
TDCNet: Transparent Objects Depth Completion with CNN-Transformer Dual-Branch Parallel Network [8.487135422430972]
We propose TDCNet, a novel dual-branch CNN-Transformer parallel network for transparent object depth completion. Our model achieves state-of-the-art performance across multiple public datasets.
arXiv Detail & Related papers (2024-12-19T15:42:21Z)
SteeredMarigold: Steering Diffusion Towards Depth Completion of Largely Incomplete Depth Maps [3.399289369740637]
SteeredMarigold is a training-free, zero-shot depth completion method. It produces metric dense depth even for largely incomplete depth maps. Our code will be publicly available.
arXiv Detail & Related papers (2024-09-16T11:52:13Z)
AGG-Net: Attention Guided Gated-convolutional Network for Depth Image Completion [1.8820731605557168]
We propose a new model for depth image completion based on the Attention Guided Gated-convolutional Network (AGG-Net) In the encoding stage, an Attention Guided Gated-Convolution (AG-GConv) module is proposed to realize the fusion of depth and color features at different scales. In the decoding stage, an Attention Guided Skip Connection (AG-SC) module is presented to avoid introducing too many depth-irrelevant features to the reconstruction.
arXiv Detail & Related papers (2023-09-04T14:16:08Z)
Symmetric Uncertainty-Aware Feature Transmission for Depth Super-Resolution [52.582632746409665]
We propose a novel Symmetric Uncertainty-aware Feature Transmission (SUFT) for color-guided DSR. Our method achieves superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-06-01T06:35:59Z)
RGB-Depth Fusion GAN for Indoor Depth Completion [29.938869342958125]
In this paper, we design a novel two-branch end-to-end fusion network, which takes a pair of RGB and incomplete depth images as input to predict a dense and completed depth map. In one branch, we propose an RGB-depth fusion GAN to transfer the RGB image to the fine-grained textured depth map. In the other branch, we adopt adaptive fusion modules named W-AdaIN to propagate the features across the two branches.
arXiv Detail & Related papers (2022-03-21T10:26:38Z)
Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD) Specifically, we unify three complementary tasks: depth estimation, salient object detection and contour estimation. The multi-task mechanism promotes the model to learn the task-aware features from the auxiliary tasks. Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
arXiv Detail & Related papers (2022-03-09T17:20:18Z)
Consistent Depth Prediction under Various Illuminations using Dilated Cross Attention [1.332560004325655]
We propose to use internet 3D indoor scenes and manually tune their illuminations to render photo-realistic RGB photos and their corresponding depth and BRDF maps. We perform cross attention on these dilated features to retain the consistency of depth prediction under different illuminations. Our method is evaluated by comparing it with current state-of-the-art methods on Vari dataset and a significant improvement is observed in experiments.
arXiv Detail & Related papers (2021-12-15T10:02:46Z)
BridgeNet: A Joint Learning Network of Depth Map Super-Resolution and Monocular Depth Estimation [60.34562823470874]
We propose a joint learning network of depth map super-resolution (DSR) and monocular depth estimation (MDE) without introducing additional supervision labels. One is the high-frequency attention bridge (HABdg) designed for the feature encoding process, which learns the high-frequency information of the MDE task to guide the DSR task. The other is the content guidance bridge (CGBdg) designed for the depth map reconstruction process, which provides the content guidance learned from DSR task for MDE task.
arXiv Detail & Related papers (2021-07-27T01:28:23Z)
Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline [48.69396457721544]
We build a large-scale dataset named "RGB-D-D" to promote the study of depth map super-resolution (SR) We provide a fast depth map super-resolution (FDSR) baseline, in which the high-frequency component adaptively decomposed from RGB image to guide the depth map SR. For the real-world LR depth maps, our algorithm can produce more accurate HR depth maps with clearer boundaries and to some extent correct the depth value errors.
arXiv Detail & Related papers (2021-04-13T13:27:26Z)
Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion [56.85837052421469]
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars. In this paper, we study the problem of predicting dense depth from a single RGB image with optional sparse measurements from low-cost active depth sensors. We introduce Sparse Networks (SANs), a new module enabling monodepth networks to perform both the tasks of depth prediction and completion.
arXiv Detail & Related papers (2021-03-30T21:22:26Z)
Efficient Depth Completion Using Learned Bases [94.0808155168311]
We propose a new global geometry constraint for depth completion. By assuming depth maps often lay on low dimensional subspaces, a dense depth map can be approximated by a weighted sum of full-resolution principal depth bases.
arXiv Detail & Related papers (2020-12-02T11:57:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.