DEPTHOR: Depth Enhancement from a Practical Light-Weight dToF Sensor and RGB Image
- URL: http://arxiv.org/abs/2504.01596v1
- Date: Wed, 02 Apr 2025 11:02:21 GMT
- Title: DEPTHOR: Depth Enhancement from a Practical Light-Weight dToF Sensor and RGB Image
- Authors: Jijun Xiang, Xuan Zhu, Xianqi Wang, Yu Wang, Hong Zhang, Fei Guo, Xin Yang
- Abstract summary: We propose a novel completion-based method, named DEPTHOR, for depth enhancement in computer vision.
First, we simulate real-world dToF data from the accurate ground truth in synthetic datasets to enable noise-robust training.
Second, we design a novel network that incorporates monocular depth estimation (MDE), leveraging global depth relationships and contextual information to improve prediction in challenging regions.
- Score: 8.588871458005114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth enhancement, which uses RGB images as guidance to convert raw signals from dToF into high-precision, dense depth maps, is a critical task in computer vision. Although existing super-resolution-based methods show promising results on public datasets, they often rely on idealized assumptions like accurate region correspondences and reliable dToF inputs, overlooking calibration errors that cause misalignment and anomaly signals inherent to dToF imaging, limiting real-world applicability. To address these challenges, we propose a novel completion-based method, named DEPTHOR, featuring advances in both the training strategy and model architecture. First, we propose a method to simulate real-world dToF data from the accurate ground truth in synthetic datasets to enable noise-robust training. Second, we design a novel network that incorporates monocular depth estimation (MDE), leveraging global depth relationships and contextual information to improve prediction in challenging regions. On the ZJU-L5 dataset, our training strategy significantly enhances depth completion models, achieving results comparable to depth super-resolution methods, while our model achieves state-of-the-art results, improving Rel and RMSE by 27% and 18%, respectively. On a more challenging set of dToF samples we collected, our method outperforms SOTA methods on preliminary stereo-based GT, improving Rel and RMSE by 23% and 22%, respectively. Our code is available at https://github.com/ShadowBbBb/Depthor
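The abstract does not spell out the dToF simulation step; for intuition, here is a minimal sketch of how dToF-like input could be derived from dense ground-truth depth, assuming a low-resolution zone sensor (e.g. 8x8, as in lightweight dToF chips). The zone count, noise level, and dropout rate below are illustrative assumptions, not the paper's values.

```python
import numpy as np

def simulate_dtof(gt_depth: np.ndarray, zones: int = 8,
                  noise_std: float = 0.03, drop_prob: float = 0.05) -> np.ndarray:
    """Pool dense GT depth into low-res zones, then corrupt with noise and dropouts."""
    h, w = gt_depth.shape
    zh, zw = h // zones, w // zones
    out = np.zeros((zones, zones), dtype=np.float32)
    for i in range(zones):
        for j in range(zones):
            patch = gt_depth[i * zh:(i + 1) * zh, j * zw:(j + 1) * zw]
            valid = patch[patch > 0]
            if valid.size == 0 or np.random.rand() < drop_prob:
                out[i, j] = 0.0  # simulate a missing / anomalous zone reading
            else:
                # One depth value per zone, perturbed with multiplicative noise
                out[i, j] = np.median(valid) * (1.0 + np.random.randn() * noise_std)
    return out
```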
Related papers
- Distilling Monocular Foundation Model for Fine-grained Depth Completion [17.603217168518356]
We propose a two-stage knowledge distillation framework to provide dense supervision for depth completion.
In the first stage, we generate diverse training data from natural images, which distills geometric knowledge to depth completion.
In the second stage, we employ a scale- and shift-invariant loss to learn real-world scales when fine-tuning on real-world datasets.
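A scale- and shift-invariant loss is typically implemented by least-squares-aligning the prediction to the ground truth before penalizing the residual; a minimal sketch of that standard formulation (the paper's exact variant may differ):

```python
import torch

def ssi_loss(pred: torch.Tensor, gt: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Align pred to gt with least-squares scale/shift over valid pixels, then L1."""
    p, g = pred[mask], gt[mask]
    # Closed-form solution of min_{s,t} ||s*p + t - g||^2.
    s = ((p - p.mean()) * (g - g.mean())).mean() / (p.var(unbiased=False) + 1e-8)
    t = g.mean() - s * p.mean()
    return (s * p + t - g).abs().mean()
```

Here `mask` marks pixels with valid ground truth; the alignment makes the loss blind to the global scale and offset of the prediction.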
arXiv Detail & Related papers (2025-03-21T09:34:01Z)
- TransDiff: Diffusion-Based Method for Manipulating Transparent Objects Using a Single RGB-D Image [9.242427101416226]
We propose a single-view RGB-D-based depth completion framework, TransDiff, to achieve material-agnostic object grasping in desktop scenes.
We leverage features extracted from RGB images, including semantic segmentation, edge maps, and normal maps, to condition the depth map generation process.
Our method learns an iterative denoising process that transforms a random depth distribution into a depth map, guided by initially refined depth information.
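As a rough illustration of such an iterative denoising process, here is a simplified DDIM-style sampling loop conditioned on RGB features; the model, noise schedule, and shapes are placeholders, not TransDiff's actual components.

```python
import torch

@torch.no_grad()
def denoise_depth(model, rgb_feats, steps: int = 50, shape=(1, 1, 240, 320)):
    """Deterministic DDIM-style sampling: noise -> depth map, conditioned on RGB."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)
    x = torch.randn(shape)  # start from a random depth distribution
    for t in reversed(range(steps)):
        eps = model(x, rgb_feats, t)  # predicted noise, conditioned on RGB features
        a_bar = alphas_bar[t]
        x0 = (x - (1 - a_bar).sqrt() * eps) / a_bar.sqrt()  # current clean estimate
        if t > 0:
            a_prev = alphas_bar[t - 1]
            x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps  # step toward t-1
        else:
            x = x0
    return x
```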
arXiv Detail & Related papers (2025-03-17T03:29:37Z)
- MetricGold: Leveraging Text-To-Image Latent Diffusion Models for Metric Depth Estimation [9.639797094021988]
MetricGold is a novel approach that harnesses a generative diffusion model's rich priors to improve metric depth estimation.
Our experiments demonstrate robust generalization across diverse datasets, producing sharper and higher-quality metric depth estimates.
arXiv Detail & Related papers (2024-11-16T20:59:01Z)
- Robust Depth Enhancement via Polarization Prompt Fusion Tuning [112.88371907047396]
We present a framework that leverages polarization imaging to improve inaccurate depth measurements from various depth sensors.
Our method first adopts a learning-based strategy where a neural network is trained to estimate a dense and complete depth map from polarization data and a sensor depth map from different sensors.
To further improve the performance, we propose a Polarization Prompt Fusion Tuning (PPFT) strategy to effectively utilize RGB-based models pre-trained on large-scale datasets.
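A minimal sketch of what prompt fusion into a frozen, RGB-pre-trained backbone could look like; this block is a generic illustration, and all names and shapes are assumptions rather than the actual PPFT architecture.

```python
import torch
import torch.nn as nn

class PromptFusionBlock(nn.Module):
    """Injects polarization features into frozen RGB backbone features."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)  # polarization features -> prompts
        self.gate = nn.Parameter(torch.zeros(1))        # zero-init: starts as identity

    def forward(self, rgb_feat: torch.Tensor, pol_feat: torch.Tensor) -> torch.Tensor:
        # Only proj/gate are trained; the backbone producing rgb_feat stays frozen.
        return rgb_feat + self.gate * self.proj(pol_feat)
```

Initializing the gate at zero preserves the pre-trained model's behavior at the start of tuning, which is a common choice when adapting large pre-trained backbones.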
arXiv Detail & Related papers (2024-04-05T17:55:33Z)
- Confidence-Aware RGB-D Face Recognition via Virtual Depth Synthesis [48.59382455101753]
2D face recognition encounters challenges in unconstrained environments due to varying illumination, occlusion, and pose.
Recent studies focus on RGB-D face recognition to improve robustness by incorporating depth information.
In this work, we first construct a diverse depth dataset generated by 3D Morphable Models for depth model pre-training.
Then, we propose a domain-independent pre-training framework that utilizes readily available pre-trained RGB and depth models to separately perform face recognition without needing additional paired data for retraining.
arXiv Detail & Related papers (2024-03-11T09:12:24Z)
- MIPI 2023 Challenge on RGB+ToF Depth Completion: Methods and Results [76.77266693620425]
Deep learning has enabled more accurate and efficient completion of depth maps from RGB images and sparse ToF measurements.
To evaluate the performance of different depth completion methods, we organized an RGB+sparse ToF depth completion competition.
In this report, we present the results of the competition and analyze the strengths and weaknesses of the top-performing methods.
arXiv Detail & Related papers (2023-04-27T02:00:04Z)
- DeepRM: Deep Recurrent Matching for 6D Pose Refinement [77.34726150561087]
DeepRM is a novel recurrent network architecture for 6D pose refinement.
The architecture incorporates LSTM units to propagate information through each refinement step.
DeepRM achieves state-of-the-art performance on two widely accepted challenging datasets.
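A minimal sketch of LSTM-based recurrent refinement in this spirit; the feature dimension and the six-parameter pose update below are illustrative assumptions, and the per-step re-rendering a full 6D refinement pipeline would use is omitted.

```python
import torch
import torch.nn as nn

class RecurrentRefiner(nn.Module):
    """LSTM state carries information across successive pose-refinement steps."""
    def __init__(self, feat_dim: int = 256, hidden: int = 256):
        super().__init__()
        self.hidden = hidden
        self.cell = nn.LSTMCell(feat_dim, hidden)
        self.head = nn.Linear(hidden, 6)  # pose update: 3 rotation + 3 translation params

    def forward(self, feats: torch.Tensor, steps: int = 4):
        # A full pipeline would re-render and re-extract feats after each update;
        # that outer loop is omitted in this sketch.
        h = feats.new_zeros(feats.size(0), self.hidden)
        c = torch.zeros_like(h)
        deltas = []
        for _ in range(steps):
            h, c = self.cell(feats, (h, c))
            deltas.append(self.head(h))
        return deltas
```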
arXiv Detail & Related papers (2022-05-28T16:18:08Z)
- Pyramidal Attention for Saliency Detection [30.554118525502115]
This paper exploits only RGB images, estimates depth from RGB, and leverages the intermediate depth features.
We employ a pyramidal attention structure to extract multi-level convolutional-transformer features and process initial-stage representations.
We report significantly improved performance against 21 and 40 state-of-the-art SOD methods on eight RGB and RGB-D datasets, respectively.
arXiv Detail & Related papers (2022-04-14T06:57:46Z)
- Unpaired Single-Image Depth Synthesis with cycle-consistent Wasserstein GANs [1.0499611180329802]
Real-time estimation of actual environment depth is essential for various autonomous system tasks.
In this study, the latest advances in generative neural networks are leveraged for fully unsupervised single-image depth synthesis.
arXiv Detail & Related papers (2021-03-31T09:43:38Z)
- Light Field Reconstruction via Deep Adaptive Fusion of Hybrid Lenses [67.01164492518481]
This paper explores the problem of reconstructing high-resolution light field (LF) images from hybrid lenses.
We propose a novel end-to-end learning-based approach, which can comprehensively utilize the specific characteristics of the input.
Our framework could potentially decrease the cost of high-resolution LF data acquisition and benefit LF data storage and transmission.
arXiv Detail & Related papers (2021-02-14T06:44:47Z)
- Channel Attention based Iterative Residual Learning for Depth Map Super-Resolution [58.626803922196146]
We argue that DSR models trained on synthetic datasets are restrictive and not effective in dealing with real-world DSR tasks.
We make two contributions in tackling real-world degradation of different depth sensors.
We propose a new framework for real-world DSR, which consists of four modules.
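The four modules are not named in this summary; as background, here is a minimal sketch of a standard (SE-style) channel attention block of the kind the title refers to. The reduction ratio is a common default, not a value from the paper.

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style gating: reweight feature channels by globally pooled statistics."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x).view(x.size(0), x.size(1), 1, 1)
        return x * w  # per-channel reweighting of the residual features
```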
arXiv Detail & Related papers (2020-06-02T09:12:23Z)