Related papers: Propagating Sparse Depth via Depth Foundation Model for Out-of-Distribution Depth Completion

Propagating Sparse Depth via Depth Foundation Model for Out-of-Distribution Depth Completion

URL: http://arxiv.org/abs/2508.04984v1
Date: Thu, 07 Aug 2025 02:38:24 GMT
Title: Propagating Sparse Depth via Depth Foundation Model for Out-of-Distribution Depth Completion
Authors: Shenglun Chen, Xinzhu Ma, Hong Zhang, Haojie Li, Zhihui Wang,
Abstract summary: We propose a novel depth completion framework that leverages depth foundation models to attain remarkable robustness without large-scale training.<n>Specifically, we leverage a depth foundation model to extract environmental cues, including structural and semantic context, from RGB images to guide the propagation of sparse depth information into missing regions.<n>Our framework performs remarkably well in the OOD scenarios and outperforms existing state-of-the-art depth completion methods.
Score: 33.854696587141355
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Depth completion is a pivotal challenge in computer vision, aiming at reconstructing the dense depth map from a sparse one, typically with a paired RGB image. Existing learning based models rely on carefully prepared but limited data, leading to significant performance degradation in out-of-distribution (OOD) scenarios. Recent foundation models have demonstrated exceptional robustness in monocular depth estimation through large-scale training, and using such models to enhance the robustness of depth completion models is a promising solution. In this work, we propose a novel depth completion framework that leverages depth foundation models to attain remarkable robustness without large-scale training. Specifically, we leverage a depth foundation model to extract environmental cues, including structural and semantic context, from RGB images to guide the propagation of sparse depth information into missing regions. We further design a dual-space propagation approach, without any learnable parameters, to effectively propagates sparse depth in both 3D and 2D spaces to maintain geometric structure and local consistency. To refine the intricate structure, we introduce a learnable correction module to progressively adjust the depth prediction towards the real depth. We train our model on the NYUv2 and KITTI datasets as in-distribution datasets and extensively evaluate the framework on 16 other datasets. Our framework performs remarkably well in the OOD scenarios and outperforms existing state-of-the-art depth completion methods. Our models are released in https://github.com/shenglunch/PSD.

Related papers

Distilling Monocular Foundation Model for Fine-grained Depth Completion [17.603217168518356]
We propose a two-stage knowledge distillation framework to provide dense supervision for depth completion.<n>In the first stage, we generate diverse training data from natural images, which distills geometric knowledge to depth completion.<n>In the second stage, we employ a scale- and shift-invariant loss to learn real-world scales when fine-tuning on real-world datasets.
arXiv Detail & Related papers (2025-03-21T09:34:01Z)
Towards Ambiguity-Free Spatial Foundation Model: Rethinking and Decoupling Depth Ambiguity [20.86484181698326]
Existing models, limited to deterministic predictions, overlook real-world multi-layer depth.<n>We introduce a paradigm shift from single-prediction to multi-hypothesis spatial foundation models.
arXiv Detail & Related papers (2025-03-08T02:33:54Z)
DepthLab: From Partial to Complete [80.58276388743306]
Missing values remain a common challenge for depth data across its wide range of applications.<n>This work bridges this gap with DepthLab, a foundation depth inpainting model powered by image diffusion priors.<n>Our approach proves its worth in various downstream tasks, including 3D scene inpainting, text-to-3D scene generation, sparse-view reconstruction with DUST3R, and LiDAR depth completion.
arXiv Detail & Related papers (2024-12-24T04:16:38Z)
Amodal Depth Anything: Amodal Depth Estimation in the Wild [39.27552294431748]
Amodal depth estimation aims to predict the depth of occluded (invisible) parts of objects in a scene.<n>We propose a novel formulation of amodal depth estimation in the wild, focusing on relative depth prediction to improve model generalization across diverse natural images.<n>We present two complementary frameworks: Amodal-DAV2, a deterministic model based on Depth Anything V2, and Amodal-DepthFM, a generative model that integrates conditional flow matching principles.
arXiv Detail & Related papers (2024-12-03T09:56:38Z)
Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering [93.94371335579321]
We propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations. Comprehensive experiments underscore our framework's superior generalization capabilities. Our innovative loss functions empower the model to autonomously recover domain-specific scale-and-shift coefficients.
arXiv Detail & Related papers (2023-09-18T12:36:39Z)
FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction. Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes. It relies on the multi-view consistency assumption for training networks, however, that is violated in dynamic object regions. We introduce an external pretrained monocular depth estimation model for generating single-image depth prior. Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image [91.71077190961688]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image. We then exploits 3D point cloud data to predict the depth shift and the camera's focal length that allow us to recover 3D scene shapes. We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation.
arXiv Detail & Related papers (2022-08-28T16:20:14Z)
Joint Prediction of Monocular Depth and Structure using Planar and Parallax Geometry [4.620624344434533]
Supervised learning depth estimation methods can achieve good performance when trained on high-quality ground-truth, like LiDAR data. We propose a novel approach combining structure information from a promising Plane and Parallax geometry pipeline with depth information into a U-Net supervised learning network. Our model has impressive performance on depth prediction of thin objects and edges, and compared to structure prediction baseline, our model performs more robustly.
arXiv Detail & Related papers (2022-07-13T17:04:05Z)
Improving Depth Estimation using Location Information [0.0]
This paper improves the self-supervised deep learning techniques to perform accurate generalized monocular depth estimation. The main idea is to train the deep model to take into account a sequence of the different frames, each frame is geotagged with its location information.
arXiv Detail & Related papers (2021-12-27T22:30:14Z)
Efficient Depth Completion Using Learned Bases [94.0808155168311]
We propose a new global geometry constraint for depth completion. By assuming depth maps often lay on low dimensional subspaces, a dense depth map can be approximated by a weighted sum of full-resolution principal depth bases.
arXiv Detail & Related papers (2020-12-02T11:57:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.