Related papers: Towards Sharper Object Boundaries in Self-Supervised Depth Estimation

URL: http://arxiv.org/abs/2509.15987v1
Date: Fri, 19 Sep 2025 13:53:51 GMT
Title: Towards Sharper Object Boundaries in Self-Supervised Depth Estimation
Authors: Aurélien Cecille, Stefan Duffner, Franck Davoine, Rémi Agier, Thibault Neveu
Abstract summary: Our method produces crisp depth discontinuities using only self-supervision. We model per-pixel depth as a mixture distribution, capturing multiple plausible depths. This formulation integrates seamlessly into existing pipelines.
Score: 6.93581193918817
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Accurate monocular depth estimation is crucial for 3D scene understanding, but existing methods often blur depth at object boundaries, introducing spurious intermediate 3D points. While achieving sharp edges usually requires very fine-grained supervision, our method produces crisp depth discontinuities using only self-supervision. Specifically, we model per-pixel depth as a mixture distribution, capturing multiple plausible depths and shifting uncertainty from direct regression to the mixture weights. This formulation integrates seamlessly into existing pipelines via variance-aware loss functions and uncertainty propagation. Extensive evaluations on KITTI and VKITTIv2 show that our method achieves up to 35% higher boundary sharpness and improves point cloud quality compared to state-of-the-art baselines.
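
To make the mixture idea in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a two-component Gaussian mixture per pixel (the abstract does not specify the component family or count), and the helper names `mixture_mean`, `mixture_mode`, and `mixture_variance` are hypothetical. It contrasts the expected depth, which yields intermediate values across a boundary, with the depth of the highest-weight component, which stays on one side of the discontinuity.

```python
import numpy as np

def mixture_mean(weights, means):
    """Expected depth per pixel: sum_k w_k * mu_k (blends depths at boundaries)."""
    return np.sum(weights * means, axis=-1)

def mixture_mode(weights, means):
    """Depth of the highest-weight component per pixel (assumed readout)."""
    idx = np.argmax(weights, axis=-1)
    return np.take_along_axis(means, idx[..., None], axis=-1)[..., 0]

def mixture_variance(weights, means, sigmas):
    """Total per-pixel variance of the mixture (law of total variance)."""
    mean = mixture_mean(weights, means)
    second_moment = np.sum(weights * (sigmas ** 2 + means ** 2), axis=-1)
    return second_moment - mean ** 2

# Toy data (assumed values): five pixels crossing a foreground/background edge.
fg_weight = np.array([1.0, 0.8, 0.6, 0.2, 0.0])       # weight of the near surface
weights = np.stack([fg_weight, 1.0 - fg_weight], axis=-1)
means = np.tile(np.array([2.0, 10.0]), (5, 1))        # near at 2 m, far at 10 m
sigmas = np.full((5, 2), 0.1)                         # per-component std dev

blurred = mixture_mean(weights, means)                # intermediate depths appear
crisp = mixture_mode(weights, means)                  # only 2 m or 10 m appear
variance = mixture_variance(weights, means, sigmas)   # peaks near the boundary

print(blurred)
print(crisp)
print(variance)
```

In this toy setup the uncertainty shows up in the mixture weights rather than in the component depths, so the total variance is largest for the pixels nearest the edge while each component keeps a plausible surface depth.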

Related papers

Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion [57.08169927189237]
Existing methods for depth completion operate in tightly constrained settings. Inspired by advances in monocular depth estimation, we reframe depth completion as an image-conditional depth map generation. Marigold-DC builds on a pretrained latent diffusion model for monocular depth estimation and injects the depth observations as test-time guidance.
arXiv Detail & Related papers (2024-12-18T00:06:41Z)
SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation [17.88798247412388]
SharpDepth combines the metric accuracy of discriminative depth estimation methods with the fine-grained boundary sharpness typically achieved by generative methods. Our approach bridges these limitations by integrating metric accuracy with detailed boundary preservation, resulting in depth predictions that are both metrically precise and visually sharp.
arXiv Detail & Related papers (2024-11-27T11:07:27Z)
Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios [103.72094710263656]
This paper presents a novel approach that identifies and integrates dominant cross-modality depth features with a learning-based framework. We propose a novel confidence loss steering a confidence predictor network to yield a confidence map specifying latent potential depth areas. With the resulting confidence map, we propose a multi-modal fusion network that fuses the final depth in an end-to-end manner.
arXiv Detail & Related papers (2024-02-19T04:39:16Z)
Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth. Our method (named MG) ranks among the most accurate on the KITTI depth-prediction benchmark leaderboard.
arXiv Detail & Related papers (2023-03-31T16:01:03Z)
Mind The Edge: Refining Depth Edges in Sparsely-Supervised Monocular Depth Estimation [42.19770683222846]
Monocular Depth Estimation (MDE) is a fundamental problem in computer vision with numerous applications. In this paper we propose to learn to detect the location of depth edges from densely-supervised synthetic data. We demonstrate significant gains in the accuracy of the depth edges with comparable per-pixel depth accuracy on several challenging datasets.
arXiv Detail & Related papers (2022-12-10T14:49:24Z)
Probabilistic Volumetric Fusion for Dense Monocular SLAM [33.156523309257786]
We present a novel method to reconstruct 3D scenes by leveraging deep dense monocular SLAM and fast uncertainty propagation. The proposed approach is able to 3D reconstruct scenes densely, accurately, and in real-time while being robust to extremely noisy depth estimates. We show that our approach achieves 92% better accuracy than directly fusing depths from monocular SLAM, and up to 90% improvements compared to the best competing approach.
arXiv Detail & Related papers (2022-10-03T23:53:35Z)
On Robust Cross-View Consistency in Self-Supervised Monocular Depth Estimation [56.97699793236174]
We study two kinds of robust cross-view consistency in this paper. We exploit the temporal coherence in both depth feature space and 3D voxel space for self-supervised monocular depth estimation. Experimental results on several outdoor benchmarks show that our method outperforms current state-of-the-art techniques.
arXiv Detail & Related papers (2022-09-19T03:46:13Z)
Learning Occlusion-Aware Coarse-to-Fine Depth Map for Self-supervised Monocular Depth Estimation [11.929584800629673]
We propose a novel network to learn an Occlusion-aware Coarse-to-Fine Depth map for self-supervised monocular depth estimation. The proposed OCFD-Net not only employs a discrete depth constraint for learning a coarse-level depth map, but also employs a continuous depth constraint for learning a scene depth residual.
arXiv Detail & Related papers (2022-03-21T12:43:42Z)
Robust Depth Completion with Uncertainty-Driven Loss Functions [60.9237639890582]
We introduce uncertainty-driven loss functions to improve the robustness of depth completion and handle the uncertainty in depth completion. Our method has been tested on KITTI Depth Completion Benchmark and achieved the state-of-the-art robustness performance in terms of MAE, IMAE, and IRMSE metrics.
arXiv Detail & Related papers (2021-12-15T05:22:34Z)
Deep Multi-view Depth Estimation with Predicted Uncertainty [11.012201499666503]
We employ a dense-optical-flow network to compute correspondences and then triangulate the point cloud to obtain an initial depth map. To further increase the triangulation accuracy, we introduce a depth-refinement network (DRN) that optimizes the initial depth map based on the image's contextual cues.
arXiv Detail & Related papers (2020-11-19T00:22:09Z)
Deep Multi-Scale Feature Learning for Defocus Blur Estimation [10.455763145066168]
This paper presents an edge-based defocus blur estimation method from a single defocused image. We first distinguish edges that lie at depth discontinuities (called depth edges, for which the blur estimate is ambiguous) from edges that lie at approximately constant depth regions (called pattern edges, for which the blur estimate is well-defined). We estimate the defocus blur amount at pattern edges only, and explore a scheme based on guided filters that prevents data propagation across the detected depth edges to obtain a dense blur map with well-defined object boundaries.
arXiv Detail & Related papers (2020-09-24T20:36:40Z)
Occlusion-Aware Depth Estimation with Adaptive Normal Constraints [85.44842683936471]
We present a new learning-based method for multi-frame depth estimation from a color video. Our method outperforms the state-of-the-art in terms of depth estimation accuracy.
arXiv Detail & Related papers (2020-04-02T07:10:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.