Related papers: Self-supervised Monocular Depth Estimation Robust to Reflective Surface Leveraged by Triplet Mining

URL: http://arxiv.org/abs/2502.14573v1
Date: Thu, 20 Feb 2025 13:59:40 GMT
Title: Self-supervised Monocular Depth Estimation Robust to Reflective Surface Leveraged by Triplet Mining
Authors: Wonhyeok Choi, Kyumin Hwang, Wei Peng, Minwoo Choi, Sunghoon Im
Abstract summary: Self-supervised monocular depth estimation (SSMDE) aims to predict the dense depth map of a monocular image. It struggles with reflective surfaces, as they violate the assumptions of Lambertian reflectance. We propose a novel training strategy for an SSMDE by leveraging triplet mining to pinpoint reflective regions at the pixel level.
Score: 14.432210570631577
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Self-supervised monocular depth estimation (SSMDE) aims to predict the dense depth map of a monocular image, by learning depth from RGB image sequences, eliminating the need for ground-truth depth labels. Although this approach simplifies data acquisition compared to supervised methods, it struggles with reflective surfaces, as they violate the assumptions of Lambertian reflectance, leading to inaccurate training on such surfaces. To tackle this problem, we propose a novel training strategy for an SSMDE by leveraging triplet mining to pinpoint reflective regions at the pixel level, guided by the camera geometry between different viewpoints. The proposed reflection-aware triplet mining loss specifically penalizes the inappropriate photometric error minimization on the localized reflective regions while preserving depth accuracy in non-reflective areas. We also incorporate a reflection-aware knowledge distillation method that enables a student model to selectively learn the pixel-level knowledge from reflective and non-reflective regions. This results in robust depth estimation across areas. Evaluation results on multiple datasets demonstrate that our method effectively enhances depth quality on reflective surfaces and outperforms state-of-the-art SSMDE baselines.

Related papers

Intrinsic Image Decomposition for Robust Self-supervised Monocular Depth Estimation on Reflective Surfaces [10.557788087220509]
Self-supervised monocular depth estimation (SSMDE) has gained attention in the field of deep learning. We propose a novel framework that incorporates intrinsic image decomposition into SSMDE. Our method synergistically trains for both monocular depth estimation and intrinsic image decomposition.
arXiv Detail & Related papers (2025-03-28T07:56:59Z)
Acquisition of Spatially-Varying Reflectance and Surface Normals via Polarized Reflectance Fields [15.653977591138682]
Accurately measuring the geometry and spatially-varying reflectance of real-world objects is a complex task. We propose a novel approach using polarized reflectance field capture and a comprehensive statistical analysis algorithm. We showcase the captured shapes and reflectance of diverse objects with a wide material range, spanning from highly diffuse to highly glossy.
arXiv Detail & Related papers (2024-12-13T00:39:55Z)
NeRSP: Neural 3D Reconstruction for Reflective Objects with Sparse Polarized Images [62.752710734332894]
NeRSP is a Neural 3D reconstruction technique for Reflective surfaces with Sparse Polarized images. We derive photometric and geometric cues from the polarimetric image formation model and multiview azimuth consistency. We achieve the state-of-the-art surface reconstruction results with only 6 views as input.
arXiv Detail & Related papers (2024-06-11T09:53:18Z)
Weakly-Supervised Monocular Depth Estimation with Resolution-Mismatched Data [73.9872931307401]
We propose a novel weakly-supervised framework to train a monocular depth estimation network. The proposed framework is composed of a weight-sharing monocular depth estimation network and a depth reconstruction network for distillation. Experimental results demonstrate that our method outperforms unsupervised and semi-supervised learning based schemes.
arXiv Detail & Related papers (2021-09-23T18:04:12Z)
Progressive Depth Learning for Single Image Dehazing [56.71963910162241]
Existing dehazing methods often ignore the depth cues and fail in distant areas where heavier haze disturbs the visibility. We propose a deep end-to-end model that iteratively estimates image depths and transmission maps. Our approach benefits from explicitly modeling the inner relationship of image depth and transmission map, which is especially effective for distant hazy areas.
arXiv Detail & Related papers (2021-02-21T05:24:18Z)
Variational Monocular Depth Estimation for Reliability Prediction [12.951621755732544]
Self-supervised learning for monocular depth estimation is widely investigated as an alternative to supervised learning approach. Previous works have successfully improved the accuracy of depth estimation by modifying the model structure. In this paper, we theoretically formulate a variational model for the monocular depth estimation to predict the reliability of the estimated depth image.
arXiv Detail & Related papers (2020-11-24T06:23:51Z)
Adaptive confidence thresholding for monocular depth estimation [83.06265443599521]
We propose a new approach to leverage pseudo ground truth depth maps of stereo images generated from self-supervised stereo matching methods. The confidence map of the pseudo ground truth depth map is estimated to mitigate performance degeneration by inaccurate pseudo depth maps. Experimental results demonstrate superior performance to state-of-the-art monocular depth estimation methods.
arXiv Detail & Related papers (2020-09-27T13:26:16Z)
Deep 3D Capture: Geometry and Reflectance from Sparse Multi-View Images [59.906948203578544]
We introduce a novel learning-based method to reconstruct the high-quality geometry and complex, spatially-varying BRDF of an arbitrary object. We first estimate per-view depth maps using a deep multi-view stereo network. These depth maps are used to coarsely align the different views. We propose a novel multi-view reflectance estimation network architecture.
arXiv Detail & Related papers (2020-03-27T21:28:54Z)
D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry [57.5549733585324]
D3VO is a novel framework for monocular visual odometry that exploits deep networks on three levels -- deep depth, pose and uncertainty estimation. We first propose a novel self-supervised monocular depth estimation network trained on stereo videos without any external supervision. We model the photometric uncertainties of pixels on the input images, which improves the depth estimation accuracy.
arXiv Detail & Related papers (2020-03-02T17:47:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.