Self-supervised Monocular Depth Estimation on Water Scenes via Specular Reflection Prior
- URL: http://arxiv.org/abs/2404.07176v1
- Date: Wed, 10 Apr 2024 17:25:42 GMT
- Title: Self-supervised Monocular Depth Estimation on Water Scenes via Specular Reflection Prior
- Authors: Zhengyang Lu, Ying Chen,
- Abstract summary: This paper proposes the first self-supervision for deep-learning depth estimation on water scenes via intra-frame priors.
In the first stage, a water segmentation network is performed to separate the reflection components from the entire image.
The photometric re-projection error, incorporating SmoothL1 and a novel photometric adaptive SSIM, is formulated to optimize pose and depth estimation.
- Score: 3.2120448116996103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular depth estimation from a single image is an ill-posed problem for computer vision due to insufficient reliable cues as the prior knowledge. Besides the inter-frame supervision, namely stereo and adjacent frames, extensive prior information is available in the same frame. Reflections from specular surfaces, informative intra-frame priors, enable us to reformulate the ill-posed depth estimation task as a multi-view synthesis. This paper proposes the first self-supervision for deep-learning depth estimation on water scenes via intra-frame priors, known as reflection supervision and geometrical constraints. In the first stage, a water segmentation network is performed to separate the reflection components from the entire image. Next, we construct a self-supervised framework to predict the target appearance from reflections, perceived as other perspectives. The photometric re-projection error, incorporating SmoothL1 and a novel photometric adaptive SSIM, is formulated to optimize pose and depth estimation by aligning the transformed virtual depths and source ones. As a supplement, the water surface is determined from real and virtual camera positions, which complement the depth of the water area. Furthermore, to alleviate these laborious ground truth annotations, we introduce a large-scale water reflection scene (WRS) dataset rendered from Unreal Engine 4. Extensive experiments on the WRS dataset prove the feasibility of the proposed method compared to state-of-the-art depth estimation techniques.
Related papers
- FisheyeDepth: A Real Scale Self-Supervised Depth Estimation Model for Fisheye Camera [8.502741852406904]
We present FisheyeDepth, a self-supervised depth estimation model tailored for fisheye cameras.
We incorporate a fisheye camera model into the projection and reprojection stages during training to handle image distortions.
We also incorporate real-scale pose information into the geometric projection between consecutive frames, replacing the poses estimated by the conventional pose network.
arXiv Detail & Related papers (2024-09-23T14:31:42Z) - ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation [62.600382533322325]
We propose a novel monocular depth estimation method called ScaleDepth.
Our method decomposes metric depth into scene scale and relative depth, and predicts them through a semantic-aware scale prediction module.
Our method achieves metric depth estimation for both indoor and outdoor scenes in a unified framework.
arXiv Detail & Related papers (2024-07-11T05:11:56Z) - Joint Self-supervised Depth and Optical Flow Estimation towards Dynamic
Objects [3.794605440322862]
In this work, we construct a joint inter-frame-supervised depth and optical flow estimation framework.
For motion segmentation, we adaptively segment the preliminary estimated optical flow map with large areas of connectivity.
Our proposed joint depth and optical flow estimation outperforms existing depth estimators on the KITTI Depth dataset.
arXiv Detail & Related papers (2023-09-07T04:00:52Z) - DeepRM: Deep Recurrent Matching for 6D Pose Refinement [77.34726150561087]
DeepRM is a novel recurrent network architecture for 6D pose refinement.
The architecture incorporates LSTM units to propagate information through each refinement step.
DeepRM achieves state-of-the-art performance on two widely accepted challenging datasets.
arXiv Detail & Related papers (2022-05-28T16:18:08Z) - Next-Best-View Prediction for Active Stereo Cameras and Highly
Reflective Objects [12.21992378133376]
We propose a next-best-view framework to strategically select camera viewpoints for completing depth data on reflective objects.
We employ an RGB-based pose estimator to obtain current pose predictions from the existing data.
Our active depth acquisition method outperforms two strong baselines for both depth completion and object pose estimation performance.
arXiv Detail & Related papers (2022-02-27T01:48:02Z) - Scale-aware direct monocular odometry [4.111899441919165]
We present a framework for direct monocular odometry based on depth prediction from a deep neural network.
Our proposal largely outperforms classic monocular SLAM, being 5 to 9 times more precise, with an accuracy which is closer to that of stereo systems.
arXiv Detail & Related papers (2021-09-21T10:30:15Z) - Progressive Depth Learning for Single Image Dehazing [56.71963910162241]
Existing dehazing methods often ignore the depth cues and fail in distant areas where heavier haze disturbs the visibility.
We propose a deep end-to-end model that iteratively estimates image depths and transmission maps.
Our approach benefits from explicitly modeling the inner relationship of image depth and transmission map, which is especially effective for distant hazy areas.
arXiv Detail & Related papers (2021-02-21T05:24:18Z) - Robust Consistent Video Depth Estimation [65.53308117778361]
We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video.
Our algorithm combines two complementary techniques: (1) flexible deformation-splines for low-frequency large-scale alignment and (2) geometry-aware depth filtering for high-frequency alignment of fine depth details.
In contrast to prior approaches, our method does not require camera poses as input and achieves robust reconstruction for challenging hand-held cell phone captures containing a significant amount of noise, shake, motion blur, and rolling shutter deformations.
arXiv Detail & Related papers (2020-12-10T18:59:48Z) - Deep 3D Capture: Geometry and Reflectance from Sparse Multi-View Images [59.906948203578544]
We introduce a novel learning-based method to reconstruct the high-quality geometry and complex, spatially-varying BRDF of an arbitrary object.
We first estimate per-view depth maps using a deep multi-view stereo network.
These depth maps are used to coarsely align the different views.
We propose a novel multi-view reflectance estimation network architecture.
arXiv Detail & Related papers (2020-03-27T21:28:54Z) - D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual
Odometry [57.5549733585324]
D3VO is a novel framework for monocular visual odometry that exploits deep networks on three levels -- deep depth, pose and uncertainty estimation.
We first propose a novel self-supervised monocular depth estimation network trained on stereo videos without any external supervision.
We model the photometric uncertainties of pixels on the input images, which improves the depth estimation accuracy.
arXiv Detail & Related papers (2020-03-02T17:47:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.