Towards Robust Monocular Depth Estimation in Non-Lambertian Surfaces
- URL: http://arxiv.org/abs/2408.06083v1
- Date: Mon, 12 Aug 2024 11:58:45 GMT
- Title: Towards Robust Monocular Depth Estimation in Non-Lambertian Surfaces
- Authors: Junrui Zhang, Jiaqi Li, Yachuan Huang, Yiran Wang, Jinghong Zheng, Liao Shen, Zhiguo Cao
- Abstract summary: We propose non-Lambertian surface regional guidance for monocular depth estimation.
We employ random tone-mapping augmentation during training to ensure the network predicts correct results under varying lighting.
Our method achieves accuracy improvements of 33.39% and 5.21% in zero-shot testing on the Booster and Mirror3D datasets.
- Score: 12.241301077789235
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the field of monocular depth estimation (MDE), many models with excellent zero-shot performance in general scenes have emerged recently. However, these methods often fail on non-Lambertian surfaces, such as transparent or mirror (ToM) surfaces, due to the unique reflective properties of these regions. Previous methods rely on externally provided ToM masks and aim to obtain correct depth maps by directly in-painting the RGB images. They depend heavily on the accuracy of the additional input masks, and the use of random colors during in-painting makes them insufficiently robust. We instead aim to incrementally enable the baseline model to directly learn the uniqueness of non-Lambertian surface regions for depth estimation through a well-designed training framework. To this end, we propose non-Lambertian surface regional guidance, which constrains the predictions of the MDE model in the gradient domain to enhance its robustness. Noting the significant impact of lighting on this task, we employ random tone-mapping augmentation during training to ensure the network can predict correct results for inputs under varying lighting. Additionally, we propose an optional lighting fusion module, which uses Variational Autoencoders to fuse multiple images and obtain the most advantageous input RGB image for depth estimation when multi-exposure images are available. Compared to Depth Anything V2, our method achieves accuracy improvements of 33.39% and 5.21% in zero-shot testing on the Booster and Mirror3D datasets for non-Lambertian surfaces, respectively. The state-of-the-art score of 90.75 in delta1.05 within ToM regions on the TRICKY2024 competition test set demonstrates the effectiveness of our approach.
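The abstract gives no implementation details, but the two training-time components lend themselves to a short sketch. Below is a minimal, assumed PyTorch rendering of random tone-mapping augmentation and a gradient-domain loss restricted to ToM regions; the tone-curve family, the mask name `tom_mask`, and the exact loss form are illustrative assumptions, not the paper's code.

```python
import torch

def random_tone_mapping(rgb: torch.Tensor) -> torch.Tensor:
    """Assumed gamma+gain tone curve, sampled per image in the batch."""
    gamma = torch.empty(rgb.shape[0], 1, 1, 1, device=rgb.device).uniform_(0.5, 2.0)
    gain = torch.empty(rgb.shape[0], 1, 1, 1, device=rgb.device).uniform_(0.8, 1.2)
    return (gain * rgb.clamp(min=1e-6) ** gamma).clamp(0.0, 1.0)

def image_gradients(d: torch.Tensor):
    """Forward differences along x and y for a (B,1,H,W) map."""
    return d[..., :, 1:] - d[..., :, :-1], d[..., 1:, :] - d[..., :-1, :]

def regional_gradient_loss(pred, target, tom_mask):
    """L1 gradient-domain loss restricted to ToM pixels.
    tom_mask: (B,1,H,W) binary mask, assumed available at training time."""
    pdx, pdy = image_gradients(pred)
    tdx, tdy = image_gradients(target)
    mx = tom_mask[..., :, 1:] * tom_mask[..., :, :-1]  # both pixels of the x-pair in ToM
    my = tom_mask[..., 1:, :] * tom_mask[..., :-1, :]  # both pixels of the y-pair in ToM
    loss = (mx * (pdx - tdx).abs()).sum() + (my * (pdy - tdy).abs()).sum()
    return loss / (mx.sum() + my.sum()).clamp(min=1.0)
```

At inference no mask is required; in this reading the mask only shapes the training signal, which matches the paper's goal of a baseline that handles ToM regions directly.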
Related papers
- Self-supervised Monocular Depth Estimation Robust to Reflective Surface Leveraged by Triplet Mining [14.432210570631577]
Self-supervised monocular depth estimation (SSMDE) aims to predict a dense depth map from a monocular image.
It struggles with reflective surfaces, as they violate the assumption of Lambertian reflectance.
We propose a novel training strategy for an SSMDE by leveraging triplet mining to pinpoint reflective regions at the pixel level.
arXiv Detail & Related papers (2025-02-20T13:59:40Z)
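The summary above does not specify how triplets are formed; a plausible pixel-level reading, with every name below hypothetical, is to pull features of reflective pixels together while pushing them away from non-reflective ones:

```python
import torch
import torch.nn.functional as F

def pixelwise_triplet_loss(feats, refl_mask, margin=0.2, n_triplets=1024):
    """Hypothetical pixel-level triplet mining sketch.
    feats: (B,C,H,W) per-pixel features; refl_mask: (B,1,H,W) in {0,1}."""
    B, C, H, W = feats.shape
    f = feats.permute(0, 2, 3, 1).reshape(-1, C)  # flatten to (B*H*W, C)
    m = refl_mask.reshape(-1).bool()
    pos_pool, neg_pool = f[m], f[~m]
    if len(pos_pool) < 2 or len(neg_pool) < 1:
        return feats.new_zeros(())
    # Anchors/positives from reflective pixels, negatives from the rest.
    ia = torch.randint(len(pos_pool), (n_triplets,), device=f.device)
    ip = torch.randint(len(pos_pool), (n_triplets,), device=f.device)
    ineg = torch.randint(len(neg_pool), (n_triplets,), device=f.device)
    return F.triplet_margin_loss(pos_pool[ia], pos_pool[ip], neg_pool[ineg],
                                 margin=margin)
```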
- Robust Depth Enhancement via Polarization Prompt Fusion Tuning [112.88371907047396]
We present a framework that leverages polarization imaging to improve inaccurate depth measurements from various depth sensors.
Our method first adopts a learning-based strategy where a neural network is trained to estimate a dense and complete depth map from polarization data and a sensor depth map from different sensors.
To further improve the performance, we propose a Polarization Prompt Fusion Tuning (PPFT) strategy to effectively utilize RGB-based models pre-trained on large-scale datasets.
arXiv Detail & Related papers (2024-04-05T17:55:33Z)
- Q-SLAM: Quadric Representations for Monocular SLAM [85.82697759049388]
We reimagine volumetric representations through the lens of quadrics.
We use the quadric assumption to rectify noisy depth estimates from RGB inputs.
We introduce a novel quadric-decomposed transformer to aggregate information across quadrics.
arXiv Detail & Related papers (2024-03-12T23:27:30Z)
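As a hedged illustration of the quadric assumption mentioned above (not Q-SLAM's actual pipeline), one can fit a quadric height field to a noisy depth patch by least squares and read the fitted surface back as rectified depth:

```python
import torch

def rectify_depth_with_quadric(depth, xs, ys):
    """Fit z = a*x^2 + b*y^2 + c*x*y + d*x + e*y + f to N noisy depth
    samples of one surface patch, then evaluate the fitted quadric as
    the rectified depth. depth, xs, ys: 1-D tensors of length N."""
    A = torch.stack([xs**2, ys**2, xs*ys, xs, ys, torch.ones_like(xs)], dim=1)
    coef = torch.linalg.lstsq(A, depth.unsqueeze(1)).solution  # (6,1)
    return (A @ coef).squeeze(1)  # rectified depth at the same pixels
```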
- Probabilistic Volumetric Fusion for Dense Monocular SLAM [33.156523309257786]
We present a novel method to reconstruct 3D scenes by leveraging deep dense monocular SLAM and fast uncertainty propagation.
The proposed approach reconstructs scenes densely, accurately, and in real time while being robust to extremely noisy depth estimates.
We show that our approach achieves 92% better accuracy than directly fusing depths from monocular SLAM, and up to 90% improvement compared to the best competing approach.
arXiv Detail & Related papers (2022-10-03T23:53:35Z)
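The uncertainty propagation above suggests the standard inverse-variance fusion rule; a minimal sketch, assuming per-pixel depth means and variances are available (the paper's volumetric formulation is richer):

```python
import torch

def fuse_depths(mu1, var1, mu2, var2):
    """Inverse-variance (uncertainty-weighted) fusion of two per-pixel
    depth estimates; a common probabilistic building block, not the
    paper's exact method. All inputs are same-shape tensors."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    mu = (w1 * mu1 + w2 * mu2) / (w1 + w2)   # fused mean
    var = 1.0 / (w1 + w2)                    # fused variance shrinks
    return mu, var
```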
- Uncertainty-Aware Deep Multi-View Photometric Stereo [100.97116470055273]
Photometric stereo (PS) is excellent at recovering high-frequency surface details, whereas multi-view stereo (MVS) can help remove the low-frequency distortion due to PS and retain the global shape.
This paper proposes an approach that can effectively utilize such complementary strengths of PS and MVS.
We estimate per-pixel surface normals and depth using an uncertainty-aware deep-PS network and deep-MVS network, respectively.
arXiv Detail & Related papers (2022-02-26T05:45:52Z)
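One toy way to picture the PS/MVS complementarity described above is a frequency split: keep the low frequencies of MVS depth (global shape) and add the high frequencies of PS depth (fine detail). This illustrates the intuition only, not the paper's uncertainty-aware method:

```python
import torch
import torch.nn.functional as F

def blend_ps_mvs(depth_ps, depth_mvs, ksize=31, sigma=8.0):
    """Blend (B,1,H,W) depth maps: low-pass of MVS + high-pass of PS."""
    # Build a normalized Gaussian low-pass kernel.
    ax = torch.arange(ksize, dtype=torch.float32) - ksize // 2
    g = torch.exp(-ax**2 / (2 * sigma**2))
    k = (g[:, None] * g[None, :])
    k = (k / k.sum()).view(1, 1, ksize, ksize)
    pad = ksize // 2
    low = lambda d: F.conv2d(F.pad(d, (pad,) * 4, mode="replicate"), k)
    return low(depth_mvs) + (depth_ps - low(depth_ps))
```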
- Consistent Depth Prediction under Various Illuminations using Dilated Cross Attention [1.332560004325655]
We propose to use internet 3D indoor scenes and manually tune their illuminations to render photo-realistic RGB photos and their corresponding depth and BRDF maps.
We extract dilated features and perform cross attention on them to retain consistent depth prediction under different illuminations.
Our method is evaluated against current state-of-the-art methods on the Vari dataset, and a significant improvement is observed in experiments.
arXiv Detail & Related papers (2021-12-15T10:02:46Z)
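A sketch of what "cross attention on dilated features" could look like, assuming two feature maps of the same scene under different illuminations; the layer sizes and single dilation rate are assumptions:

```python
import torch
import torch.nn as nn

class DilatedCrossAttention(nn.Module):
    """Dilated convolutions widen the receptive field; cross attention
    then lets features under one illumination query the other."""
    def __init__(self, c=64, heads=4):
        super().__init__()
        self.dilated = nn.Conv2d(c, c, 3, padding=2, dilation=2)
        self.attn = nn.MultiheadAttention(c, heads, batch_first=True)

    def forward(self, feat_a, feat_b):
        # feat_a / feat_b: (B,C,H,W) features under two illuminations.
        B, C, H, W = feat_a.shape
        qa = self.dilated(feat_a).flatten(2).transpose(1, 2)  # (B,HW,C)
        kb = self.dilated(feat_b).flatten(2).transpose(1, 2)
        out, _ = self.attn(qa, kb, kb)  # query A, key/value B
        return out.transpose(1, 2).view(B, C, H, W)
```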
- Wild ToFu: Improving Range and Quality of Indirect Time-of-Flight Depth with RGB Fusion in Challenging Environments [56.306567220448684]
We propose a new learning-based end-to-end depth prediction network that takes noisy raw I-ToF signals as well as an RGB image.
We show more than 40% RMSE improvement on the final depth map compared to the baseline approach.
arXiv Detail & Related papers (2021-12-07T15:04:14Z)
- Facial Depth and Normal Estimation using Single Dual-Pixel Camera [81.02680586859105]
We introduce a DP-oriented Depth/Normal network that reconstructs the 3D facial geometry.
The accompanying dataset contains corresponding ground-truth 3D models, including depth maps and surface normals in metric scale.
The network achieves state-of-the-art performance compared to recent DP-based depth/normal estimation methods.
arXiv Detail & Related papers (2021-11-25T05:59:27Z)
- Neural Radiance Fields Approach to Deep Multi-View Photometric Stereo [103.08512487830669]
We present a modern solution to the multi-view photometric stereo problem (MVPS).
We procure the surface orientation using a photometric stereo (PS) image formation model and blend it with a multi-view neural radiance field representation to recover the object's surface geometry.
Our method performs neural rendering of multi-view images while utilizing surface normals estimated by a deep photometric stereo network.
arXiv Detail & Related papers (2021-10-11T20:20:03Z)
- Differentiable Diffusion for Dense Depth Estimation from Multi-view Images [31.941861222005603]
We present a method to estimate dense depth by optimizing a sparse set of points such that their diffusion into a depth map minimizes a multi-view reprojection error from RGB supervision.
We also develop an efficient optimization routine that can simultaneously optimize the 50k+ points required for complex scene reconstruction.
arXiv Detail & Related papers (2021-06-16T16:17:34Z)
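The diffusion-then-optimization idea above can be sketched compactly: diffuse sparse values into a dense map differentiably, then optimize the values against a loss. The toy target below stands in for the paper's multi-view reprojection error, and the simple 4-neighbor averaging for its actual diffusion operator:

```python
import torch

def diffuse(sparse_vals, sparse_mask, iters=100):
    """Differentiably diffuse sparse values into a dense (B,1,H,W) map
    by repeated 4-neighbor averaging, re-imposing the sparse values
    each step (toroidal boundaries via roll, for brevity)."""
    d = sparse_vals
    for _ in range(iters):
        avg = (torch.roll(d, 1, -1) + torch.roll(d, -1, -1) +
               torch.roll(d, 1, -2) + torch.roll(d, -1, -2)) / 4.0
        d = torch.where(sparse_mask.bool(), sparse_vals, avg)
    return d

# Toy optimization: fit the sparse values so the diffused map matches a
# known target; the paper would minimize reprojection error instead.
target = torch.linspace(1.0, 2.0, 64).view(1, 1, 1, 64).expand(1, 1, 64, 64)
mask = (torch.rand(1, 1, 64, 64) < 0.02).float()
vals = torch.zeros(1, 1, 64, 64, requires_grad=True)
opt = torch.optim.Adam([vals], lr=0.05)
for _ in range(200):
    loss = ((diffuse(vals, mask) - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```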
- Learning Inter- and Intra-frame Representations for Non-Lambertian Photometric Stereo [14.5172791293107]
We build a two-stage Convolutional Neural Network (CNN) architecture to construct inter- and intra-frame representations.
We experimentally investigate numerous network design alternatives to identify the optimal scheme for deploying inter-frame and intra-frame feature extraction modules.
arXiv Detail & Related papers (2020-12-26T11:22:56Z)