A Simple yet Effective Test-Time Adaptation for Zero-Shot Monocular Metric Depth Estimation
- URL: http://arxiv.org/abs/2412.14103v2
- Date: Fri, 07 Mar 2025 11:02:33 GMT
- Title: A Simple yet Effective Test-Time Adaptation for Zero-Shot Monocular Metric Depth Estimation
- Authors: Rémi Marsal, Alexandre Chapoutot, Philippe Xu, David Filliat
- Abstract summary: We propose a new method to rescale Depth Anything predictions using 3D points provided by sensors or techniques such as low-resolution LiDAR or structure-from-motion with poses given by an IMU. Our experiments highlight enhancements relative to zero-shot monocular metric depth estimation methods, competitive results compared to fine-tuned approaches and a better robustness than depth completion approaches.
- Score: 46.037640130193566
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent development of foundation models for monocular depth estimation such as Depth Anything paved the way to zero-shot monocular depth estimation. Since such a model returns an affine-invariant disparity map, the favored technique to recover metric depth is to fine-tune the model. However, this stage is not straightforward: it can be costly and time-consuming because of the training and the creation of the dataset, which must contain images captured by the camera that will be used at test time together with the corresponding ground truth. Moreover, the fine-tuning may also degrade the generalizing capacity of the original model. Instead, we propose in this paper a new method to rescale Depth Anything predictions using 3D points provided by sensors or techniques such as low-resolution LiDAR or structure-from-motion with poses given by an IMU. This approach avoids fine-tuning and preserves the generalizing power of the original depth estimation model while being robust to the noise of the sparse depth or of the depth model. Our experiments highlight enhancements relative to zero-shot monocular metric depth estimation methods, competitive results compared to fine-tuned approaches, and better robustness than depth completion approaches. Code available at https://gitlab.ensta.fr/ssh/monocular-depth-rescaling.
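As a rough illustration of the kind of rescaling described in the abstract, one can fit a scale and shift that map the affine-invariant disparity onto the sparse metric measurements and then invert back to metric depth. The sketch below uses a plain least-squares fit and hypothetical names; it is not the authors' exact estimator (which is designed to be more robust to noise), see the repository above for the actual implementation.

```python
import numpy as np

def rescale_to_metric(pred_disp, sparse_depth, eps=1e-6):
    """Fit (scale, shift) so that scale * pred_disp + shift matches the inverse
    of the sparse metric depths (e.g. low-resolution LiDAR or IMU-aided
    structure-from-motion points), then return a metric depth map.

    pred_disp    : (H, W) affine-invariant disparity, e.g. from Depth Anything
    sparse_depth : (H, W) metric depth in meters, 0 where no measurement exists
    """
    mask = sparse_depth > eps
    d = pred_disp[mask]              # predicted disparity at the measured pixels
    g = 1.0 / sparse_depth[mask]     # target inverse depth at those pixels

    # Closed-form least squares for [scale, shift]
    A = np.stack([d, np.ones_like(d)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, g, rcond=None)

    metric_disp = np.clip(scale * pred_disp + shift, eps, None)
    return 1.0 / metric_disp         # (H, W) metric depth map
```

A more robust fit (e.g. RANSAC over the same two parameters) would be preferable when the sparse points or the disparity are noisy, which is precisely the regime the paper targets.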
Related papers
- Revisiting Gradient-based Uncertainty for Monocular Depth Estimation [10.502852645001882]
We introduce gradient-based uncertainty estimation for monocular depth estimation models.
We demonstrate that our approach is effective in determining the uncertainty without re-training.
In particular, for models trained with monocular sequences and therefore most prone to uncertainty, our method outperforms related approaches.
arXiv Detail & Related papers (2025-02-09T17:21:41Z)
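As a loose illustration of the gradient-based idea in the entry above, uncertainty can be scored at test time from the gradient of an auxiliary self-consistency loss, with no re-training. The flip-consistency loss and the choice of hooked layer below are assumptions for the sketch, not necessarily the authors' exact formulation.

```python
import torch

def gradient_uncertainty(model, image, feature_layer):
    """Hypothetical sketch: score test-time uncertainty from the gradient norm
    of an auxiliary flip-consistency loss at an intermediate layer."""
    image = image.clone().requires_grad_(True)   # build a graph even if the model is frozen

    with torch.no_grad():                        # reference prediction on the mirrored image
        depth_flip = torch.flip(model(torch.flip(image, dims=[-1])), dims=[-1])

    feats = {}
    handle = feature_layer.register_forward_hook(
        lambda module, inp, out: feats.update(f=out))
    depth = model(image)                         # prediction with the graph attached
    handle.remove()

    aux_loss = (depth - depth_flip).abs().mean()            # flip-consistency auxiliary loss
    grad = torch.autograd.grad(aux_loss, feats["f"])[0]     # gradient at the hooked layer
    return grad.norm().item()                               # larger norm = less confident
```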
- Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion [51.69876947593144]
Existing methods for depth completion operate in tightly constrained settings.
Inspired by advances in monocular depth estimation, we reframe depth completion as image-conditional depth map generation.
Marigold-DC builds on a pretrained latent diffusion model for monocular depth estimation and injects the depth observations as test-time guidance.
arXiv Detail & Related papers (2024-12-18T00:06:41Z)
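The test-time guidance mentioned for Marigold-DC above can be pictured as nudging each denoising step toward agreement with the sparse observations. The sketch below is a generic, hypothetical guidance step; the actual method guides a pretrained latent diffusion model and is more involved.

```python
import torch

def guided_step(x_t, t, denoiser, sparse_depth, mask, guidance_scale=1.0):
    """One reverse-diffusion update with sparse-depth guidance (hypothetical
    sketch). `denoiser(x_t, t)` is assumed to return the current clean-depth
    estimate from the noisy sample x_t at timestep t."""
    x_t = x_t.detach().requires_grad_(True)
    depth_hat = denoiser(x_t, t)                             # current clean-depth estimate
    loss = ((depth_hat - sparse_depth)[mask] ** 2).mean()    # mismatch at observed pixels
    grad, = torch.autograd.grad(loss, x_t)                   # guidance gradient w.r.t. the sample
    return (x_t - guidance_scale * grad).detach()            # nudged sample for the next step
```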
- Metrically Scaled Monocular Depth Estimation through Sparse Priors for Underwater Robots [0.0]
We formulate a deep learning model that fuses sparse depth measurements from triangulated features to improve the depth predictions.
The network is trained in a supervised fashion on the forward-looking underwater dataset, FLSea.
The method achieves real-time performance, running at 160 FPS on a laptop GPU and 7 FPS on a single CPU core.
arXiv Detail & Related papers (2023-10-25T16:32:31Z)
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- Self-Supervised Learning based Depth Estimation from Monocular Images [0.0]
The goal of Monocular Depth Estimation is to predict the depth map, given a 2D monocular RGB image as input.
We plan to incorporate intrinsic camera parameters during training and apply weather augmentations to further generalize our model.
arXiv Detail & Related papers (2023-04-14T07:14:08Z)
- Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth.
Our method (named MG) ranks among the top entries on the KITTI depth-prediction benchmark leaderboard.
arXiv Detail & Related papers (2023-03-31T16:01:03Z)
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption to train networks; however, this assumption is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model to generate a single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
- Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image [91.71077190961688]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then exploit 3D point cloud data to predict the depth shift and the camera's focal length, which allow us to recover 3D scene shapes.
We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation.
arXiv Detail & Related papers (2022-08-28T16:20:14Z)
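For intuition about the two-stage pipeline in the entry above, the point-cloud stage reasons over an unprojected version of the first-stage depth. A minimal pinhole back-projection looks like this (a generic helper with assumed names, not the authors' code):

```python
import numpy as np

def backproject(depth, focal, cx=None, cy=None):
    """Lift a depth map to a 3D point cloud with a pinhole camera model.
    A wrong focal length or an uncorrected depth shift distorts this cloud,
    which is the cue a second-stage point cloud network can exploit."""
    H, W = depth.shape
    cx = (W - 1) / 2.0 if cx is None else cx
    cy = (H - 1) / 2.0 if cy is None else cy
    u, v = np.meshgrid(np.arange(W), np.arange(H))   # pixel coordinates
    x = (u - cx) * depth / focal
    y = (v - cy) * depth / focal
    return np.stack([x, y, depth], axis=-1)          # (H, W, 3) points in the camera frame
```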
- Improving Depth Estimation using Location Information [0.0]
This paper improves self-supervised deep learning techniques for accurate, generalized monocular depth estimation.
The main idea is to train the deep model on a sequence of frames, each geotagged with its location information.
arXiv Detail & Related papers (2021-12-27T22:30:14Z)
- Learning to Recover 3D Scene Shape from a Single Image [98.20106822614392]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then use 3D point cloud encoders to predict the missing depth shift and focal length that allow us to recover a realistic 3D scene shape.
arXiv Detail & Related papers (2020-12-17T02:35:13Z)
- Variational Monocular Depth Estimation for Reliability Prediction [12.951621755732544]
Self-supervised learning for monocular depth estimation is widely investigated as an alternative to the supervised learning approach.
Previous works have successfully improved the accuracy of depth estimation by modifying the model structure.
In this paper, we theoretically formulate a variational model for monocular depth estimation that predicts the reliability of the estimated depth image.
arXiv Detail & Related papers (2020-11-24T06:23:51Z)
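As a loose illustration of distribution-based depth, relevant to both the multivariate-Gaussian entry and the variational-reliability entry above, a network can output a per-pixel mean and log-variance trained with a Gaussian negative log-likelihood; the predicted variance then serves as a reliability map. The loss below is a standard heteroscedastic sketch, not the specific formulation of either paper.

```python
import torch

def gaussian_nll(mean, log_var, target):
    """Per-pixel heteroscedastic Gaussian negative log-likelihood (up to a
    constant). exp(log_var) is the predicted variance and can be read out
    at test time as a per-pixel reliability map."""
    return 0.5 * (torch.exp(-log_var) * (target - mean) ** 2 + log_var).mean()
```

In practice the same predicted variance that weights the loss during training is what gets thresholded or visualized as reliability at test time.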