BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation
- URL: http://arxiv.org/abs/2407.17952v1
- Date: Thu, 25 Jul 2024 11:16:37 GMT
- Title: BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation
- Authors: Xiang Zhang, Bingxin Ke, Hayko Riemenschneider, Nando Metzger, Anton Obukhov, Markus Gross, Konrad Schindler, Christopher Schroers
- Abstract summary: BetterDepth is a conditional diffusion-based refiner that takes the prediction from pre-trained MDE models as depth conditioning.
By efficient training on small-scale synthetic datasets, BetterDepth achieves state-of-the-art zero-shot MDE performance.
BetterDepth can improve the performance of other MDE models in a plug-and-play manner without additional re-training.
- Score: 25.047835960649167
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: By training over large-scale datasets, zero-shot monocular depth estimation (MDE) methods show robust performance in the wild but often suffer from insufficiently precise details. Although recent diffusion-based MDE approaches exhibit appealing detail extraction ability, they still struggle in geometrically challenging scenes due to the difficulty of gaining robust geometric priors from diverse datasets. To leverage the complementary merits of both worlds, we propose BetterDepth to efficiently achieve geometrically correct affine-invariant MDE performance while capturing fine-grained details. Specifically, BetterDepth is a conditional diffusion-based refiner that takes the prediction from pre-trained MDE models as depth conditioning, in which the global depth context is well-captured, and iteratively refines details based on the input image. For the training of such a refiner, we propose global pre-alignment and local patch masking methods to ensure the faithfulness of BetterDepth to depth conditioning while learning to capture fine-grained scene details. By efficient training on small-scale synthetic datasets, BetterDepth achieves state-of-the-art zero-shot MDE performance on diverse public datasets and in-the-wild scenes. Moreover, BetterDepth can improve the performance of other MDE models in a plug-and-play manner without additional re-training.
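The abstract names two training-time techniques: global pre-alignment of the conditioning depth and local patch masking. The sketch below is one plausible reading of those ideas, not the authors' implementation; the least-squares affine fit, the patch size, and the agreement threshold are all illustrative assumptions.

```python
# Illustrative sketch (not the authors' code) of the two training-time
# techniques described in the abstract, on toy depth maps with NumPy.
import numpy as np

def global_pre_align(pred, gt):
    """Fit scale s and shift t so that s*pred + t best matches gt in the
    least-squares sense (affine-invariant depth alignment)."""
    A = np.stack([pred.ravel(), np.ones(pred.size)], axis=1)
    s, t = np.linalg.lstsq(A, gt.ravel(), rcond=None)[0]
    return s * pred + t

def local_patch_mask(depth_cond, gt, patch=4, thresh=0.05):
    """Keep patches where the pre-aligned conditioning depth agrees with
    ground truth and mask out patches where it disagrees beyond `thresh`,
    so the refiner can stay faithful to the conditioning elsewhere.
    Patch size and threshold are hypothetical choices."""
    h, w = gt.shape
    mask = np.ones_like(gt, dtype=bool)
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            err = np.abs(depth_cond[i:i + patch, j:j + patch]
                         - gt[i:i + patch, j:j + patch]).mean()
            if err > thresh:
                mask[i:i + patch, j:j + patch] = False
    return mask

# Toy example: a prediction that differs from ground truth only by an
# affine transform is recovered exactly by the pre-alignment step.
rng = np.random.default_rng(0)
gt = rng.random((8, 8))
pred = 2.0 * gt + 0.5          # affine-distorted prediction
aligned = global_pre_align(pred, gt)
mask = local_patch_mask(aligned, gt)
```

Under this reading, the diffusion refiner would be supervised only where the mask is true, anchoring it to the globally correct conditioning while the input image drives the fine detail.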
Related papers
- Adaptive Discrete Disparity Volume for Self-supervised Monocular Depth Estimation [0.0]
In this paper, we propose a learnable module, Adaptive Discrete Disparity Volume (ADDV)
ADDV is capable of dynamically sensing depth distributions in different RGB images and generating adaptive bins for them.
We also introduce novel training strategies - uniformizing and sharpening - to provide regularizations under self-supervised conditions.
arXiv Detail & Related papers (2024-04-04T04:22:25Z) - UniDepth: Universal Monocular Metric Depth Estimation [81.80512457953903]
We propose a new model, UniDepth, capable of reconstructing metric 3D scenes from solely single images across domains.
Our model exploits a pseudo-spherical output representation, which disentangles camera and depth representations.
Thorough evaluations on ten datasets in a zero-shot regime consistently demonstrate the superior performance of UniDepth.
arXiv Detail & Related papers (2024-03-27T18:06:31Z) - PCDepth: Pattern-based Complementary Learning for Monocular Depth Estimation by Best of Both Worlds [15.823230141827358]
Event cameras record scene dynamics with high temporal resolution, providing rich scene details for monocular depth estimation.
Existing complementary learning approaches for MDE fuse intensity information from images and scene details from event data for better scene understanding.
We propose a Pattern-based Complementary learning architecture for monocular Depth estimation (PCDepth)
arXiv Detail & Related papers (2024-02-29T07:31:59Z) - M3D: Dataset Condensation by Minimizing Maximum Mean Discrepancy [26.227927019615446]
Training state-of-the-art (SOTA) deep models often requires extensive data, resulting in substantial training and storage costs.
Dataset condensation has been developed to learn a small synthetic set that preserves essential information from the original large-scale dataset.
We present a novel DM-based method named M3D for dataset condensation by Minimizing the Maximum Mean Discrepancy.
arXiv Detail & Related papers (2023-12-26T07:45:32Z) - Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model [34.85279074665031]
Methods for monocular depth estimation have made significant strides on standard benchmarks, but zero-shot metric depth estimation remains unsolved.
Recent work has proposed specialized multi-head architectures for jointly modeling indoor and outdoor scenes.
We advocate a generic, task-agnostic diffusion model, with several advancements such as log-scale depth parameterization.
arXiv Detail & Related papers (2023-12-20T18:27:47Z) - Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff) for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z) - Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth.
Our method (named MG) ranks among the top on the KITTI depth-prediction benchmark leaderboard.
arXiv Detail & Related papers (2023-03-31T16:01:03Z) - Fully Self-Supervised Depth Estimation from Defocus Clue [79.63579768496159]
We propose a self-supervised framework that estimates depth purely from a sparse focal stack.
We show that our framework circumvents the need for depth and all-in-focus (AIF) image ground truth, and achieves superior predictions.
arXiv Detail & Related papers (2023-03-19T19:59:48Z) - Dense Depth Distillation with Out-of-Distribution Simulated Images [30.79756881887895]
We study data-free knowledge distillation (KD) for monocular depth estimation (MDE)
KD learns a lightweight model for real-world depth perception by compressing a trained teacher model without access to training data in the target domain.
We show that our method outperforms baseline KD by a clear margin and achieves slightly better performance even with as few as 1/6 of the training images.
arXiv Detail & Related papers (2022-08-26T07:10:01Z) - An Adaptive Framework for Learning Unsupervised Depth Completion [59.17364202590475]
We present a method to infer a dense depth map from a color image and associated sparse depth measurements.
We show that regularization and co-visibility are related via the fitness of the model to data and can be unified into a single framework.
arXiv Detail & Related papers (2021-06-06T02:27:55Z) - Learnable Bernoulli Dropout for Bayesian Deep Learning [53.79615543862426]
Learnable Bernoulli dropout (LBD) is a new model-agnostic dropout scheme that considers the dropout rates as parameters jointly optimized with other model parameters.
LBD leads to improved accuracy and uncertainty estimates in image classification and semantic segmentation.
arXiv Detail & Related papers (2020-02-12T18:57:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.