FiffDepth: Feed-forward Transformation of Diffusion-Based Generators for Detailed Depth Estimation
- URL: http://arxiv.org/abs/2412.00671v2
- Date: Thu, 13 Mar 2025 20:20:58 GMT
- Title: FiffDepth: Feed-forward Transformation of Diffusion-Based Generators for Detailed Depth Estimation
- Authors: Yunpeng Bai, Qixing Huang
- Abstract summary: We propose an efficient Monocular Depth Estimation (MDE) approach named FiffDepth. FiffDepth transforms diffusion-based image generators into a feed-forward architecture for detailed depth estimation. We demonstrate that FiffDepth achieves exceptional accuracy, stability, and fine-grained detail, offering significant improvements in MDE performance.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular Depth Estimation (MDE) is a fundamental 3D vision problem with numerous applications such as 3D scene reconstruction, autonomous navigation, and AI content creation. However, robust and generalizable MDE remains challenging due to limited real-world labeled data and distribution gaps between synthetic datasets and real data. Existing methods often struggle on real-world test data, suffering from low efficiency, reduced accuracy, and a lack of detail. To address these issues, we propose an efficient MDE approach named FiffDepth. The key feature of FiffDepth is its use of diffusion priors: it transforms diffusion-based image generators into a feed-forward architecture for detailed depth estimation. FiffDepth preserves key generative features and integrates the strong generalization capabilities of models like DINOv2. Through benchmark evaluations, we demonstrate that FiffDepth achieves exceptional accuracy, stability, and fine-grained detail, offering significant improvements over state-of-the-art MDE approaches. The source code is available at: https://yunpeng1998.github.io/FiffDepth/
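As a rough illustration of the abstract's core idea, here is a hypothetical PyTorch sketch (not the authors' code): the pre-trained diffusion U-Net is run once, deterministically, and its features are fused with a DINOv2-style semantic encoder before a lightweight depth head.

```python
# Hypothetical sketch of the feed-forward idea: run a pre-trained diffusion
# U-Net once (fixed timestep, no iterative sampling) and fuse its features
# with a DINOv2-style semantic encoder before a lightweight depth head.
# `unet` and `sem_encoder` are stand-ins, not the actual FiffDepth modules.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeedForwardDepth(nn.Module):
    def __init__(self, unet: nn.Module, sem_encoder: nn.Module,
                 latent_dim: int, sem_dim: int):
        super().__init__()
        self.unet = unet                  # generative prior, used in one pass
        self.sem_encoder = sem_encoder    # semantic features for generalization
        self.fuse = nn.Conv2d(latent_dim + sem_dim, latent_dim, kernel_size=1)
        self.head = nn.Conv2d(latent_dim, 1, kernel_size=1)  # per-pixel depth

    def forward(self, latent: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        t = torch.zeros(latent.shape[0], dtype=torch.long, device=latent.device)
        feat = self.unet(latent, t)       # single deterministic U-Net pass
        sem = self.sem_encoder(image)
        sem = F.interpolate(sem, size=feat.shape[-2:], mode="bilinear",
                            align_corners=False)
        return self.head(self.fuse(torch.cat([feat, sem], dim=1)))
```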
Related papers
- UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler [62.06785782635153]
We propose a new model, UniDepthV2, capable of reconstructing metric 3D scenes solely from single images across domains.
UniDepthV2 directly predicts metric 3D points from the input image at inference time without any additional information.
Our model exploits a pseudo-spherical output representation, which disentangles the camera and depth representations.
arXiv Detail & Related papers (2025-02-27T14:03:15Z)
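A minimal sketch of what the pseudo-spherical output described above can look like (my reading of the idea, not UniDepthV2's code): the network predicts per-pixel angles and log-depth, so the camera (ray direction) and the depth are decoupled.

```python
# Toy illustration of a pseudo-spherical output (azimuth, elevation,
# log-depth): angles encode the camera ray, log-depth encodes the scene,
# and metric 3D points follow by converting back to Cartesian coordinates.
import numpy as np

def pseudo_spherical_to_points(azimuth, elevation, log_depth):
    """Per-pixel (azimuth, elevation, log-depth) maps -> (H, W, 3) points."""
    r = np.exp(log_depth)                        # metric range along each ray
    x = r * np.cos(elevation) * np.sin(azimuth)
    y = r * np.sin(elevation)
    z = r * np.cos(elevation) * np.cos(azimuth)
    return np.stack([x, y, z], axis=-1)

H, W = 4, 6
az = np.tile(np.linspace(-0.3, 0.3, W), (H, 1))            # ray directions
el = np.tile(np.linspace(-0.2, 0.2, H)[:, None], (1, W))
points = pseudo_spherical_to_points(az, el, np.zeros((H, W)))  # (4, 6, 3)
```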
- FOF-X: Towards Real-time Detailed Human Reconstruction from a Single Image [68.84221452621674]
We introduce FOF-X for real-time reconstruction of detailed human geometry from a single image.
FOF-X avoids the performance degradation caused by texture and lighting.
We enhance the inter-conversion algorithms between FOF and mesh representations with a Laplacian constraint and an automaton-based discontinuity matcher.
arXiv Detail & Related papers (2024-12-08T14:46:29Z)
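A rough sketch of the Fourier Occupancy Field representation underlying FOF-X (my reading, not the authors' code): per-pixel occupancy along the view axis is compressed into a few orthonormal cosine coefficients, giving a compact 2D map of a 3D shape, and the mesh conversion works on the truncated reconstruction.

```python
# Occupancy <-> Fourier-coefficient conversion via an orthonormal DCT-II
# basis; n_coeffs controls the trade-off between compactness and detail.
import numpy as np

def cosine_basis(depth_samples: int, n_coeffs: int) -> np.ndarray:
    """Orthonormal DCT-II basis, shape (depth_samples, n_coeffs)."""
    z = np.arange(depth_samples)
    basis = np.cos(np.pi * np.outer(z + 0.5, np.arange(n_coeffs)) / depth_samples)
    basis *= np.sqrt(2.0 / depth_samples)
    basis[:, 0] /= np.sqrt(2.0)   # normalize the constant (DC) term
    return basis

def occupancy_to_fof(occ: np.ndarray, n_coeffs: int = 8) -> np.ndarray:
    """(H, W, D) occupancy -> (H, W, n_coeffs) coefficient map."""
    return occ @ cosine_basis(occ.shape[-1], n_coeffs)

def fof_to_occupancy(coeffs: np.ndarray, depth_samples: int = 64) -> np.ndarray:
    """Truncated reconstruction back to (H, W, depth_samples)."""
    return coeffs @ cosine_basis(depth_samples, coeffs.shape[-1]).T
```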
- High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity [69.32473738284374]
We propose DiffDIS, a diffusion-driven segmentation model that taps into the potential of the pre-trained U-Net within diffusion models.
By leveraging the robust generalization capabilities and the rich, versatile image representation prior of SD models, we significantly reduce inference time while preserving high-fidelity, detailed generation.
Experiments on the DIS5K dataset demonstrate the superiority of DiffDIS, achieving state-of-the-art results through a streamlined inference process.
arXiv Detail & Related papers (2024-10-14T02:49:23Z)
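The reduced inference time claimed above typically comes from collapsing the sampling chain. A hedged sketch of the standard one-step trick with the epsilon-parameterization (a generic formula, not necessarily DiffDIS's exact procedure):

```python
# One-step clean-latent estimate from a noise-prediction diffusion model:
# x0 = (x_t - sqrt(1 - a_bar_t) * eps) / sqrt(a_bar_t), avoiding a full
# iterative sampling chain.
import torch

def predict_x0(unet, x_t: torch.Tensor, t: torch.Tensor,
               alpha_bar: torch.Tensor) -> torch.Tensor:
    eps = unet(x_t, t)                       # predicted noise at step t
    a = alpha_bar[t].view(-1, 1, 1, 1)       # cumulative schedule, a_bar_t
    return (x_t - torch.sqrt(1.0 - a) * eps) / torch.sqrt(a)
```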
- BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation [25.047835960649167]
BetterDepth is a conditional diffusion-based refiner that takes the prediction from pre-trained MDE models as depth conditioning.
BetterDepth achieves state-of-the-art zero-shot MDE performance on diverse public datasets and on in-the-wild scenes.
arXiv Detail & Related papers (2024-07-25T11:16:37Z)
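A hedged sketch of depth conditioning by channel concatenation (toy modules, not the BetterDepth architecture): the denoiser sees the noisy target next to the coarse MDE prediction, so it only has to add residual detail.

```python
import torch
import torch.nn as nn

class ConditionedDenoiser(nn.Module):
    def __init__(self, hidden: int = 32):
        super().__init__()
        # input = 1 channel of noisy depth + 1 channel of coarse depth prior
        self.net = nn.Sequential(
            nn.Conv2d(2, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, 1, 3, padding=1),
        )

    def forward(self, noisy_depth: torch.Tensor,
                coarse_depth: torch.Tensor) -> torch.Tensor:
        # timestep embedding omitted for brevity in this toy version
        return self.net(torch.cat([noisy_depth, coarse_depth], dim=1))
```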
- ExactDreamer: High-Fidelity Text-to-3D Content Creation via Exact Score Matching [10.362259643427526]
Current approaches often adapt pre-trained 2D diffusion models for 3D synthesis.
Over-smoothing poses a significant limitation on the high-fidelity generation of 3D models.
LucidDreamer replaces the Denoising Diffusion Probabilistic Model (DDPM) in SDS with the Denoising Diffusion Implicit Model (DDIM).
arXiv Detail & Related papers (2024-05-24T20:19:45Z)
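The DDPM-to-DDIM swap mentioned above, in miniature (standard formulas, hedged as a sketch): DDIM with eta = 0 is deterministic, while the ancestral DDPM step re-injects Gaussian noise, which is often blamed for over-smoothed SDS gradients.

```python
# a_t / a_prev are cumulative alpha-bars, beta_t the per-step variance; all
# arguments are tensors.
import torch

def ddim_step(x_t, eps, a_t, a_prev):
    """Deterministic DDIM update (eta = 0)."""
    x0 = (x_t - torch.sqrt(1 - a_t) * eps) / torch.sqrt(a_t)  # predicted x0
    return torch.sqrt(a_prev) * x0 + torch.sqrt(1 - a_prev) * eps

def ddpm_step(x_t, eps, a_t, beta_t):
    """Ancestral DDPM update: same mean structure plus fresh noise."""
    mean = (x_t - beta_t / torch.sqrt(1 - a_t) * eps) / torch.sqrt(1 - beta_t)
    return mean + torch.sqrt(beta_t) * torch.randn_like(x_t)
```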
- Digging into contrastive learning for robust depth estimation with diffusion models [55.62276027922499]
We propose a novel robust depth estimation method called D4RD.
It features a custom contrastive learning mode tailored for diffusion models to mitigate performance degradation in complex environments.
In experiments, D4RD surpasses existing state-of-the-art solutions on synthetic corruption datasets and real-world weather conditions.
arXiv Detail & Related papers (2024-04-15T14:29:47Z)
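For context, a generic InfoNCE contrastive loss (not D4RD's custom diffusion-tailored variant): features of a clean view and its corrupted counterpart form positive pairs, other batch items act as negatives, pushing depth features to be corruption-invariant.

```python
import torch
import torch.nn.functional as F

def info_nce(feat_clean: torch.Tensor, feat_corrupt: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    a = F.normalize(feat_clean.flatten(1), dim=1)    # (B, D), unit norm
    b = F.normalize(feat_corrupt.flatten(1), dim=1)
    logits = a @ b.t() / temperature                 # (B, B) similarities
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, labels)           # positives on diagonal
```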
- UniDepth: Universal Monocular Metric Depth Estimation [81.80512457953903]
We propose a new model, UniDepth, capable of reconstructing metric 3D scenes solely from single images across domains.
Our model exploits a pseudo-spherical output representation, which disentangles camera and depth representations.
Thorough evaluations on ten datasets in a zero-shot regime consistently demonstrate the superior performance of UniDepth.
arXiv Detail & Related papers (2024-03-27T18:06:31Z)
- M3D: Dataset Condensation by Minimizing Maximum Mean Discrepancy [26.227927019615446]
Training state-of-the-art (SOTA) deep models often requires extensive data, resulting in substantial training and storage costs.
Dataset condensation has been developed to learn a small synthetic set that preserves essential information from the original large-scale dataset.
We present a novel DM-based method named M3D for dataset condensation by Minimizing the Maximum Mean Discrepancy.
arXiv Detail & Related papers (2023-12-26T07:45:32Z)
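A minimal biased MMD^2 estimate with a Gaussian kernel, the discrepancy a distribution-matching condensation method drives down between real and synthetic feature batches (the generic formula, not necessarily M3D's exact objective):

```python
import torch

def gaussian_kernel(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0):
    d2 = torch.cdist(x, y).pow(2)                 # pairwise squared distances
    return torch.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(real: torch.Tensor, synth: torch.Tensor, sigma: float = 1.0):
    """Biased MMD^2 between two (N, D) batches; zero when they coincide."""
    return (gaussian_kernel(real, real, sigma).mean()
            + gaussian_kernel(synth, synth, sigma).mean()
            - 2.0 * gaussian_kernel(real, synth, sigma).mean())
```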
- Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced to image deblurring and have exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff) for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
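A hedged sketch of hierarchical prior integration (toy modules, names mine, not the HI-Diff architecture): a compact code from the diffusion branch modulates a regression-based restorer's features at several scales, instead of diffusing full-resolution images.

```python
import torch
import torch.nn as nn

class HierarchicalFusion(nn.Module):
    def __init__(self, chans=(32, 64, 128), prior_dim: int = 64):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(prior_dim, c) for c in chans)

    def forward(self, feats, prior):
        # feats: multi-scale maps (B, C_i, H_i, W_i); prior: (B, prior_dim)
        fused = []
        for f, proj in zip(feats, self.proj):
            scale = proj(prior)[:, :, None, None]     # per-channel gate
            fused.append(f * (1.0 + torch.tanh(scale)))
        return fused
```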
- Secrets of 3D Implicit Object Shape Reconstruction in the Wild [92.5554695397653]
Reconstructing high-fidelity 3D objects from sparse, partial observation is crucial for various applications in computer vision, robotics, and graphics.
Recent neural implicit modeling methods show promising results on synthetic or dense datasets.
However, they perform poorly on real-world data that is sparse and noisy.
This paper analyzes the root cause of this deficient performance in a popular neural implicit model.
arXiv Detail & Related papers (2021-01-18T03:24:48Z)
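For context, a minimal neural implicit shape model of the kind analyzed: an MLP maps 3D coordinates to signed distance, and the surface is its zero level set. Sparse, noisy supervision leaves such a network unconstrained far from the observed points, which is the failure regime discussed above.

```python
import torch
import torch.nn as nn

sdf = nn.Sequential(
    nn.Linear(3, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1),                  # signed distance (negative = inside)
)
queries = torch.rand(1024, 3) * 2 - 1   # points in the unit cube [-1, 1]^3
distances = sdf(queries)                # (1024, 1) signed-distance values
```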