FiffDepth: Feed-forward Transformation of Diffusion-Based Generators for Detailed Depth Estimation
- URL: http://arxiv.org/abs/2412.00671v1
- Date: Sun, 01 Dec 2024 04:59:34 GMT
- Title: FiffDepth: Feed-forward Transformation of Diffusion-Based Generators for Detailed Depth Estimation
- Authors: Yunpeng Bai, Qixing Huang
- Abstract summary: FiffDepth is a framework that transforms diffusion-based image generators into a feedforward architecture for detailed depth estimation.
It achieves enhanced accuracy, stability, and fine-grained detail, offering a significant improvement in MDE performance across diverse real-world scenarios.
- Score: 31.06080108012735
- Abstract: Monocular Depth Estimation (MDE) is essential for applications like 3D scene reconstruction, autonomous navigation, and AI content creation. However, robust MDE remains challenging due to noisy real-world data and distribution gaps in synthetic datasets. Existing methods often struggle with low efficiency, reduced accuracy, and lack of detail. To address this, we propose an efficient approach for leveraging diffusion priors and introduce FiffDepth, a framework that transforms diffusion-based image generators into a feedforward architecture for detailed depth estimation. By preserving key generative features and integrating the strong generalization capabilities of models like DINOv2, FiffDepth achieves enhanced accuracy, stability, and fine-grained detail, offering a significant improvement in MDE performance across diverse real-world scenarios.
Related papers
- High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity [69.32473738284374]
We propose DiffDIS, a diffusion-driven segmentation model that taps into the potential of the pre-trained U-Net within diffusion models.
By leveraging the robust generalization capabilities and the rich, versatile image representation prior of the SD models, we significantly reduce the inference time while preserving high-fidelity, detailed generation.
Experiments on the DIS5K dataset demonstrate the superiority of DiffDIS, achieving state-of-the-art results through a streamlined inference process.
arXiv Detail & Related papers (2024-10-14T02:49:23Z)
- BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation [25.047835960649167]
BetterDepth is a conditional diffusion-based refiner that takes the prediction from pre-trained MDE models as depth conditioning.
BetterDepth achieves state-of-the-art zero-shot MDE performance on diverse public datasets and on in-the-wild scenes.
arXiv Detail & Related papers (2024-07-25T11:16:37Z)
- Digging into contrastive learning for robust depth estimation with diffusion models [55.62276027922499]
We propose a novel robust depth estimation method called D4RD.
It features a custom contrastive learning mode tailored for diffusion models to mitigate performance degradation in complex environments.
In experiments, D4RD surpasses existing state-of-the-art solutions on synthetic corruption datasets and real-world weather conditions.
arXiv Detail & Related papers (2024-04-15T14:29:47Z)
- Distribution and Depth-Aware Transformers for 3D Human Mesh Recovery [7.339380415551658]
We introduce Distribution and depth-aware human mesh recovery (D2A-HMR), an end-to-end transformer architecture.
Our approach demonstrates superior performance in handling OOD data in certain scenarios.
arXiv Detail & Related papers (2024-03-14T03:07:58Z)
- Bayesian Diffusion Models for 3D Shape Reconstruction [54.69889488052155]
We present a prediction algorithm that performs effective Bayesian inference by tightly coupling the top-down (prior) information with the bottom-up (data-driven) procedure.
We show the effectiveness of BDM on the 3D shape reconstruction task.
arXiv Detail & Related papers (2024-03-11T17:55:53Z)
- Distribution-Aware Data Expansion with Diffusion Models [55.979857976023695]
We propose DistDiff, a training-free data expansion framework based on the distribution-aware diffusion model.
DistDiff consistently enhances accuracy across a diverse range of datasets compared to models trained solely on original data.
arXiv Detail & Related papers (2024-03-11T14:07:53Z)
- Diffusion Models Without Attention [110.5623058129782]
Diffusion State Space Model (DiffuSSM) is an architecture that supplants attention mechanisms with a more scalable state space model backbone.
Our focus on FLOP-efficient architectures in diffusion training marks a significant step forward.
arXiv Detail & Related papers (2023-11-30T05:15:35Z)
- Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff), for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.