Related papers: Dream-to-Recon: Monocular 3D Reconstruction with Diffusion-Depth Distillation from Single Images

Dream-to-Recon: Monocular 3D Reconstruction with Diffusion-Depth Distillation from Single Images

URL: http://arxiv.org/abs/2508.02323v1
Date: Mon, 04 Aug 2025 11:43:12 GMT
Title: Dream-to-Recon: Monocular 3D Reconstruction with Diffusion-Depth Distillation from Single Images
Authors: Philipp Wulff, Felix Wimbauer, Dominik Muhle, Daniel Cremers,
Abstract summary: We propose to leverage pre-trained 2D diffusion models and depth prediction models to generate synthetic scene geometry from a single image.<n>Our experiments on the challenging KITTI-360 and datasets demonstrate that our method matches or outperforms state-of-the-art baselines.
Score: 39.08243715525956
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Volumetric scene reconstruction from a single image is crucial for a broad range of applications like autonomous driving and robotics. Recent volumetric reconstruction methods achieve impressive results, but generally require expensive 3D ground truth or multi-view supervision. We propose to leverage pre-trained 2D diffusion models and depth prediction models to generate synthetic scene geometry from a single image. This can then be used to distill a feed-forward scene reconstruction model. Our experiments on the challenging KITTI-360 and Waymo datasets demonstrate that our method matches or outperforms state-of-the-art baselines that use multi-view supervision, and offers unique advantages, for example regarding dynamic scenes.

Related papers

DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos [52.46386528202226]
We introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM)<n>It is the first feed-forward method predicting deformable 3D Gaussian splats from a monocular posed video of any dynamic scene.<n>It achieves performance on par with state-of-the-art monocular video 3D tracking methods.
arXiv Detail & Related papers (2025-06-11T17:59:58Z)
SpatialCrafter: Unleashing the Imagination of Video Diffusion Models for Scene Reconstruction from Limited Observations [44.53106180688135]
This work takes on the challenge of reconstructing 3D scenes from sparse or single-view inputs.<n>We introduce SpatialCrafter, a framework that leverages the rich knowledge in video diffusion models to generate plausible additional observations.<n>Through a trainable camera encoder and an epipolar attention mechanism for explicit geometric constraints, we achieve precise camera control and 3D consistency.
arXiv Detail & Related papers (2025-05-17T13:05:13Z)
IM-Portrait: Learning 3D-aware Video Diffusion for Photorealistic Talking Heads from Monocular Videos [33.12653115668027]
Our method generates Multiplane Images (MPIs) that ensure geometric consistency.<n>Our approach directly generates the final output through a single denoising process.<n>To effectively learn from monocular videos, we introduce a training mechanism that reconstructs the output MPI randomly in either the target or the reference camera space.
arXiv Detail & Related papers (2025-04-27T08:56:02Z)
Enhancing Monocular 3D Scene Completion with Diffusion Model [20.81599069390756]
3D scene reconstruction is essential for applications in virtual reality, robotics, and autonomous driving.<n>Traditional 3D Gaussian Splatting techniques rely on images captured from multiple viewpoints to achieve optimal performance.<n>We introduce FlashDreamer, a novel approach for reconstructing a complete 3D scene from a single image.
arXiv Detail & Related papers (2025-03-02T04:36:57Z)
Wonderland: Navigating 3D Scenes from a Single Image [43.99037613068823]
We introduce a large-scale reconstruction model that leverages latents from a video diffusion model to predict 3D Gaussian Splattings of scenes in a feed-forward manner.<n>We train the 3D reconstruction model to operate on the video latent space with a progressive learning strategy, enabling the efficient generation of high-quality, wide-scope, and generic 3D scenes.
arXiv Detail & Related papers (2024-12-16T18:58:17Z)
DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes. Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model [86.37536249046943]
textbfDMV3D is a novel 3D generation approach that uses a transformer-based 3D large reconstruction model to denoise multi-view diffusion. Our reconstruction model incorporates a triplane NeRF representation and can denoise noisy multi-view images via NeRF reconstruction and rendering.
arXiv Detail & Related papers (2023-11-15T18:58:41Z)
Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction [77.69363640021503]
3D-aware image synthesis encompasses a variety of tasks, such as scene generation and novel view synthesis from images. We present SSDNeRF, a unified approach that employs an expressive diffusion model to learn a generalizable prior of neural radiance fields (NeRF) from multi-view images of diverse objects.
arXiv Detail & Related papers (2023-04-13T17:59:01Z)
Shape, Pose, and Appearance from a Single Image via Bootstrapped Radiance Field Inversion [54.151979979158085]
We introduce a principled end-to-end reconstruction framework for natural images, where accurate ground-truth poses are not available. We leverage an unconditional 3D-aware generator, to which we apply a hybrid inversion scheme where a model produces a first guess of the solution. Our framework can de-render an image in as few as 10 steps, enabling its use in practical scenarios.
arXiv Detail & Related papers (2022-11-21T17:42:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.