GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image
- URL: http://arxiv.org/abs/2403.12013v1
- Date: Mon, 18 Mar 2024 17:50:41 GMT
- Title: GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image
- Authors: Xiao Fu, Wei Yin, Mu Hu, Kaixuan Wang, Yuexin Ma, Ping Tan, Shaojie Shen, Dahua Lin, Xiaoxiao Long
- Abstract summary: We introduce GeoWizard, a new generative foundation model designed for estimating geometric attributes from single images.
We show that leveraging diffusion priors can markedly improve generalization, detail preservation, and efficiency in resource usage.
We propose a simple yet effective strategy to segregate the complex data distribution of various scenes into distinct sub-distributions.
- Score: 94.56927147492738
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce GeoWizard, a new generative foundation model designed for estimating geometric attributes, e.g., depth and normals, from single images. While significant research has already been conducted in this area, the progress has been substantially limited by the low diversity and poor quality of publicly available datasets. As a result, the prior works either are constrained to limited scenarios or suffer from the inability to capture geometric details. In this paper, we demonstrate that generative models, as opposed to traditional discriminative models (e.g., CNNs and Transformers), can effectively address the inherently ill-posed problem. We further show that leveraging diffusion priors can markedly improve generalization, detail preservation, and efficiency in resource usage. Specifically, we extend the original stable diffusion model to jointly predict depth and normal, allowing mutual information exchange and high consistency between the two representations. More importantly, we propose a simple yet effective strategy to segregate the complex data distribution of various scenes into distinct sub-distributions. This strategy enables our model to recognize different scene layouts, capturing 3D geometry with remarkable fidelity. GeoWizard sets new benchmarks for zero-shot depth and normal prediction, significantly enhancing many downstream applications such as 3D reconstruction, 2D content creation, and novel viewpoint synthesis.
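The sketch below is a minimal PyTorch illustration (not the authors' code) of the two ideas the abstract describes: a single denoiser that consumes the image latent together with noisy depth and normal latents so the two geometry branches share information, and a scene-layout class embedding standing in for the "distinct sub-distributions" switch. Names such as JointGeometryDenoiser, n_scene_classes, and the toy layer sizes are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class JointGeometryDenoiser(nn.Module):
    """Toy stand-in for a diffusion denoiser that predicts depth and normal noise jointly."""
    def __init__(self, latent_ch: int = 4, hidden_ch: int = 64, n_scene_classes: int = 3):
        super().__init__()
        # input: image latent (4 ch) + noisy depth latent (4 ch) + noisy normal latent (4 ch)
        self.backbone = nn.Sequential(
            nn.Conv2d(3 * latent_ch, hidden_ch, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(hidden_ch, hidden_ch, 3, padding=1),
            nn.SiLU(),
        )
        # scene-layout embedding: lets the model specialize per sub-distribution (e.g. indoor/outdoor)
        self.scene_embed = nn.Embedding(n_scene_classes, hidden_ch)
        # one head per geometry branch; the shared backbone couples depth and normals
        self.depth_head = nn.Conv2d(hidden_ch, latent_ch, 3, padding=1)
        self.normal_head = nn.Conv2d(hidden_ch, latent_ch, 3, padding=1)

    def forward(self, image_lat, noisy_depth_lat, noisy_normal_lat, scene_class):
        x = torch.cat([image_lat, noisy_depth_lat, noisy_normal_lat], dim=1)
        h = self.backbone(x)
        h = h + self.scene_embed(scene_class)[:, :, None, None]  # broadcast over H, W
        return self.depth_head(h), self.normal_head(h)           # predicted noise per branch

# toy usage with Stable-Diffusion-sized latents (4 x 64 x 64)
model = JointGeometryDenoiser()
img_lat = torch.randn(1, 4, 64, 64)
eps_depth, eps_normal = model(img_lat,
                              torch.randn(1, 4, 64, 64),
                              torch.randn(1, 4, 64, 64),
                              torch.tensor([0]))  # scene class 0, e.g. "indoor"
print(eps_depth.shape, eps_normal.shape)  # torch.Size([1, 4, 64, 64]) twice
```

In the actual method the denoiser would be an extended Stable Diffusion U-Net iterated over diffusion timesteps; the sketch only shows how joint prediction and a scene-layout switch can be wired together.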
Related papers
- Geometry Distributions [51.4061133324376]
We propose a novel geometric data representation that models geometry as distributions.
Our approach uses diffusion models with a novel network architecture to learn surface point distributions.
We evaluate our representation qualitatively and quantitatively across various object types, demonstrating its effectiveness in achieving high geometric fidelity.
arXiv Detail & Related papers (2024-11-25T04:06:48Z) - DiHuR: Diffusion-Guided Generalizable Human Reconstruction [51.31232435994026]
We introduce DiHuR, a Diffusion-guided model for generalizable Human 3D Reconstruction and view synthesis from sparse, minimally overlapping images.
Our method integrates two key priors in a coherent manner: the prior from generalizable feed-forward models and the 2D diffusion prior, and it requires only multi-view image training, without 3D supervision.
arXiv Detail & Related papers (2024-11-16T03:52:23Z) - GRIN: Zero-Shot Metric Depth with Pixel-Level Diffusion [27.35300492569507]
We present GRIN, an efficient diffusion model designed to ingest sparse unstructured training data.
We show that GRIN establishes a new state of the art in zero-shot metric monocular depth estimation even when trained from scratch.
arXiv Detail & Related papers (2024-09-15T23:32:04Z) - Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract priors from transformers that are well-trained on massive image data.
Experiments on the PointDA-10 and Sim-to-Real datasets verify that the proposed method consistently achieves state-of-the-art performance for unsupervised domain adaptation (UDA) in point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z) - GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation [65.33726478659304]
We introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach that can predict high-quality assets with 512k Gaussians from 21 input images using only 11 GB of GPU memory.
Previous works neglect the inherent sparsity of 3D structure and do not exploit explicit geometric relationships between 3D structures and 2D images.
GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms.
arXiv Detail & Related papers (2024-06-21T17:49:31Z) - GeoGen: Geometry-Aware Generative Modeling via Signed Distance Functions [22.077366472693395]
We introduce a new generative approach for synthesizing 3D geometry and images from single-view collections.
Existing methods that employ volumetric rendering with neural radiance fields inherit a key limitation: the generated geometry is noisy and unconstrained.
We propose GeoGen, a new SDF-based 3D generative model trained in an end-to-end manner.
arXiv Detail & Related papers (2024-06-06T17:00:10Z) - Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability [118.26563926533517]
Auto-regressive models have achieved impressive results in 2D image generation by modeling joint distributions in grid space.
We extend auto-regressive models to 3D domains and pursue stronger 3D shape generation by improving their capacity and scalability simultaneously.
arXiv Detail & Related papers (2024-02-19T15:33:09Z) - Retrieval-Augmented Score Distillation for Text-to-3D Generation [30.57225047257049]
We introduce ReDream, a novel framework for retrieval-based quality enhancement in text-to-3D generation.
We conduct extensive experiments to demonstrate that ReDream exhibits superior quality with increased geometric consistency.
arXiv Detail & Related papers (2024-02-05T12:50:30Z)