Related papers: Geometry and Perception Guided Gaussians for Multiview-consistent 3D Generation from a Single Image

URL: http://arxiv.org/abs/2506.21152v1
Date: Thu, 26 Jun 2025 11:22:06 GMT
Title: Geometry and Perception Guided Gaussians for Multiview-consistent 3D Generation from a Single Image
Authors: Pufan Li, Bi'an Du, Wei Hu
Abstract summary: Existing approaches often rely on fine-tuning pretrained 2D diffusion models or directly generating 3D information through fast network inference. We present a novel method that seamlessly integrates geometry and perception priors without requiring additional model training. Experiments demonstrate the higher-fidelity reconstruction results of our method, outperforming existing methods on novel view synthesis and 3D reconstruction.
Score: 10.36303976374455
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generating realistic 3D objects from single-view images requires natural appearance, 3D consistency, and the ability to capture multiple plausible interpretations of unseen regions. Existing approaches often rely on fine-tuning pretrained 2D diffusion models or directly generating 3D information through fast network inference or 3D Gaussian Splatting, but their results generally suffer from poor multiview consistency and lack geometric detail. To tackle these issues, we present a novel method that seamlessly integrates geometry and perception priors without requiring additional model training to reconstruct detailed 3D objects from a single image. Specifically, we train three different Gaussian branches initialized from the geometry prior, perception prior and Gaussian noise, respectively. The geometry prior captures the rough 3D shapes, while the perception prior utilizes the 2D pretrained diffusion model to enhance multiview information. Subsequently, we refine 3D Gaussian branches through mutual interaction between geometry and perception priors, further enhanced by a reprojection-based strategy that enforces depth consistency. Experiments demonstrate the higher-fidelity reconstruction results of our method, outperforming existing methods on novel view synthesis and 3D reconstruction, demonstrating robust and consistent 3D object generation.
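
Illustrative sketch: the abstract mentions a reprojection-based strategy that enforces depth consistency but gives no formulation. The Python below is only a generic cross-view depth check under assumed conventions (pinhole camera, a known rigid transform from view A to view B, nearest-pixel lookup). It is not the authors' method, and the function name `reprojection_depth_error` and all of its parameters are hypothetical.

```python
from typing import Sequence, Tuple

Grid = Sequence[Sequence[float]]


def reprojection_depth_error(
    depth_a: Grid,
    depth_b: Grid,
    intrinsics: Tuple[float, float, float, float],
    rotation: Grid,
    translation: Sequence[float],
) -> float:
    """Mean absolute depth disagreement after reprojecting view A into view B.

    Assumptions (not taken from the paper): both views share the pinhole
    intrinsics (fx, fy, cx, cy); `rotation` and `translation` map points from
    camera A coordinates to camera B coordinates; non-positive depth means
    "no measurement".
    """
    fx, fy, cx, cy = intrinsics
    height, width = len(depth_b), len(depth_b[0])
    total, count = 0.0, 0
    for v, row in enumerate(depth_a):
        for u, d in enumerate(row):
            if d <= 0:
                continue
            # Back-project pixel (u, v) with depth d into camera A space.
            p = ((u - cx) / fx * d, (v - cy) / fy * d, d)
            # Rigidly transform the point into camera B space.
            q = [
                sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
                for i in range(3)
            ]
            if q[2] <= 0:
                continue  # point lies behind camera B
            # Project into view B and pick the nearest pixel.
            u_b = round(fx * q[0] / q[2] + cx)
            v_b = round(fy * q[1] / q[2] + cy)
            if 0 <= u_b < width and 0 <= v_b < height and depth_b[v_b][u_b] > 0:
                total += abs(q[2] - depth_b[v_b][u_b])
                count += 1
    return total / count if count else 0.0


# Usage: a fronto-parallel plane seen from two cameras 0.5 units apart in z.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
camera = (4.0, 4.0, 1.5, 1.5)
near = [[2.0] * 4 for _ in range(4)]
far = [[2.5] * 4 for _ in range(4)]
consistent = reprojection_depth_error(near, far, camera, identity, (0.0, 0.0, 0.5))
inconsistent = reprojection_depth_error(near, near, camera, identity, (0.0, 0.0, 0.5))
```

In an optimization setting, a differentiable version of such an error could serve as a consistency penalty between rendered depth maps. Whether the paper does this is not stated in the abstract.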

Related papers

Wonder3D++: Cross-domain Diffusion for High-fidelity 3D Generation from a Single Image [68.55613894952177]
We introduce Wonder3D++, a novel method for efficiently generating high-fidelity textured meshes from single-view images. We propose a cross-domain diffusion model that generates multi-view normal maps and the corresponding color images. Lastly, we introduce a cascaded 3D mesh extraction algorithm that drives high-quality surfaces from the multi-view 2D representations in only about 3 minutes in a coarse-to-fine manner.
arXiv Detail & Related papers (2025-11-03T17:24:18Z)
Shape from Semantics: 3D Shape Generation from Multi-View Semantics [30.969299308083723]
Existing 3D reconstruction methods utilize guidances such as 2D images, 3D point clouds, shape contours and single semantics to recover the 3D surface. We propose a novel 3D modeling task called "Shape from Semantics", which aims to create 3D models whose geometry and appearance are consistent with the given text semantics when viewed from different views.
arXiv Detail & Related papers (2025-02-01T07:51:59Z)
DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models [67.50989119438508]
We introduce DSplats, a novel method that directly denoises multiview images using Gaussian-based Reconstructors to produce realistic 3D assets. Our experiments demonstrate that DSplats not only produces high-quality, spatially consistent outputs, but also sets a new standard in single-image to 3D reconstruction.
arXiv Detail & Related papers (2024-12-11T07:32:17Z)
Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors [17.544733016978928]
3D object generation from a single image involves estimating the full 3D geometry and texture of unseen views from an unposed RGB image captured in the wild. Recent advancements in 3D object generation have introduced techniques that reconstruct an object's 3D shape and texture. We propose bridging the gap between 2D and 3D diffusion models to address this limitation.
arXiv Detail & Related papers (2024-10-12T10:14:11Z)
Deep Geometric Moments Promote Shape Consistency in Text-to-3D Generation [27.43973967994717]
MT3D is a text-to-3D generative model that leverages a high-fidelity 3D object to overcome viewpoint bias. By incorporating geometric details from a 3D asset, MT3D enables the creation of diverse and geometrically consistent objects.
arXiv Detail & Related papers (2024-08-12T06:25:44Z)
GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction [52.04103235260539]
We present a diffusion model approach based on Gaussian Splatting representation for 3D object reconstruction from a single view. The model learns to generate 3D objects represented by sets of GS ellipsoids. The final reconstructed objects explicitly come with high-quality 3D structure and texture, and can be efficiently rendered in arbitrary views.
arXiv Detail & Related papers (2024-07-05T03:43:08Z)
GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation [65.33726478659304]
We introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB GPU memory. Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images. GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms.
arXiv Detail & Related papers (2024-06-21T17:49:31Z)
G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images [45.66479596827045]
We propose a Geometry-enhanced NeRF (G-NeRF), which seeks to enhance the geometry priors by a geometry-guided multi-view synthesis approach. To tackle the absence of multi-view supervision for single-view images, we design the depth-aware training approach.
arXiv Detail & Related papers (2024-04-11T04:58:18Z)
GeoGS3D: Single-view 3D Reconstruction via Geometric-aware Diffusion Model and Gaussian Splatting [81.03553265684184]
We introduce GeoGS3D, a framework for reconstructing detailed 3D objects from single-view images. We propose a novel metric, Gaussian Divergence Significance (GDS), to prune unnecessary operations during optimization. Experiments demonstrate that GeoGS3D generates images with high consistency across views and reconstructs high-quality 3D objects.
arXiv Detail & Related papers (2024-03-15T12:24:36Z)
Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior [57.986512832738704]
We present a new framework Sculpt3D that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects without re-training the 2D diffusion model. Specifically, we demonstrate that high-quality and diverse 3D geometry can be guaranteed by keypoints supervision through a sparse ray sampling approach. These two decoupled designs effectively harness 3D information from reference objects to generate 3D objects while preserving the generation quality of the 2D diffusion model.
arXiv Detail & Related papers (2024-03-14T07:39:59Z)
Wonder3D: Single Image to 3D using Cross-Domain Diffusion [105.16622018766236]
Wonder3D is a novel method for efficiently generating high-fidelity textured meshes from single-view images. To holistically improve the quality, consistency, and efficiency of image-to-3D tasks, we propose a cross-domain diffusion model.
arXiv Detail & Related papers (2023-10-23T15:02:23Z)
Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views [47.215089338101066]
We present Sparse3D, a novel 3D reconstruction method tailored for sparse view inputs. Our approach distills robust priors from a multiview-consistent diffusion model to refine a neural radiance field. By tapping into 2D priors from powerful image diffusion models, our integrated model consistently delivers high-quality results.
arXiv Detail & Related papers (2023-08-27T11:52:00Z)
3D Surface Reconstruction in the Wild by Deforming Shape Priors from Synthetic Data [24.97027425606138]
Reconstructing the underlying 3D surface of an object from a single image is a challenging problem. We present a new method for joint category-specific 3D reconstruction and object pose estimation from a single image. Our approach achieves state-of-the-art reconstruction performance across several real-world datasets.
arXiv Detail & Related papers (2023-02-24T20:37:27Z)
Beyond 3DMM: Learning to Capture High-fidelity 3D Face Shape [77.95154911528365]
3D Morphable Model (3DMM) fitting has widely benefited face analysis due to its strong 3D prior. Previously reconstructed 3D faces suffer from degraded visual verisimilitude due to the loss of fine-grained geometry. This paper proposes a complete solution to capture the personalized shape so that the reconstructed shape looks identical to the corresponding person.
arXiv Detail & Related papers (2022-04-09T03:46:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.