Bridging Geometry-Coherent Text-to-3D Generation with Multi-View Diffusion Priors and Gaussian Splatting
- URL: http://arxiv.org/abs/2505.04262v1
- Date: Wed, 07 May 2025 09:12:45 GMT
- Title: Bridging Geometry-Coherent Text-to-3D Generation with Multi-View Diffusion Priors and Gaussian Splatting
- Authors: Feng Yang, Wenliang Qian, Wangmeng Zuo, Hui Li
- Abstract summary: We propose a framework that couples multi-view joint distribution priors to ensure geometrically consistent 3D generation. We derive an optimization rule that effectively couples multi-view priors to guide optimization across different viewpoints. We employ a deformable tetrahedral grid, initialized from 3D-GS and refined through CSD, to produce high-quality, refined meshes.
- Score: 51.08718483081347
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Score Distillation Sampling (SDS) leverages pretrained 2D diffusion models to advance text-to-3D generation but neglects multi-view correlations, making it prone to geometric inconsistencies and multi-face artifacts in the generated 3D content. In this work, we propose Coupled Score Distillation (CSD), a framework that couples multi-view joint distribution priors to ensure geometrically consistent 3D generation while enabling stable, direct optimization of 3D Gaussian Splatting. Specifically, by reformulating the optimization as a multi-view joint optimization problem, we derive an optimization rule that effectively couples multi-view priors to guide optimization across different viewpoints while preserving the diversity of the generated 3D assets. Additionally, we propose a framework that directly optimizes 3D Gaussian Splatting (3D-GS) from random initialization to generate geometrically consistent 3D content. We further employ a deformable tetrahedral grid, initialized from 3D-GS and refined through CSD, to produce high-quality, refined meshes. Quantitative and qualitative experimental results demonstrate the efficiency and competitive quality of our approach.
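For background, the single-view SDS rule that CSD generalizes has a standard form in the literature (DreamFusion). In the expression below, g(θ, c) is a differentiable rendering of the 3D parameters θ from camera c, ε_φ is the pretrained diffusion model's noise prediction for prompt y, and w(t) is a timestep weighting; the coupled multi-view rule itself is derived in the paper and is not reproduced here.

```latex
% Standard single-view SDS gradient (DreamFusion-style); CSD derives a
% coupled multi-view analogue of this update.
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta)
  = \mathbb{E}_{t,\epsilon,c}\!\left[
      w(t)\,\bigl(\epsilon_\phi(x_t;\,y,\,t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad
x = g(\theta, c), \quad
x_t = \sqrt{\bar\alpha_t}\,x + \sqrt{1-\bar\alpha_t}\,\epsilon .
```

The following is a minimal, hypothetical Python sketch of one coupled multi-view distillation step in the spirit of the abstract: a multi-view diffusion prior denoises all rendered views jointly, and the resulting residual is backpropagated into the 3D-GS parameters. Every name here (render, mv_unet, the DDPM schedule) is an assumed stand-in, not the authors' implementation.

```python
# Hypothetical sketch of one coupled multi-view score-distillation step.
# `render`, `mv_unet`, and the DDPM schedule are stand-in assumptions;
# the actual CSD update rule is derived in the paper.
import torch

def ddpm_add_noise(x, noise, alpha_bar_t):
    # Forward diffusion q(x_t | x_0) for a given cumulative alpha.
    return alpha_bar_t.sqrt() * x + (1 - alpha_bar_t).sqrt() * noise

def csd_step(params, render, mv_unet, text_emb, cameras, optimizer, alpha_bars):
    # Render the same 3D-GS scene from several jointly sampled viewpoints.
    images = torch.stack([render(params, c) for c in cameras])  # (V, 3, H, W)

    # One shared timestep so all views are noised consistently.
    t = torch.randint(0, len(alpha_bars), (1,))
    alpha_bar_t = alpha_bars[t]
    noise = torch.randn_like(images)
    noisy = ddpm_add_noise(images, noise, alpha_bar_t)

    with torch.no_grad():
        # A multi-view diffusion prior denoises all views *jointly*, so each
        # view's noise prediction is coupled to the others; this coupling is
        # what distinguishes the update from per-view SDS.
        eps_pred = mv_unet(noisy, t, text_emb, cameras)

    # SDS-style surrogate loss: its gradient w.r.t. `images` equals
    # w(t) * (eps_pred - noise), which backpropagates into the Gaussians.
    w_t = 1.0 - alpha_bar_t
    loss = (w_t * (eps_pred - noise).detach() * images).sum()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```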
Related papers
- ArticulatedGS: Self-supervised Digital Twin Modeling of Articulated Objects using 3D Gaussian Splatting [29.69981069695724]
We tackle the challenge of concurrent part-level reconstruction of RGB appearance and estimation of motion parameters. We reconstruct the articulated object in 3D Gaussian representations with both appearance and geometry information at the same time. We introduce ArticulatedGS, a self-supervised, comprehensive framework that autonomously learns to model shapes and appearances at the part level.
arXiv Detail & Related papers (2025-03-11T07:56:12Z)
- DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models [67.50989119438508]
We introduce DSplats, a novel method that directly denoises multi-view images using Gaussian-based Reconstructors to produce realistic 3D assets. Our experiments demonstrate that DSplats not only produces high-quality, spatially consistent outputs, but also sets a new standard in single-image to 3D reconstruction.
arXiv Detail & Related papers (2024-12-11T07:32:17Z)
- GaussianAnything: Interactive Point Cloud Flow Matching For 3D Object Generation [75.39457097832113]
This paper introduces a novel 3D generation framework, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space. Our framework employs a Variational Autoencoder with multi-view posed RGB-D(epth)-N(ormal) renderings as input, using a unique latent space design that preserves 3D shape information. The proposed method, GaussianAnything, supports multi-modal conditional 3D generation, allowing for point cloud, caption, and single-image inputs.
arXiv Detail & Related papers (2024-11-12T18:59:32Z)
- CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians [97.15119679296954]
CompGS is a novel generative framework that employs 3D Gaussian Splatting (GS) for efficient, compositional text-to-3D content generation.
CompGS can be easily extended to controllable 3D editing, facilitating scene generation.
arXiv Detail & Related papers (2024-10-28T04:35:14Z)
- GeoGS3D: Single-view 3D Reconstruction via Geometric-aware Diffusion Model and Gaussian Splatting [81.03553265684184]
We introduce GeoGS3D, a framework for reconstructing detailed 3D objects from single-view images.
We propose a novel metric, Gaussian Divergence Significance (GDS), to prune unnecessary operations during optimization.
Experiments demonstrate that GeoGS3D generates images with high consistency across views and reconstructs high-quality 3D objects.
arXiv Detail & Related papers (2024-03-15T12:24:36Z)
- Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors [16.93758384693786]
Bidirectional Diffusion (BiDiff) is a unified framework that incorporates both a 3D and a 2D diffusion process.
Our model achieves high-quality, diverse, and scalable 3D generation.
arXiv Detail & Related papers (2023-12-07T10:00:04Z)
- Wonder3D: Single Image to 3D using Cross-Domain Diffusion [105.16622018766236]
Wonder3D is a novel method for efficiently generating high-fidelity textured meshes from single-view images.
To holistically improve the quality, consistency, and efficiency of image-to-3D tasks, we propose a cross-domain diffusion model.
arXiv Detail & Related papers (2023-10-23T15:02:23Z)
- Text-to-3D using Gaussian Splatting [18.163413810199234]
This paper proposes GSGEN, a novel method that applies Gaussian Splatting, a recent state-of-the-art representation, to text-to-3D generation.
GSGEN aims at generating high-quality 3D objects and addressing existing shortcomings by exploiting the explicit nature of Gaussian Splatting.
Our approach can generate 3D assets with delicate details and accurate geometry.
arXiv Detail & Related papers (2023-09-28T16:44:31Z)