Related papers: Geometry-Aware Score Distillation via 3D Consistent Noising and Gradient Consistency Modeling

Geometry-Aware Score Distillation via 3D Consistent Noising and Gradient Consistency Modeling

URL: http://arxiv.org/abs/2406.16695v2
Date: Mon, 1 Jul 2024 01:55:12 GMT
Title: Geometry-Aware Score Distillation via 3D Consistent Noising and Gradient Consistency Modeling
Authors: Min-Seop Kwak, Donghoon Ahn, Ines Hyeonsu Kim, Jin-Hwa Kim, Seungryong Kim,
Abstract summary: We introduce 3D consistent noising, geometry-based gradient warping and novel gradient consistency loss. We successfully address the geometric inconsistency problems in text-to-3D generation task with minimal cost and being compatible with existing score distillation-based models.
Score: 31.945761751215134
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Score distillation sampling (SDS), the methodology in which the score from pretrained 2D diffusion models is distilled into 3D representation, has recently brought significant advancements in text-to-3D generation task. However, this approach is still confronted with critical geometric inconsistency problems such as the Janus problem. Starting from a hypothesis that such inconsistency problems may be induced by multiview inconsistencies between 2D scores predicted from various viewpoints, we introduce GSD, a simple and general plug-and-play framework for incorporating 3D consistency and therefore geometry awareness into the SDS process. Our methodology is composed of three components: 3D consistent noising, designed to produce 3D consistent noise maps that perfectly follow the standard Gaussian distribution, geometry-based gradient warping for identifying correspondences between predicted gradients of different viewpoints, and novel gradient consistency loss to optimize the scene geometry toward producing more consistent gradients. We demonstrate that our method significantly improves performance, successfully addressing the geometric inconsistency problems in text-to-3D generation task with minimal computation cost and being compatible with existing score distillation-based models. Our project page is available at https://ku-cvlab.github.io/GSD/.

Related papers

Dive3D: Diverse Distillation-based Text-to-3D Generation via Score Implicit Matching [14.267619174518106]
We introduce Dive3D, a novel text-to-3D generation framework that replaces KL-based objectives with Score Implicit Matching (SIM) loss.<n>We validate Dive3D across various 2D-to-3D prompts and find that it consistently outperforms prior methods in qualitative assessments.<n> Dive3D also achieves strong results on quantitative metrics, including text-asset alignment, 3D plausibility, text-geometry consistency, texture quality, and geometric detail.
arXiv Detail & Related papers (2025-06-16T15:21:30Z)
Bridging Geometry-Coherent Text-to-3D Generation with Multi-View Diffusion Priors and Gaussian Splatting [51.08718483081347]
We propose a framework that couples multi-view joint distribution priors to ensure geometrically consistent 3D generation.<n>We derive an effective optimization rule that effectively couples multi-view priors to guide optimization across different viewpoints.<n>We employ a deformable tetrahedral grid, from 3D-GS and refined through CSD, to produce high-quality, refined meshes.
arXiv Detail & Related papers (2025-05-07T09:12:45Z)
GSV3D: Gaussian Splatting-based Geometric Distillation with Stable Video Diffusion for Single-Image 3D Object Generation [24.255633621887988]
We propose a method that leverages 2D diffusion models' implicit 3D reasoning ability while ensuring 3D consistency. Specifically, the proposed Gaussian Splatting Decoder enforces 3D consistency by transforming SV3D latent outputs into an explicit 3D representation. As a result, our approach simultaneously generates high-quality, multi-view-consistent images and accurate 3D models.
arXiv Detail & Related papers (2025-03-08T09:10:31Z)
GaussRender: Learning 3D Occupancy with Gaussian Rendering [86.89653628311565]
GaussRender is a module that improves 3D occupancy learning by enforcing projective consistency. Our method penalizes 3D configurations that produce inconsistent 2D projections, thereby enforcing a more coherent 3D structure.
arXiv Detail & Related papers (2025-02-07T16:07:51Z)
Consistent Flow Distillation for Text-to-3D Generation [14.150490171643034]
Score Distillation Sampling (SDS) has made significant strides in distilling image-generative models for 3D generation. However, its maximum-likelihood-seeking behavior often leads to degraded visual quality and diversity, limiting its effectiveness in 3D applications. We propose Consistent Flow Distillation (CFD), which addresses these limitations.
arXiv Detail & Related papers (2025-01-09T18:56:05Z)
RIGI: Rectifying Image-to-3D Generation Inconsistency via Uncertainty-aware Learning [27.4552892119823]
inconsistencies in multi-view snapshots frequently introduce noise and artifacts along object boundaries, undermining the 3D reconstruction process. We leverage 3D Gaussian Splatting (3DGS) for 3D reconstruction, and explicitly integrate uncertainty-aware learning into the reconstruction process. We apply adaptive pixel-wise loss weighting to regularize the models, reducing reconstruction intensity in high-uncertainty regions.
arXiv Detail & Related papers (2024-11-28T02:19:28Z)
PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting [54.7468067660037]
PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices.<n>Our framework capitalizes on fast speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS.
arXiv Detail & Related papers (2024-10-29T15:28:15Z)
Deep Geometric Moments Promote Shape Consistency in Text-to-3D Generation [27.43973967994717]
MT3D is a text-to-3D generative model that leverages a high-fidelity 3D object to overcome viewpoint bias. We employ depth maps derived from a high-quality 3D model as control signals to guarantee that the generated 2D images preserve the fundamental shape and structure. By incorporating geometric details from a 3D asset, MT3D enables the creation of diverse and geometrically consistent objects.
arXiv Detail & Related papers (2024-08-12T06:25:44Z)
VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation [69.68568248073747]
We propose Pose-dependent Consistency Distillation Sampling (PCDS), a novel yet efficient objective for diffusion-based 3D generation tasks. PCDS builds the pose-dependent consistency function within diffusion trajectories, allowing to approximate true gradients through minimal sampling steps. For efficient generation, we propose a coarse-to-fine optimization strategy, which first utilizes 1-step PCDS to create the basic structure of 3D objects, and then gradually increases PCDS steps to generate fine-grained details.
arXiv Detail & Related papers (2024-06-21T08:21:52Z)
Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior [57.986512832738704]
We present a new framework Sculpt3D that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects without re-training the 2D diffusion model. Specifically, we demonstrate that high-quality and diverse 3D geometry can be guaranteed by keypoints supervision through a sparse ray sampling approach. These two decoupled designs effectively harness 3D information from reference objects to generate 3D objects while preserving the generation quality of the 2D diffusion model.
arXiv Detail & Related papers (2024-03-14T07:39:59Z)
Retrieval-Augmented Score Distillation for Text-to-3D Generation [30.57225047257049]
We introduce novel framework for retrieval-based quality enhancement in text-to-3D generation. We conduct extensive experiments to demonstrate that ReDream exhibits superior quality with increased geometric consistency.
arXiv Detail & Related papers (2024-02-05T12:50:30Z)
Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior [87.55592645191122]
Score distillation sampling (SDS) and its variants have greatly boosted the development of text-to-3D generation, but are vulnerable to geometry collapse and poor textures yet. We propose a novel and effective "Consistent3D" method that explores the ODE deterministic sampling prior for text-to-3D generation. Experimental results show the efficacy of our Consistent3D in generating high-fidelity and diverse 3D objects and large-scale scenes.
arXiv Detail & Related papers (2024-01-17T08:32:07Z)
SweetDreamer: Aligning Geometric Priors in 2D Diffusion for Consistent Text-to-3D [40.088688751115214]
It is inherently ambiguous to lift 2D results from pre-trained diffusion models to a 3D world for text-to-3D generation. We improve consistency by aligning the 2D geometric priors in diffusion models with well-defined 3D shapes during the lifting. Our method represents a new state-of-the-art performance with an 85+% consistency rate by human evaluation.
arXiv Detail & Related papers (2023-10-04T05:59:50Z)
Text-to-3D using Gaussian Splatting [18.163413810199234]
This paper proposes GSGEN, a novel method that adopts Gaussian Splatting, a recent state-of-the-art representation, to text-to-3D generation. GSGEN aims at generating high-quality 3D objects and addressing existing shortcomings by exploiting the explicit nature of Gaussian Splatting. Our approach can generate 3D assets with delicate details and accurate geometry.
arXiv Detail & Related papers (2023-09-28T16:44:31Z)
Geometry-Contrastive Transformer for Generalized 3D Pose Transfer [95.56457218144983]
The intuition of this work is to perceive the geometric inconsistency between the given meshes with the powerful self-attention mechanism. We propose a novel geometry-contrastive Transformer that has an efficient 3D structured perceiving ability to the global geometric inconsistencies. We present a latent isometric regularization module together with a novel semi-synthesized dataset for the cross-dataset 3D pose transfer task.
arXiv Detail & Related papers (2021-12-14T13:14:24Z)
Joint Deep Multi-Graph Matching and 3D Geometry Learning from Inhomogeneous 2D Image Collections [57.60094385551773]
We propose a trainable framework for learning a deformable 3D geometry model from inhomogeneous image collections. We in addition obtain the underlying 3D geometry of the objects depicted in the 2D images.
arXiv Detail & Related papers (2021-03-31T17:25:36Z)
Geometric Correspondence Fields: Learned Differentiable Rendering for 3D Pose Refinement in the Wild [96.09941542587865]
We present a novel 3D pose refinement approach based on differentiable rendering for objects of arbitrary categories in the wild. In this way, we precisely align 3D models to objects in RGB images which results in significantly improved 3D pose estimates. We evaluate our approach on the challenging Pix3D dataset and achieve up to 55% relative improvement compared to state-of-the-art refinement methods in multiple metrics.
arXiv Detail & Related papers (2020-07-17T12:34:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.