Related papers: Distilling Multi-view Diffusion Models into 3D Generators

URL: http://arxiv.org/abs/2504.00457v3
Date: Thu, 03 Apr 2025 01:44:53 GMT
Title: Distilling Multi-view Diffusion Models into 3D Generators
Authors: Hao Qin, Luyuan Chen, Ming Kong, Mengxu Lu, Qiang Zhu
Abstract summary: We introduce DD3G, a formulation that Distills a multi-view Diffusion model (MV-DM) into a 3D Generator using Gaussian splatting. DD3G compresses and integrates extensive visual and spatial knowledge from the MV-DM. We propose PEPD, a generator consisting of Pattern Extraction and Progressive Decoding phases, which enables efficient fusion of probabilistic flow.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce DD3G, a formulation that Distills a multi-view Diffusion model (MV-DM) into a 3D Generator using Gaussian splatting. DD3G compresses and integrates extensive visual and spatial geometric knowledge from the MV-DM by simulating its ordinary differential equation (ODE) trajectory, ensuring the distilled generator generalizes better than those trained solely on 3D data. Unlike previous amortized optimization approaches, we align the MV-DM and 3D generator representation spaces to transfer the teacher's probabilistic flow to the student, thus avoiding inconsistencies in optimization objectives caused by probabilistic sampling. The introduction of probabilistic flow and the coupling of various attributes in 3D Gaussians introduce challenges in the generation process. To tackle this, we propose PEPD, a generator consisting of Pattern Extraction and Progressive Decoding phases, which enables efficient fusion of probabilistic flow and converts a single image into 3D Gaussians within 0.06 seconds. Furthermore, to reduce knowledge loss and overcome sparse-view supervision, we design a joint optimization objective that ensures the quality of generated samples through explicit supervision and implicit verification. Leveraging existing 2D generation models, we compile 120k high-quality RGBA images for distillation. Experiments on synthetic and public datasets demonstrate the effectiveness of our method. Our project is available at: https://qinbaigao.github.io/DD3G_project/

Related papers

3DGEER: Exact and Efficient Volumetric Rendering with 3D Gaussians [15.776720879897345]
We introduce 3DGEER, an Exact and Efficient Volumetric Gaussian Rendering method. Our method consistently outperforms prior methods, establishing a new state-of-the-art in real-time neural rendering.
arXiv Detail & Related papers (2025-05-29T22:52:51Z)
ContrastiveGaussian: High-Fidelity 3D Generation with Contrastive Learning and Gaussian Splatting [2.4241677964735997]
We propose ContrastiveGaussian, which integrates contrastive learning into the generative process. By using a perceptual loss, we effectively differentiate between positive and negative samples, leveraging the visual inconsistencies to improve 3D generation quality.
arXiv Detail & Related papers (2025-04-10T19:56:09Z)
Text-to-3D Generation using Jensen-Shannon Score Distillation [14.079043195485601]
We derive a bounded score distillation objective based on Jensen-Shannon divergence (JSD). We provide a practical implementation of JSD by utilizing the theory of generative adversarial networks. Experimental results on T3Bench demonstrate that our method can produce high-quality and diversified 3D assets.
arXiv Detail & Related papers (2025-03-08T13:27:18Z)
F3D-Gaus: Feed-forward 3D-aware Generation on ImageNet with Cycle-Aggregative Gaussian Splatting [35.625593119642424]
This paper tackles the problem of generalizable 3D-aware generation from monocular datasets. We propose a novel feed-forward pipeline based on pixel-aligned Gaussian Splatting. We also introduce a self-supervised cycle-aggregative constraint to enforce cross-view consistency in the learned 3D representation.
arXiv Detail & Related papers (2025-01-12T04:44:44Z)
DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models [67.50989119438508]
We introduce DSplats, a novel method that directly denoises multiview images using Gaussian-based Reconstructors to produce realistic 3D assets. Our experiments demonstrate that DSplats not only produces high-quality, spatially consistent outputs, but also sets a new standard in single-image to 3D reconstruction.
arXiv Detail & Related papers (2024-12-11T07:32:17Z)
A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision [65.33043028101471]
We introduce a diffusion model for Gaussian Splats, SplatDiffusion, to enable generation of three-dimensional structures from single images. Existing methods rely on deterministic, feed-forward predictions, which limit their ability to handle the inherent ambiguity of 3D inference from 2D data.
arXiv Detail & Related papers (2024-12-01T00:29:57Z)
L3DG: Latent 3D Gaussian Diffusion [74.36431175937285]
L3DG is the first approach for generative 3D modeling of 3D Gaussians through a latent 3D Gaussian diffusion formulation. We employ a sparse convolutional architecture to efficiently operate on room-scale scenes. By leveraging the 3D Gaussian representation, the generated scenes can be rendered from arbitrary viewpoints in real-time.
arXiv Detail & Related papers (2024-10-17T13:19:32Z)
Atlas Gaussians Diffusion for 3D Generation [37.68480030996363]
The latent diffusion model has proven effective in developing novel 3D generation techniques. A key challenge is designing a high-fidelity and efficient representation that links the latent space and the 3D space. We introduce Atlas Gaussians, a novel representation for feed-forward native 3D generation.
arXiv Detail & Related papers (2024-08-23T13:27:27Z)
Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting [9.383423119196408]
We introduce Multi-view ControlNet (MVControl), a novel neural network architecture designed to enhance existing multi-view diffusion models. MVControl is able to offer 3D diffusion guidance for optimization-based 3D generation. In pursuit of efficiency, we adopt 3D Gaussians as our representation instead of the commonly used implicit representations.
arXiv Detail & Related papers (2024-03-15T02:57:20Z)
AGG: Amortized Generative 3D Gaussians for Single Image to 3D [108.38567665695027]
We introduce an Amortized Generative 3D Gaussian framework (AGG) that instantly produces 3D Gaussians from a single image. AGG decomposes the generation of 3D Gaussian locations and other appearance attributes for joint optimization. We propose a cascaded pipeline that first generates a coarse representation of the 3D data and later upsamples it with a 3D Gaussian super-resolution module.
arXiv Detail & Related papers (2024-01-08T18:56:33Z)
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation [55.661467968178066]
We propose DreamGaussian, a novel 3D content generation framework that achieves both efficiency and quality simultaneously. Our key insight is to design a generative 3D Gaussian Splatting model with companioned mesh extraction and texture refinement in UV space. In contrast to the occupancy pruning used in Neural Radiance Fields, we demonstrate that the progressive densification of 3D Gaussians converges significantly faster for 3D generative tasks.
arXiv Detail & Related papers (2023-09-28T17:55:05Z)
GVP: Generative Volumetric Primitives [76.95231302205235]
We present Generative Volumetric Primitives (GVP), the first pure 3D generative model that can sample and render 512-resolution images in real-time. GVP jointly models a number of primitives and their spatial information, both of which can be efficiently generated via a 2D convolutional network. Experiments on several datasets demonstrate superior efficiency and 3D consistency of GVP over the state-of-the-art.
arXiv Detail & Related papers (2023-03-31T16:50:23Z)
NeRF-GAN Distillation for Efficient 3D-Aware Generation with Convolutions [97.27105725738016]
The integration of Neural Radiance Fields (NeRFs) and generative models, such as Generative Adversarial Networks (GANs), has transformed 3D-aware generation from single-view images. We propose a simple and effective method, based on re-using the well-disentangled latent space of a pre-trained NeRF-GAN in a pose-conditioned convolutional network to directly generate 3D-consistent images corresponding to the underlying 3D representations.
arXiv Detail & Related papers (2023-03-22T18:59:48Z)