Related papers: Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion

Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion

URL: http://arxiv.org/abs/2311.15980v2
Date: Thu, 21 Mar 2024 14:21:58 GMT
Title: Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion
Authors: Yuanxun Lu, Jingyang Zhang, Shiwei Li, Tian Fang, David McKinnon, Yanghai Tsin, Long Quan, Xun Cao, Yao Yao,
Abstract summary: Current methods for creating 3D content are time-consuming and lose generation diversity. In this work, we employ a multi-view 2.5D diffusion fine-tuned from a pre-trained 2D diffusion model. We show that our direct 2.5D generation with the specially-designed fusion scheme can achieve diverse, mode-seeking-free, and high-fidelity 3D content generation in only 10 seconds.
Score: 32.13452288549591
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in generative AI have unveiled significant potential for the creation of 3D content. However, current methods either apply a pre-trained 2D diffusion model with the time-consuming score distillation sampling (SDS), or a direct 3D diffusion model trained on limited 3D data losing generation diversity. In this work, we approach the problem by employing a multi-view 2.5D diffusion fine-tuned from a pre-trained 2D diffusion model. The multi-view 2.5D diffusion directly models the structural distribution of 3D data, while still maintaining the strong generalization ability of the original 2D diffusion model, filling the gap between 2D diffusion-based and direct 3D diffusion-based methods for 3D content generation. During inference, multi-view normal maps are generated using the 2.5D diffusion, and a novel differentiable rasterization scheme is introduced to fuse the almost consistent multi-view normal maps into a consistent 3D model. We further design a normal-conditioned multi-view image generation module for fast appearance generation given the 3D geometry. Our method is a one-pass diffusion process and does not require any SDS optimization as post-processing. We demonstrate through extensive experiments that, our direct 2.5D generation with the specially-designed fusion scheme can achieve diverse, mode-seeking-free, and high-fidelity 3D content generation in only 10 seconds. Project page: https://nju-3dv.github.io/projects/direct25.

Related papers

ScenDi: 3D-to-2D Scene Diffusion Cascades for Urban Generation [53.049890350639366]
We propose ScenDi, a method for urban scene generation that integrates both 3D and 2D diffusion models.<n>By leveraging the coarse 3D scene as guidance for 2D video diffusion, ScenDi generates desired scenes based on input conditions and successfully adheres to accurate camera trajectories.
arXiv Detail & Related papers (2026-01-21T17:53:21Z)
Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation [30.418733980714915]
We introduce a novel representation that utilizes dense 2D grids, enabling the fine-tuning of 2D diffusion models to generate 3D Gaussians. Our experimental results show that text-to-image diffusion models can be effectively adapted for 3D content generation.
arXiv Detail & Related papers (2025-03-20T05:59:41Z)
Zero-1-to-G: Taming Pretrained 2D Diffusion Model for Direct 3D Generation [66.75243908044538]
We introduce Zero-1-to-G, a novel approach to direct 3D generation on Gaussian splats using pretrained 2D diffusion models. To incorporate 3D awareness, we introduce cross-view and cross-attribute attention layers, which capture complex correlations and enforce 3D consistency across generated splats. This makes Zero-1-to-G the first direct image-to-3D generative model to effectively utilize pretrained 2D diffusion priors, enabling efficient training and improved generalization to unseen objects.
arXiv Detail & Related papers (2025-01-09T18:37:35Z)
4Diffusion: Multi-view Video Diffusion Model for 4D Generation [55.82208863521353]
Current 4D generation methods have achieved noteworthy efficacy with the aid of advanced diffusion generative models. We propose a novel 4D generation pipeline, namely 4Diffusion, aimed at generating spatial-temporally consistent 4D content from a monocular video.
arXiv Detail & Related papers (2024-05-31T08:18:39Z)
LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation [73.36690511083894]
This paper introduces a novel framework called LN3Diff to address a unified 3D diffusion pipeline. Our approach harnesses a 3D-aware architecture and variational autoencoder to encode the input image into a structured, compact, and 3D latent space. It achieves state-of-the-art performance on ShapeNet for 3D generation and demonstrates superior performance in monocular 3D reconstruction and conditional 3D generation.
arXiv Detail & Related papers (2024-03-18T17:54:34Z)
Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior [57.986512832738704]
We present a new framework Sculpt3D that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects without re-training the 2D diffusion model. Specifically, we demonstrate that high-quality and diverse 3D geometry can be guaranteed by keypoints supervision through a sparse ray sampling approach. These two decoupled designs effectively harness 3D information from reference objects to generate 3D objects while preserving the generation quality of the 2D diffusion model.
arXiv Detail & Related papers (2024-03-14T07:39:59Z)
3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors [85.11117452560882]
We present a two-stage text-to-3D generation system, namely 3DTopia, which generates high-quality general 3D assets within 5 minutes using hybrid diffusion priors. The first stage samples from a 3D diffusion prior directly learned from 3D data. Specifically, it is powered by a text-conditioned tri-plane latent diffusion model, which quickly generates coarse 3D samples for fast prototyping. The second stage utilizes 2D diffusion priors to further refine the texture of coarse 3D models from the first stage. The refinement consists of both latent and pixel space optimization for high-quality texture generation
arXiv Detail & Related papers (2024-03-04T17:26:28Z)
CAD: Photorealistic 3D Generation via Adversarial Distillation [28.07049413820128]
We propose a novel learning paradigm for 3D synthesis that utilizes pre-trained diffusion models. Our method unlocks the generation of high-fidelity and photorealistic 3D content conditioned on a single image and prompt.
arXiv Detail & Related papers (2023-12-11T18:59:58Z)
MVDream: Multi-view Diffusion for 3D Generation [14.106283556521962]
We introduce MVDream, a diffusion model that is able to generate consistent multi-view images from a given text prompt. Learning from both 2D and 3D data, a multi-view diffusion model can achieve the generalizability of 2D diffusion models and the consistency of 3D renderings.
arXiv Detail & Related papers (2023-08-31T07:49:06Z)
3D-aware Image Generation using 2D Diffusion Models [23.150456832947427]
We formulate the 3D-aware image generation task as multiview 2D image set generation, and further to a sequential unconditional-conditional multiview image generation process. We utilize 2D diffusion models to boost the generative modeling power of the method. We train our method on a large-scale dataset, i.e., ImageNet, which is not addressed by previous methods.
arXiv Detail & Related papers (2023-03-31T09:03:18Z)
HoloDiffusion: Training a 3D Diffusion Model using 2D Images [71.1144397510333]
We introduce a new diffusion setup that can be trained, end-to-end, with only posed 2D images for supervision. We show that our diffusion models are scalable, train robustly, and are competitive in terms of sample quality and fidelity to existing approaches for 3D generative modeling.
arXiv Detail & Related papers (2023-03-29T07:35:56Z)
Improving 3D Imaging with Pre-Trained Perpendicular 2D Diffusion Models [52.529394863331326]
We propose a novel approach using two perpendicular pre-trained 2D diffusion models to solve the 3D inverse problem. Our method is highly effective for 3D medical image reconstruction tasks, including MRI Z-axis super-resolution, compressed sensing MRI, and sparse-view CT.
arXiv Detail & Related papers (2023-03-15T08:28:06Z)
3D Neural Field Generation using Triplane Diffusion [37.46688195622667]
We present an efficient diffusion-based model for 3D-aware generation of neural fields. Our approach pre-processes training data, such as ShapeNet meshes, by converting them to continuous occupancy fields. We demonstrate state-of-the-art results on 3D generation on several object classes from ShapeNet.
arXiv Detail & Related papers (2022-11-30T01:55:52Z)
TetraDiffusion: Tetrahedral Diffusion Models for 3D Shape Generation [19.976938789105393]
TetraDiffusion is a diffusion model that operates on a tetrahedral partitioning of 3D space to enable efficient, high-resolution 3D shape generation. Remarkably, TetraDiffusion enables rapid sampling of detailed 3D objects in nearly real-time with unprecedented resolution.
arXiv Detail & Related papers (2022-11-23T18:58:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.