DreamDissector: Learning Disentangled Text-to-3D Generation from 2D Diffusion Priors
- URL: http://arxiv.org/abs/2407.16260v1
- Date: Tue, 23 Jul 2024 07:59:57 GMT
- Title: DreamDissector: Learning Disentangled Text-to-3D Generation from 2D Diffusion Priors
- Authors: Zizheng Yan, Jiapeng Zhou, Fanpeng Meng, Yushuang Wu, Lingteng Qiu, Zisheng Ye, Shuguang Cui, Guanying Chen, Xiaoguang Han
- Abstract summary: We propose DreamDissector, a text-to-3D method capable of generating multiple independent objects with interactions.
DreamDissector accepts a multi-object text-to-3D NeRF as input and produces independent textured meshes.
- Score: 44.30208916019448
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-3D generation has recently seen significant progress. To enhance its practicality in real-world applications, it is crucial to generate multiple independent objects with interactions, similar to layer-compositing in 2D image editing. However, existing text-to-3D methods struggle with this task, as they are designed to generate either non-independent objects or independent objects lacking spatially plausible interactions. Addressing this, we propose DreamDissector, a text-to-3D method capable of generating multiple independent objects with interactions. DreamDissector accepts a multi-object text-to-3D NeRF as input and produces independent textured meshes. To achieve this, we introduce the Neural Category Field (NeCF) for disentangling the input NeRF. Additionally, we present the Category Score Distillation Sampling (CSDS), facilitated by a Deep Concept Mining (DCM) module, to tackle the concept gap issue in diffusion models. By leveraging NeCF and CSDS, we can effectively derive sub-NeRFs from the original scene. Further refinement enhances geometry and texture. Our experimental results validate the effectiveness of DreamDissector, providing users with novel means to control 3D synthesis at the object level and potentially opening avenues for various creative applications in the future.
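As a rough illustration of the disentangling step, here is a minimal sketch (assumed architecture and dimensions, not the authors' code) of how a small category head can partition a frozen NeRF's density into per-object sub-fields, which is the core idea behind NeCF:

```python
# Minimal sketch of the NeCF idea (an assumption-laden toy, not the authors'
# code): a frozen multi-object NeRF supplies per-point features and density,
# and a small learned head softly assigns each point to a category so that
# the density is partitioned into per-object sub-fields.
import torch
import torch.nn as nn

class NeCFHead(nn.Module):
    def __init__(self, feat_dim: int = 64, num_categories: int = 2):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, num_categories),
        )

    def forward(self, point_feats: torch.Tensor) -> torch.Tensor:
        # Soft per-point category assignment, shape (N, K).
        return torch.softmax(self.mlp(point_feats), dim=-1)

def split_density(sigma: torch.Tensor, probs: torch.Tensor) -> torch.Tensor:
    # Partition the frozen field's density among K sub-NeRFs; summing the
    # per-category densities recovers the original density exactly.
    return sigma.unsqueeze(-1) * probs

# Toy usage: 1024 sampled points with 64-d features from the frozen NeRF.
feats, sigma = torch.randn(1024, 64), torch.rand(1024)
sigma_k = split_density(sigma, NeCFHead()(feats))
assert torch.allclose(sigma_k.sum(-1), sigma, atol=1e-5)
```

Each sub-field would then be refined by distilling a 2D diffusion prior with a category-specific prompt embedding, which is roughly the role the abstract assigns to CSDS and DCM.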
Related papers
- DecompDreamer: Advancing Structured 3D Asset Generation with Multi-Object Decomposition and Gaussian Splatting [24.719972380079405]
DecompDreamer is a training routine designed to generate high-quality 3D compositions.
It decomposes scenes into structured components and their relationships.
It effectively generates intricate 3D compositions with superior object disentanglement.
arXiv Detail & Related papers (2025-03-15T03:37:25Z)
- GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation [75.39457097832113]
This paper introduces a novel 3D generation framework, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space.
Our framework employs a Variational Autoencoder with multi-view posed RGB-D(epth)-N(ormal) renderings as input, using a unique latent space design that preserves 3D shape information.
The proposed method, GaussianAnything, supports multi-modal conditional 3D generation, allowing for point cloud, caption, and single/multi-view image inputs.
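A minimal sketch of such a point-cloud-structured latent (the channel layout, module sizes, and pooling scheme are all assumptions, not the released GaussianAnything code):

```python
# Toy VAE (assumptions throughout) encoding multi-view posed RGB-D-N
# renderings into a set of latent points: xyz positions plus per-point
# features that a latent diffusion model could then operate on.
import torch
import torch.nn as nn

class PointLatentVAE(nn.Module):
    def __init__(self, n_latent_points: int = 512, latent_dim: int = 16):
        super().__init__()
        # Assumed 10-channel views: RGB (3) + depth (1) + normal (3)
        # plus a 3-channel ray/pose encoding.
        self.encoder = nn.Sequential(
            nn.Conv2d(10, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_mu = nn.Linear(64, n_latent_points * (3 + latent_dim))
        self.to_logvar = nn.Linear(64, n_latent_points * (3 + latent_dim))
        self.n, self.d = n_latent_points, latent_dim

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (V, 10, H, W) posed renderings of one object.
        pooled = self.encoder(views).mean(0)  # aggregate over views
        mu, logvar = self.to_mu(pooled), self.to_logvar(pooled)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        # Latent "point cloud": xyz positions plus per-point features.
        return z.view(self.n, 3 + self.d)

latent = PointLatentVAE()(torch.randn(4, 10, 64, 64))
print(latent.shape)  # torch.Size([512, 19])
```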
arXiv Detail & Related papers (2024-11-12T18:59:32Z)
- $M^2D$NeRF: Multi-Modal Decomposition NeRF with 3D Feature Fields [33.168225243348786]
We present a single model, Multi-Modal Decomposition NeRF ($M^2D$NeRF), that is capable of both text-based and visual patch-based edits.
Specifically, we use multi-modal feature distillation to integrate teacher features from pretrained visual and language models into 3D semantic feature volumes.
Experiments on various real-world scenes show superior performance in 3D scene decomposition tasks compared to prior NeRF-based methods.
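A minimal sketch of the multi-modal feature distillation objective (the cosine loss and per-teacher projection heads are assumptions, not the paper's implementation):

```python
# Toy distillation objective: a 3D semantic feature field is rendered to 2D
# and regressed against frozen teacher features (e.g. a visual teacher such
# as DINO and a language-aligned teacher such as CLIP) on the same views.
import torch
import torch.nn.functional as F

def distill_loss(rendered: torch.Tensor, teacher: torch.Tensor) -> torch.Tensor:
    # rendered / teacher: (H*W, C) features for one view. Cosine distance
    # is a common choice so feature-scale mismatches do not dominate.
    return (1 - F.cosine_similarity(rendered, teacher, dim=-1)).mean()

# Toy usage: one rendered field shared by two teachers through assumed
# per-teacher projection heads.
rendered = torch.randn(4096, 128, requires_grad=True)
proj_vis = torch.nn.Linear(128, 384)   # project to visual-teacher dim
proj_lang = torch.nn.Linear(128, 512)  # project to language-teacher dim
loss = (distill_loss(proj_vis(rendered), torch.randn(4096, 384)) +
        distill_loss(proj_lang(rendered), torch.randn(4096, 512)))
loss.backward()
```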
arXiv Detail & Related papers (2024-05-08T12:25:21Z)
- VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder [56.59814904526965]
This paper introduces a pioneering 3D encoder designed for text-to-3D generation.
A lightweight network is developed to efficiently acquire feature volumes from multi-view images.
A diffusion model with a 3D U-Net is then trained on these feature volumes for text-to-3D generation.
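A minimal sketch of lifting multi-view features into a feature volume (the shapes, intrinsics, and view-averaging scheme are assumptions, not the paper's lightweight encoder):

```python
# Toy volumetric lifting: project each voxel center into every view and
# average the bilinearly sampled image features into a 3D feature grid.
import torch
import torch.nn.functional as F

def unproject(feats, K, w2c, res=32):
    # feats: (V, C, H, W) per-view feature maps; K: (3, 3) intrinsics;
    # w2c: (V, 3, 4) world-to-camera extrinsics.
    V, C, H, W = feats.shape
    g = torch.linspace(-1, 1, res)
    xyz = torch.stack(torch.meshgrid(g, g, g, indexing="ij"), -1).reshape(-1, 3)
    xyz_h = torch.cat([xyz, torch.ones(len(xyz), 1)], -1)      # (N, 4)
    vol = torch.zeros(C, res, res, res)
    for v in range(V):
        cam = (w2c[v] @ xyz_h.T).T                              # (N, 3)
        uv = (K @ cam.T).T
        uv = uv[:, :2] / uv[:, 2:].clamp(min=1e-6)              # pixel coords
        grid = (uv / torch.tensor([W - 1, H - 1]) * 2 - 1).view(1, -1, 1, 2)
        sampled = F.grid_sample(feats[v:v + 1], grid, align_corners=True)
        vol += sampled.view(C, res, res, res) / V
    return vol  # (C, res, res, res), ready for a 3D U-Net diffusion model

vol = unproject(torch.randn(4, 16, 64, 64),
                torch.tensor([[60., 0, 32], [0, 60, 32], [0, 0, 1]]),
                torch.randn(4, 3, 4))
print(vol.shape)  # torch.Size([16, 32, 32, 32])
```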
arXiv Detail & Related papers (2023-12-18T18:59:05Z)
- HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image [94.11473240505534]
We introduce HyperDreamer, a tool for creating 3D content from a single image.
It is hyper-realistic enough for post-generation usage, as users can view, render, and edit the resulting 3D content from a full range of viewpoints.
We demonstrate the effectiveness of HyperDreamer in modeling region-aware materials with high-resolution textures and enabling user-friendly editing.
arXiv Detail & Related papers (2023-12-07T18:58:09Z)
- X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation [61.48050470095969]
X-Dreamer is a novel approach for high-quality text-to-3D content creation.
It bridges the gap between text-to-2D and text-to-3D synthesis.
arXiv Detail & Related papers (2023-11-30T07:23:00Z)
- Breathing New Life into 3D Assets with Generative Repainting [74.80184575267106]
Diffusion-based text-to-image models have attracted immense attention from the vision community, artists, and content creators.
Recent works have proposed various pipelines that tightly couple diffusion models and neural fields.
We explore the power of pretrained 2D diffusion models and standard 3D neural radiance fields as independent, standalone tools.
Our pipeline accepts any legacy renderable geometry, such as textured or untextured meshes, and orchestrates the interaction between 2D generative refinement and 3D consistency enforcement tools.
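A toy, runnable sketch of that repaint-and-reconcile alternation (the renderer and the diffusion model here are stubs; only the control flow is illustrative):

```python
# Toy repaint loop: the "texture" is a flat tensor, each view observes a
# slice of it, and the "diffusion repaint" is a stub. It only illustrates
# alternating 2D generative refinement with a 3D consistency step.
import torch

def render_view(texture: torch.Tensor, v: int) -> torch.Tensor:
    # Stand-in renderer: each of 4 views sees an interleaved texture slice.
    return texture[:, v::4]

def diffusion_repaint(image: torch.Tensor, strength=0.5) -> torch.Tensor:
    # Stand-in for a pretrained 2D diffusion model refining the render.
    return (1 - strength) * image + strength * torch.randn_like(image)

def repaint_texture(texture: torch.Tensor, rounds: int = 3) -> torch.Tensor:
    for _ in range(rounds):
        for v in range(4):  # 2D generative refinement, view by view
            texture[:, v::4] = diffusion_repaint(render_view(texture, v))
        # Consistency step: pull views toward a shared solution, standing in
        # for the paper's 3D radiance-field reconciliation between rounds.
        texture = 0.5 * (texture + texture.mean(dim=1, keepdim=True))
    return texture

print(repaint_texture(torch.zeros(3, 16)).shape)  # torch.Size([3, 16])
```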
arXiv Detail & Related papers (2023-09-15T16:34:51Z)
- NAP: Neural 3D Articulation Prior [31.875925637190328]
We propose Neural 3D Articulation Prior (NAP), the first 3D deep generative model for synthesizing articulated object models.
To generate articulated objects, we first design a novel articulation tree/graph parameterization and then apply a diffusion-denoising probabilistic model over this representation.
To capture both the geometry and the motion structure, whose distributions influence each other, we design a graph-attention denoising network to learn the reverse diffusion process.
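A minimal sketch of an articulation-graph parameterization with forward diffusion noising (the field layout and schedule are assumptions based on this summary, not NAP's exact format):

```python
# Toy articulation graph: each node carries a part-geometry code, each edge
# a joint (axis, pivot, limits). Noise is applied to nodes and edges jointly
# so geometry and motion structure are modeled together, as a graph-attention
# denoiser would then reverse.
import torch

class ArticulationGraph:
    def __init__(self, n_parts: int, geom_dim: int = 32):
        self.nodes = torch.randn(n_parts, geom_dim)   # part shape codes
        # One joint per non-root part: axis (3) + pivot (3) + limits (2).
        self.edges = torch.randn(n_parts - 1, 8)

    def add_noise(self, t: float):
        # Forward diffusion q(x_t | x_0) with an assumed linear schedule.
        a = 1.0 - t
        self.nodes = a ** 0.5 * self.nodes + (1 - a) ** 0.5 * torch.randn_like(self.nodes)
        self.edges = a ** 0.5 * self.edges + (1 - a) ** 0.5 * torch.randn_like(self.edges)
        return self

g = ArticulationGraph(n_parts=4).add_noise(t=0.3)
print(g.nodes.shape, g.edges.shape)  # torch.Size([4, 32]) torch.Size([3, 8])
```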
arXiv Detail & Related papers (2023-05-25T17:59:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.